Document Understanding Conferences
Sample TREC 2002 Novelty Track Materials
This page contains the sample data for the TREC 2002 novelty track.
For four topics, NIST staff members produced two kinds of lists: the
relevant sentences, and the subset of those relevant sentences that
contain new information. For two of the four topics, two different
staff members each produced the sentence lists.
The following data is provided for each topic:
- The text of the topic statement. The topic statement includes
the original TREC topic text, a new "description" field
(which might simply be a repeat of the original description field),
and a list of the relevant document IDs in the order the
documents should be processed.
- A gzipped file containing the text of the relevant
documents in the order given in the topic statement.
The document text contains all the tags that are contained
in the documents on the TREC CDs, plus
additional markup that breaks the text into sentences.
The documents are simply concatenated one after the other in the file.
- One or two relevant sentence lists. A relevant list is the
set of sentences the assessor considered to be relevant
to the topic (more specifically, to the topic as described in
the new description field) without regard to duplication.
Sets labeled with the same letter were produced by the same assessor
(i.e., all "A" sets were produced by the same person).
- One or two new sentence lists. A new sentence list is a subset
of the corresponding relevant sentence list where sentences
containing only information that is already known have
been eliminated.
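
The structure described above can be checked with a short script. The following is a minimal sketch in Python, not an official tool. It assumes that each document in the gzipped file is wrapped in <DOC>...</DOC> with a <DOCNO> field, as is usual for documents on the TREC CDs, and that a sentence is identified by a (document ID, sentence number) pair. The function names and the pair representation are hypothetical; the page does not specify the sentence markup or the format of the list files.

```python
import gzip
import re

# Assumption: documents use the usual TREC <DOC> / <DOCNO> wrappers.
DOC_RE = re.compile(r"<DOC>.*?</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)


def split_documents(text):
    """Split concatenated documents into (docno, raw_text) pairs, in file order."""
    docs = []
    for match in DOC_RE.finditer(text):
        raw = match.group(0)
        docno = DOCNO_RE.search(raw)
        docs.append((docno.group(1) if docno else None, raw))
    return docs


def read_gzipped_documents(path):
    """Read a gzipped document file; latin-1 is an assumed, lossless encoding."""
    with gzip.open(path, "rt", encoding="latin-1") as handle:
        return split_documents(handle.read())


def sentences_not_in_relevant(relevant, new):
    """Return entries of the new list that are missing from the relevant list.

    Both arguments are iterables of hypothetical (docno, sentence_number) pairs.
    An empty result means the new list is a subset of the relevant list.
    """
    return sorted(set(new) - set(relevant))
```

Comparing the document IDs returned by read_gzipped_documents with the ID list in the topic statement would confirm that the file follows the stated processing order, and an empty result from sentences_not_in_relevant would confirm that a new sentence list is a subset of its relevant sentence list.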
Topic 303