Document Understanding Conferences



Sample TREC 2002 Novelty Track Materials

This page contains the sample data for the TREC 2002 novelty track. For each of four topics, NIST staff members produced two kinds of sentence lists: a list of relevant sentences, and a list of those relevant sentences that contain new information. For two of the topics, two different staff members produced the sentence lists.

The following data is provided for each topic:
  • The text of the topic statement. The topic statement includes the original TREC topic text, a new "description" field (which might simply repeat the original description field), and a list of the relevant document ids in the order in which the documents should be processed.
  • A gzipped file containing the text of the relevant documents in the order given in the topic statement. The document text contains all the tags that are contained in the documents on the TREC CDs, plus additional markup that breaks the text into sentences. The documents are simply concatenated one after the other in the file.
  • One or two relevant sentence lists. A relevant sentence list is the set of sentences the assessor considered relevant to the topic (more specifically, to the topic as described in the new description field), without regard to duplication. A common letter indicates that the same assessor produced the set (i.e., all "A" sets were produced by the same person).
  • One or two new sentence lists. A new sentence list is a subset of the corresponding relevant sentence list from which sentences containing only information that is already known (from earlier relevant sentences) have been eliminated.
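The file layout described above can be read with a short script. The following is a minimal Python sketch, not NIST-supplied tooling: it assumes documents are delimited by `<DOC>`/`</DOC>` with a `<DOCNO>` tag, as on the TREC CDs, and that the added sentence markup looks like `<s docid="..." num="...">text</s>`. That sentence tag, the (docid, number) sentence identifiers, and all function names are hypothetical, so adjust the patterns to the markup actually present in the files.

```python
import gzip
import re
from typing import Iterator

# ASSUMPTION: <DOC>/<DOCNO> delimiters as on the TREC CDs; the sentence tag
# <s docid="..." num="..."> is a hypothetical stand-in for the real markup.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
SENT_RE = re.compile(r'<s\s+docid="([^"]+)"\s+num="(\d+)"\s*>(.*?)</s>', re.DOTALL)


def iter_documents(raw: str) -> Iterator[tuple[str, str]]:
    """Yield (docno, body) for each concatenated document, in file order."""
    for match in DOC_RE.finditer(raw):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        yield (docno.group(1) if docno else "", body)


def iter_sentences(body: str) -> Iterator[tuple[str, int, str]]:
    """Yield (docid, sentence number, whitespace-normalized text) per sentence."""
    for docid, num, text in SENT_RE.findall(body):
        yield docid, int(num), " ".join(text.split())


def load_sentences(path: str) -> list[tuple[str, int, str]]:
    """Read one gzipped document file; return its sentences in processing order."""
    # ASSUMPTION: latin-1 decoding, which accepts any byte sequence.
    with gzip.open(path, "rt", encoding="latin-1") as fh:
        raw = fh.read()
    return [s for _, body in iter_documents(raw) for s in iter_sentences(body)]


def is_valid_new_list(relevant: set, new: set) -> bool:
    """A new sentence list must be a subset of its relevant sentence list."""
    return new <= relevant
```

Because the documents are simply concatenated, splitting on `<DOC>` boundaries preserves the processing order given in the topic statement. Sentence lists can then be loaded as sets of (docid, number) pairs and passed to `is_valid_new_list` to confirm that each new list (e.g., an "A" new list) is contained in the relevant list from the same assessor.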

Topic 303

For data, past results, mailing list, or other general information, contact: Lori Buckland (lori.buckland@nist.gov)

For other questions contact: Paul Over (over@nist.gov)
Last updated: Friday, 18-Oct-2002 07:35:34 MDT
Date created: Thursday, 17-October-02