DUC 2003: Documents, Tasks, and Measures
DUC 2003 will use documents from the TDT and TREC collections and will
incorporate focus of various sorts to reduce variability and better model
real tasks. It will examine automatic creation of short and very short
summaries. What follows is a brief description of the data and tasks -
a more detailed version of what was developed at the DUC 2002 Workshop.
Documents for summarization
-
30 TREC document clusters
Documents/clusters: NIST assessors will choose 30 clusters
of TREC documents related to subjects of interest to them. Each subset
will contain on average 10 documents.
The documents will come from the following collections with their own
taggings:
-
AP newswire, 1998-2000
-
New York Times newswire, 1998-2000
-
Xinhua News Agency (English version), 1996-2000
Here is a DTD. (A rough sketch of reading documents in this style appears after this list.)
Manual summaries: NIST assessors will create a very
short summary (~10 words, no specific format other than linear) of each
document. They will also create a focused short summary (~100 words) of
each cluster, designed to reflect a viewpoint defined by the assessor.
-
30 TDT document clusters
Documents/clusters: NIST staff will choose 30 TDT topics/events/timespans
and a subset of the documents TDT annotators found for each topic/event/timespan.
Each subset will contain on average 10 documents.
The documents will come from the same collections identified above.
Manual summaries: NIST assessors will be given the TDT
topic and will create a very short summary (~10 words, no specific
format other than linear) of each document and a short summary of each
cluster. These summaries will not be focused in any particular way
beyond what the documents and the topic themselves provide.
-
30 TREC Novelty track document clusters
Documents/clusters: NIST staff will choose 30 TREC Novelty
Track question topics and a subset of the documents TREC assessors found
relevant to each topic. Each subset will contain on average 22 documents.
The documents will come from the following collections with their own
taggings:
-
Financial Times of London, 1991-1994
-
Federal Register, 1994
-
FBIS, 1996
-
Los Angeles Times 1989-1990
Here are the DTDs.
Manual summaries: NIST assessors will create a focused
short summary (~100 words) of each cluster, designed to answer the question
posed by the TREC topic.
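The exact tagging differs by collection and is defined by the DTDs referenced above. As a rough, purely illustrative sketch (not taken from the DTDs), the following Python assumes hypothetical TREC-style element names - DOC, DOCNO, HEADLINE, and TEXT - and pulls those fields out of a concatenated document file:

import re

# Hypothetical TREC-style element names (DOC, DOCNO, HEADLINE, TEXT);
# the authoritative names and structure come from each collection's DTD.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)

def field(doc, tag):
    # Contents of the first <tag>...</tag> element, or "" if absent.
    m = re.search(rf"<{tag}>(.*?)</{tag}>", doc, re.DOTALL)
    return m.group(1).strip() if m else ""

def read_docs(path):
    # Yield (docno, headline, text) triples from one concatenated file.
    with open(path, encoding="utf-8", errors="replace") as f:
        data = f.read()
    for m in DOC_RE.finditer(data):
        doc = m.group(1)
        yield field(doc, "DOCNO"), field(doc, "HEADLINE"), field(doc, "TEXT")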
Tasks and measures
In what follows, the evaluation of quality and coverage implements
the SEE manual
evaluation protocol. Where sentences needed to be identified, a simple Perl
script for sentence separation was used.
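The Perl script itself is not reproduced here. As a rough, hypothetical approximation of what such a simple rule-based sentence separator does, in Python:

import re

# Approximation only, not the actual DUC Perl script: break after ., !, or ?
# when followed by whitespace and an uppercase letter or an opening quote.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[\"'A-Z])")

def split_sentences(text):
    text = " ".join(text.split())   # collapse newlines and runs of spaces
    return [s for s in _BOUNDARY.split(text) if s]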
-
Task 1 - Very short summaries
Use the 30 TDT clusters and the 30 TREC clusters. Given each document,
create a very short summary (~10 words, no specific format other
than linear) of it.
NIST will evaluate a subset of the summaries intrinsically (SEE) for
coverage (similar to DUC 2002). In addition, NIST will assign each of
the evaluated summaries to one of a set of given categories based on anticipated
"usefulness"
(See the Issues section below).
-
Task 2 - Short summaries focused by events
Use the 30 TDT clusters. Given each document cluster and the
associated TDT topic, create a short summary (~100 words) of
the cluster.
NIST will evaluate the summaries intrinsically (SEE) for quality and
length-adjusted coverage.
-
Task 3 - Short summaries focused by viewpoints
Use the 30 TREC clusters. Given each document cluster and a viewpoint
description, create a short summary (~100 words) of the cluster
from the point of view specified.
The viewpoint description will be a natural language string no longer
than a sentence. It will describe the important facet(s) of the cluster
that the assessor has decided to include in the short summary. These facet(s)
will be represented in all, or all but one, of the documents in the cluster.
NIST will evaluate the summaries intrinsically (SEE) for quality and
length-adjusted coverage.
-
Task 4 - Short summaries in response to a question
Use the 30 TREC Novelty track clusters. Given each document cluster,
a question, and the set of sentences in each document deemed relevant to
the question, create a short summary (~100 words) of the cluster
that answers the question. The set of sentences in each document that were
deemed relevant and novel will also be made available. The sentences were
identified automatically by the simple sentence separation program used
in DUC 2002.
The instructions given to the humans who identified the relevant and
novel sentences included the following:
-
Order the printed documents according to the ranked list in the topic.
-
Using the description part of the topic only, go through each printed document
and mark in yellow all sentences that directly provide information requested
by the description. Do not mark sentences that are introductory or explanatory
in nature. In particular, if there is a set of sentences that provide a
single piece of information, only select the sentence that provides the
most detail in that set. If two adjacent sentences are needed to provide
a single piece of information because of an unusual sentence construction
or error in the sentence segmentor, mark both.
-
Go to the computer and pull up the online version of your documents.
Go through each document, selecting the sentences that you have previously
marked (you can change your mind). Save this edited version as "relevant".
-
Now go through the online version looking for duplicate information. Order
is important here; if a piece of information has already been picked, then
repeats of that same information should be deleted. Instances that give
further details of that information should be retained, but instances that
summarize details seen earlier should be deleted. Save this second edited
version as "new".
Here is what the humans creating the summaries will be asked to
do:
In this round NIST will mail you 12 sets of printed documents. NIST
will also email you 12 files - one for each set of documents. Each file
will contain a topic statement which poses a question and a list of sentences
which have been determined to be relevant to the question posed by the
topic. In addition, some sentences will be marked as "novel". This means
that someone reading the list from top to bottom decided that the "novel"
sentences introduced new information.
Your task is to create a summary of about 100 words for each file
of relevant sentences. The sentences marked as "novel" may be useful in
creating your summary.
The printed documents are ONLY there for reference - in case you
have trouble understanding any of the sentences. For example you might
need to refer to the printed document to figure out who a pronoun in the
sentence file refers to. Please do not incorporate facts that only occur
in the printed document into your summary. Your summary should be of the
sentences in the sentence file.
Sample novelty topic,
documents, and relevant/new sentence lists are available.
The full set of data from the TREC 2002 Novelty Track is available here:
NIST will evaluate the summaries intrinsically (SEE) for quality and
length-adjusted coverage.
In addition, NIST will assign each of the evaluated summaries for a
cluster to one of a set of given categories based on "responsiveness"
to the question (See the Issues section below).
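As a purely illustrative sketch (not a submitted system, and not an official baseline), a summary of roughly the right size could be assembled from the provided sentence lists by preferring the "novel" sentences, keeping the original order, and stopping near 100 words:

def sketch_task4_summary(relevant, novel, limit=100):
    # relevant: the relevant sentences, in the order given.
    # novel: the subset of those sentences marked "novel".
    # Illustration only: take novel sentences first, then the rest,
    # and stop once about `limit` words have been accumulated.
    novel_set = set(novel)
    ordered = [s for s in relevant if s in novel_set] + \
              [s for s in relevant if s not in novel_set]
    out, words = [], 0
    for s in ordered:
        n = len(s.split())
        if out and words + n > limit:
            break
        out.append(s)
        words += n
    return " ".join(out)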
Issues
-
Operational definition of usefulness categories in task 1:
For each document in the subset whose summaries are being judged,
the assessor will be presented with the document and all the
submitted very short summaries of that document. The instructions to
the assessor will include the following:
Imagine that to save time, rather than read through a set of complete
documents in search of one of interest to you, you could first read
a list of very short summaries of those documents and based on those
summaries choose which documents to read in their entirety.
It would of course be possible to create various very short summaries
of a given document and some such summaries might be more helpful than
others (e.g., tell you more about the content relevant to the subject, be
easier to read, etc.). Your task is to help us understand how relatively
helpful a number of very short summaries of the same document are.
Please read all the following very short summaries of the document you have
been given. Assume the document is one you should read. Grade each summary
according to how useful you think it would be in getting you to choose
the document: 0 (worst, of no use), 1, 2, 3, or 4 (best, as good as having
the full document).
-
Operational definition of responsiveness categories in task 4:
The assessor will be presented with a question (topic), all the
submitted short summaries being judged for that question, and
the relevant sentences from the set of documents being summarized.
The instructions to the assessor will include the following:
You have been given a question (topic), the relevant sentences from
a document set, and a number of short summaries of those sentences -
designed to answer the question. Some of the summaries may be
more responsive (in form and content) to the question than others.
Your task is to help us understand how relatively well each summary
responds to the question.
Read the question and all the associated short summaries. Consult the
relevant sentences in the document set as needed. Then grade
each summary according to how responsive it is to the question:
0 (worst, unresponsive), 1, 2, 3, or 4 (best, fully responsive).
-
Revised quality questions:
We will reuse the 12 quality questions from DUC 2002.
-
Definition of the baseline for the very short summaries:
We
will use the HEADLINE element from each document as the baseline very
short summary of that document. Such elements exist for over 80% of
the documents in the collections used. Where such an element does not
exist, we will supply one. NOTE: The documents will be distributed
without the HEADLINE elements, as a convenience to participants.
Baselines for all tasks have now been defined.
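A hedged sketch of how such a baseline could be generated from a document's HEADLINE and body text (field names as in the hypothetical reader sketched earlier); the fall-back branch is only a stand-in, since NIST states it will supply a headline where one is missing:

def baseline_very_short(headline, text, max_words=10):
    # Illustrative sketch of the Task 1 baseline: the HEADLINE element,
    # trimmed to about ten words. Falling back to the first words of the
    # body text stands in for the headline NIST would supply.
    source = headline if headline.strip() else text
    return " ".join(source.split()[:max_words])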