DUC 2007: Task, Documents, and Measures


The Document Understanding Conference (DUC) is a series of summarization evaluations that have been conducted by the National Institute of Standards and Technology (NIST) since 2001. Its goal is to further progress in automatic text summarization and enable researchers to participate in large-scale experiments in both the development and evaluation of summarization systems.

DUC 2007 will consist of two tasks. The tasks are independent, and participants in DUC 2007 may choose to do one or both tasks:

  1. Main task
  2. Update task (pilot)

    The main task is the same as the DUC 2006 task and will model real-world complex question answering, in which a question cannot be answered by simply stating a name, date, quantity, etc. Given a topic and a set of 25 relevant documents, the task is to synthesize a fluent, well-organized 250-word summary of the documents that answers the question(s) in the topic statement. Successful performance on the task will benefit from a combination of IR and NLP capabilities, including passage retrieval, compression, and generation of fluent text.

    The update task will be to produce short (~100 words) multi-document update summaries of newswire articles under the assumption that the user has already read a set of earlier articles. The purpose of each update summary will be to inform the reader of new information about a particular topic.


    Documents for summarization

    The documents for summarization will come from the AQUAINT corpus, comprising newswire articles from the Associated Press and New York Times (1998-2000) and Xinhua News Agency (1996-2000). The corpus has the following DTD:

    • AQUAINT corpus (DTD)
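
    As a rough illustration only, the Python sketch below splits an AQUAINT-style SGML file into documents and pulls out a few fields. The tag names used here (<DOC>, <DOCNO>, <DATE_TIME>, <HEADLINE>, <TEXT>) are assumptions based on the corpus's usual layout, which differs slightly across the three news sources; the DTD linked above is the authoritative description.

      import re

      # Illustrative sketch: split an AQUAINT-style SGML file into documents and
      # extract a few fields.  The tag names are assumptions; consult the DTD for
      # the authoritative structure.
      DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)

      def _field(block, tag):
          m = re.search(r"<%s>(.*?)</%s>" % (tag, tag), block, re.DOTALL)
          return m.group(1).strip() if m else ""

      def read_aquaint_file(path):
          """Yield (docno, date_time, headline, text) tuples from one corpus file."""
          with open(path, encoding="latin-1") as f:
              data = f.read()
          for block in DOC_RE.findall(data):
              text = _field(block, "TEXT")
              text = re.sub(r"</?P>", " ", text)         # drop paragraph tags
              text = re.sub(r"\s+", " ", text).strip()   # normalize whitespace
              yield (_field(block, "DOCNO"), _field(block, "DATE_TIME"),
                     _field(block, "HEADLINE"), text)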

    NIST assessors will develop topics of interest to them. The assessor will create a topic and choose a set of 25 documents relevant to the topic. These documents will form the document cluster for that topic. Topics and document clusters will be distributed by NIST. Only DUC 2007 participants who have completed all required forms will be allowed access.


    Main Task

    Reference summaries

    Each topic and its document cluster will be given to 4 different NIST assessors, including the developer of the topic. Each assessor will create a ~250-word summary of the document cluster that satisfies the information need expressed in the topic statement. These multiple reference summaries will be used in the evaluation of summary content.

    System task

    Given a DUC topic and a set of 25 relevant documents, create from the documents a brief, well-organized, fluent summary that answers the information need expressed in the topic statement. All processing of documents and generation of summaries must be automatic.

    The summary can be no longer than 250 words (whitespace-delimited tokens). Summaries over the size limit will be truncated. No bonus will be given for creating a shorter summary. No specific formatting other than linear is allowed.
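
    Because the limit is defined over whitespace-delimited tokens, submissions can be checked or pre-truncated with a few lines of code. The following Python sketch mirrors the stated rule; NIST's own truncation script is not reproduced here.

      def truncate_summary(text, limit=250):
          """Keep at most `limit` whitespace-delimited tokens; the rest is
          dropped, mirroring the stated size rule."""
          return " ".join(text.split()[:limit])

      # Example: a 300-token summary is cut back to its first 250 tokens.
      long_summary = " ".join("token%d" % i for i in range(300))
      assert len(truncate_summary(long_summary).split()) == 250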

    There will be 45 topics in the test data. Each group can submit one set of results, i.e., one summary for each topic/cluster. Participating groups should be able to evaluate additional results themselves using ISI's ROUGE/BE package.

    Evaluation

    All summaries will first be truncated to 250 words. Where sentences need to be identified for automatic evaluation, NIST will then use a simple Perl script for sentence segmentation.
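
    NIST's segmentation script is not reproduced here; purely to illustrate the kind of lightweight, rule-based splitting involved, the Python sketch below breaks text after sentence-final punctuation when it is followed by whitespace and a capital letter or opening quote.

      import re

      # Illustrative rule-based sentence splitter; NIST's actual script may differ.
      _BOUNDARY = re.compile(r'(?<=[.!?])\s+(?=["\'A-Z])')

      def split_sentences(text):
          text = re.sub(r"\s+", " ", text).strip()
          return [s for s in _BOUNDARY.split(text) if s]

      print(split_sentences('The bill passed. "We are pleased," she said. Turnout rose.'))
      # -> ['The bill passed.', '"We are pleased," she said.', 'Turnout rose.']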

    • NIST will manually evaluate the linguistic well-formedness of each submitted summary using a set of quality questions.

    • NIST will manually evaluate the relative responsiveness of each submitted summary to the topic. Here are instructions to the assessors for judging responsiveness.

    • NIST will run ROUGE-1.5.5 to compute ROUGE-2 and ROUGE-SU4, with stemming and keeping stopwords. Jackknifing will be implemented so that human and system scores can be compared (a sketch of this leave-one-out procedure appears after this list). ROUGE-1.5.5 will be run with the following parameters:

      ROUGE-1.5.5.pl -n 2 -x -m -2 4 -u -c 95 -r 1000 -f A -p 0.5 -t 0 -d

        -n 2 compute ROUGE-1 and ROUGE-2
        -x do not calculate ROUGE-L
        -m apply Porter stemmer on both models and peers
        -2 4 compute Skip Bigram (ROUGE-S) with a maximum skip distance of 4
        -u include unigram in Skip Bigram (ROUGE-S)
        -c 95 use 95% confidence interval
        -r 1000 bootstrap resample 1000 times to estimate the 95% confidence interval
        -f A scores are averaged over multiple models
        -p 0.5 compute F-measure with alpha = 0.5
        -t 0 use model unit as the counting unit
        -d print per-evaluation scores

    • NIST will calculate overlap in Basic Elements (BE) between automatic and manual summaries. Summaries will be parsed with Minipar, and BE-F will be extracted. These BEs will be matched using the Head-Modifier criterion.

        ROUGE-1.5.5.pl -3 HM -d

    • Groups may participate in an optional manual evaluation of summary content using the pyramid method, which will be carried out cooperatively by DUC participants.
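
    The jackknifing mentioned in the ROUGE bullet above is what makes human and automatic scores comparable: with k reference summaries per topic, a system summary is scored k times, each time against a different set of k-1 references, and the scores are averaged, while a reference summary is scored against the other k-1 references. The Python sketch below shows that averaging with a stand-in scoring function; the real scores come from ROUGE-1.5.5 itself.

      from statistics import mean

      def score_against(peer, references):
          """Stand-in for a real ROUGE computation against a set of references
          (here, average unigram recall, purely to make the sketch runnable)."""
          peer_tokens = set(peer.lower().split())
          recalls = []
          for ref in references:
              ref_tokens = ref.lower().split()
              hits = sum(1 for t in ref_tokens if t in peer_tokens)
              recalls.append(hits / len(ref_tokens) if ref_tokens else 0.0)
          return mean(recalls)

      def jackknifed_score(peer, models, peer_is_model=False):
          """With k model summaries: a system peer is scored against each of the
          k leave-one-out sets of k-1 models and the results are averaged; a
          human peer is scored once against the other k-1 models."""
          if peer_is_model:
              return score_against(peer, [m for m in models if m is not peer])
          return mean(score_against(peer, models[:i] + models[i + 1:])
                      for i in range(len(models)))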


    Update Task (Pilot)

    The update summary pilot task will be to create short (100-word) multi-document summaries under the assumption that the reader has already read a number of previous documents. The topics and documents for the update pilot will be a subset of those for the main DUC task. There will be approximately 10 topics in the test data, with 25 documents per topic. For each topic, the documents will be ordered chronologically and then partitioned into 3 sets, A-C, where the time stamps on all the documents in each set are ordered such that time(A) < time(B) < time(C). There will be approximately 10 documents in Set A, 8 in Set B, and 7 in Set C.
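
    Given the stated cluster sizes, the partition for a topic can be written down directly; the Python sketch below sorts the topic's 25 documents by date and splits them into sets A, B, and C of 10, 8, and 7 documents. The (doc_id, timestamp, text) tuple layout is illustrative, not a prescribed format.

      def partition_update_clusters(docs, sizes=(10, 8, 7)):
          """Order a topic's documents chronologically and split them into
          clusters A, B, and C so that time(A) < time(B) < time(C).
          `docs` is a list of (doc_id, timestamp, text) tuples; the tuple
          layout is illustrative only."""
          assert sum(sizes) == len(docs), "expected 25 documents per topic"
          ordered = sorted(docs, key=lambda d: d[1])            # oldest first
          a_end, b_end = sizes[0], sizes[0] + sizes[1]
          return {"A": ordered[:a_end],
                  "B": ordered[a_end:b_end],
                  "C": ordered[b_end:]}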

    Reference Summaries

    Instructions given to NIST assessors for writing update summaries.

    Each topic and its 3 document clusters, A-C, will be given to 4 different NIST assessors. Each assessor will create three ~100-word topic-focused summaries that contribute to satisfying the information need expressed in the topic statement:

    1. A summary of documents in cluster A
    2. An update summary of documents in B, under the assumption that the reader has already read documents in A
    3. An update summary of documents in C, under the assumption that the reader has already read documents in A and B

    These multiple reference summaries will be used in the evaluation of summary content.

    System Task

    Given a DUC topic and its 3 document clusters, A-C, create from the documents three brief, fluent summaries that contribute to satisfying the information need expressed in the topic statement:
    1. A summary of documents in cluster A
    2. An update summary of documents in B, under the assumption that the reader has already read documents in A
    3. An update summary of documents in C, under the assumption that the reader has already read documents in A and B
    Each summary can be no longer than 100 words (whitespace-delimited tokens). Summaries over the size limit will be truncated. No specific formatting other than linear is allowed. Each group can submit one set of results, i.e., one summary for each of the document clusters for each topic. Within a topic, the document clusters must be processed in chronological order; i.e., you cannot look at documents in cluster B or C when generating the summary for cluster A, and you cannot look at the documents in cluster C when generating the summary for cluster B. However, the documents within a cluster can be processed in any order.
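
    In other words, a conforming system works through each topic's clusters in order, letting earlier clusters inform later summaries but never the reverse. The Python sketch below shows that control flow with a hypothetical summarize(topic, new_docs, already_read) function and the 100-word truncation applied to each output.

      def run_update_task(topic, clusters, summarize, word_limit=100):
          """Process clusters A, B, C in chronological order.  `summarize` is a
          hypothetical system function taking (topic, new_docs, already_read);
          documents from later clusters are never visible to earlier summaries."""
          already_read = []
          summaries = {}
          for label in ("A", "B", "C"):
              docs = clusters[label]
              summary = summarize(topic, docs, list(already_read))
              summaries[label] = " ".join(summary.split()[:word_limit])
              already_read.extend(docs)     # these documents now count as read
          return summaries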

    Evaluation

    All summaries will first be truncated to 100 words. Where sentences need to be identified for automatic evaluation, NIST will then use a simple Perl script for sentence segmentation.

    • NIST will run ROUGE-1.5.5 to compute ROUGE-2 and ROUGE-SU4, with stemming and keeping stopwords. Jackknifing will be implemented so that human and system scores can be compared.

    • NIST will calculate overlap in Basic Elements (BE) between automatic and manual summaries. Summaries will be parsed with Minipar, and BE-F will be extracted. These BEs will be matched using the Head-Modifier criterion.

    • NIST will conduct a manual evaluation of summary content using a pyramid-like method based on information nuggets.


    Tools for DUC 2007


    DUC Workshop Papers and Presentations

    Each participant in the system task should submit a paper describing their system architecture, results, and analysis; these papers will be published in the DUC 2007 Workshop Proceedings. Participants who would like to give an oral presentation of their paper at the workshop should submit a presentation proposal by March 21, 2007, and the Program Committee will select the groups who will present at the workshop.

For data, past results, mailing list or other general information
contact: Lori Buckland (lori.buckland@nist.gov)
For other questions contact: Hoa Dang (hoa.dang AT nist.gov)
Last updated: Thursday, 24-Mar-2011 11:40:47 MDT
Date created: Wednesday, 18-October-06