DUC 2007: Call for Participation

Document Understanding Conference (DUC)
Rochester, NY
April 26-27, 2007

Conducted by:
National Institute of Standards and Technology (NIST)

As the amount of online text continues to grow, we are witnessing a tremendous increase in interest in summarization research from both academia and industry. The Document Understanding Conference (DUC) is a series of summarization evaluations that have been conducted by the National Institute of Standards and Technology (NIST) since 2001. Its goal is to further progress in automatic text summarization and enable researchers to participate in large-scale experiments in both the development and evaluation of summarization systems.

DUC has evaluated summarization systems for generic and focused summaries of English newspaper and newswire data. Various target sizes (10-400 words) have been used, and both single-document and multi-document summaries have been evaluated. Summaries have been manually judged for their readability, and both manual and automatic evaluations of content coverage have been explored. In 2004, the road mapping committee strongly recommended that new tasks be closely tied to a clear user application. Therefore, DUC 2005-2006 had a single system task requiring informative summaries to be generated in response to a complex question (e.g., "What are the advantages and disadvantages of same-sex schools?"). DUC has continued to grow since its inception, with 34 sites worldwide participating in the DUC 2006 question-focused summarization task.

The main system task for DUC 2007 will be the same as the 2006 task and will model real-world complex question answering, in which an information need cannot be satisfied by simply stating a name, date, quantity, etc. Given a topic (question) and a set of 25 relevant documents, the task is to synthesize a fluent, well-organized 250-word summary of the documents that answers the question(s) in the topic statement. Summaries will be manually judged for both fluency and responsiveness to the topic statement. A second, pilot task will also be run, with the goal of producing short (~100-word) update summaries of newswire articles under the assumption that the user has already read a set of earlier articles. Information about DUC 2007 will be updated and made available in the DUC 2007 guidelines, and the evaluation results will be presented and discussed at the DUC 2007 Workshop, to be held in conjunction with the HLT-NAACL 2007 Conference.

You are invited to participate in the DUC 2007 system tasks. Organizations interested in participating should submit an application as soon as possible, but no later than November 19, 2006. Submitting an application does not commit you to participating in the DUC 2007 tasks. However, once you apply you will be subscribed to the duc2007 email list, which will be the means of discussing and communicating about DUC 2007. You are encouraged to bring up questions, concerns, and suggestions using this forum. Late applications may be accepted if resources allow, but in no case will sample or test data be released to groups who have not applied. Please email your application to The application should include the following information:

  1. Contact information (organization name, full mailing address, voice and fax phone numbers, email of a main DUC contact)
  2. Names and email addresses of group members to be included in the duc2007 mailing list
  3. A short paragraph on the organization's summarization approach
  4. Which task(s) you would like to participate in (main and/or update summary task)
  5. An indication of whether this group has participated in DUC or TREC before

All summarization results submitted to DUC will be published in the Proceedings and archived on the DUC web site. Dissemination of DUC work and results other than in the conference proceedings is welcomed, but the conditions of participation preclude specific advertising claims based on DUC results.

Important Dates:

    November 19, 2006    Application for participation due
    January 17, 2007     Test data available from NIST
    January 31, 2007     Submissions due at NIST for evaluation
    February 28, 2007    Evaluated results returned to participants
    March 21, 2007       Presentation proposals due
    April 11, 2007       Workshop papers due at NIST
    April 26-27, 2007    DUC 2007 workshop at HLT-NAACL

Program Committee:

    John Conroy, IDA/CCS
    Hoa Trang Dang, NIST (co-chair)
    Donna Harman, NIST (co-chair)
    Ed Hovy, ISI/USC
    Kathy McKeown, Columbia University
    Drago Radev, University of Michigan
    Karen Sparck-Jones, University of Cambridge
    Lucy Vanderwende, Microsoft Research

