Procedure for human comparison of model (reference) and peer (system-generated and other) abstracts using SEE

  • For each document set (in randomized order):
    • For each summary type (single-document or multi-document summary):
      • For each peer summary (in randomized order), which is composed of peer units (PUs) that will be sentences:
        • If the peer target size is greater than 10, the evaluator reads the peer summary and then makes overall judgments as to the peer summary's quality, independent of the model. The answers are chosen in every case from the following set of 5 ordered categories.

        • Here is the text of the questions.
        • The evaluator views the model summary, which is composed of model units (MUs): human-corrected chunks of a type to be determined.
        • The evaluator steps through the MUs. For each MU, the evaluator:
          • marks any/all PU(s) sharing content with the current MU
          • indicates whether the marked PUs, taken together, express about 0%, 20%, 40%, 60%, 80%, or 100% of the content in the current MU.
        • The evaluator reviews the unmarked PUs and indicates, once for the entire peer summary, that:
          • About 0%, 20%, 40%, 60%, 80%, or 100% of the unmarked PUs are related but needn't be included in the model summary
      • (Evaluators will be allowed to review and revise earlier peer summary judgments before moving to the next document set - to mitigate learning effects.)
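
The bookkeeping implied by the steps above can be sketched in Python. This is only an illustration and is not the SEE tool itself: the names `PeerEvaluation`, `judge_mu`, `judge_unmarked`, `unmarked_pus`, and `presentation_order` are hypothetical, and PUs and MUs are assumed to be identified by zero-based integer indices. The sketch records, for each MU, which PUs were marked and the chosen coverage level, derives the unmarked PUs, and stores the single judgment made about them.

```python
import random
from dataclasses import dataclass, field
from typing import Iterable, List, Optional

# The six percentage choices named in the procedure.
COVERAGE_LEVELS = (0, 20, 40, 60, 80, 100)


def check_level(pct: int) -> int:
    """Reject any answer that is not one of the six allowed levels."""
    if pct not in COVERAGE_LEVELS:
        raise ValueError(f"expected one of {COVERAGE_LEVELS}, got {pct}")
    return pct


@dataclass
class PeerEvaluation:
    """Hypothetical record of one evaluator's judgments for one peer summary."""
    num_pus: int                                        # number of peer units (sentences)
    marked_by_mu: dict = field(default_factory=dict)    # MU index -> set of marked PU indices
    coverage_by_mu: dict = field(default_factory=dict)  # MU index -> coverage level (percent)
    unmarked_related_pct: Optional[int] = None          # single judgment on unmarked PUs

    def judge_mu(self, mu_index: int, marked_pus: Iterable[int], coverage_pct: int) -> None:
        """Record (or revise) the marked PUs and coverage level for one MU."""
        level = check_level(coverage_pct)
        marked = set(marked_pus)
        if any(not 0 <= p < self.num_pus for p in marked):
            raise ValueError("marked PU index out of range")
        # Storing by MU index means a later call overwrites an earlier judgment,
        # which models the evaluator revising a judgment.
        self.marked_by_mu[mu_index] = marked
        self.coverage_by_mu[mu_index] = level

    def unmarked_pus(self) -> List[int]:
        """PUs that were not marked for any MU, in summary order."""
        marked = set().union(*self.marked_by_mu.values())
        return [p for p in range(self.num_pus) if p not in marked]

    def judge_unmarked(self, related_pct: int) -> None:
        """Record the once-per-summary judgment about the unmarked PUs."""
        self.unmarked_related_pct = check_level(related_pct)


def presentation_order(items: Iterable, seed: Optional[int] = None) -> list:
    """Return the items in randomized order (assumed: a uniform shuffle)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    return shuffled
```

For example, with a four-sentence peer summary, `judge_mu(0, {0, 1}, 60)` followed by `judge_mu(1, {2}, 100)` leaves PU 3 unmarked. Calling `judge_mu(0, {0}, 40)` afterwards revises the first judgment and returns PU 1 to the unmarked set.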

For data, past results, mailing list or other general information
contact: Lori Buckland (
For other questions contact: Paul Over (
Last updated: Thursday, 25-Mar-2004 09:46:52 MST
Date created: Thursday, 25-March-04