Document Understanding Conferences
Procedure for human comparison of model (reference) and peer (system-generated and other) abstracts using SEE
- For each document set (in randomized order):
  - For each summary type (single-document or multi-document summary):
    - For each peer summary (in randomized order), composed of peer units (PUs), which will be sentences:
      - If the peer target size is greater than 10, the evaluator reads the peer summary and then makes overall judgments of the peer summary's quality, independent of the model. The answers are chosen in every case from the following set of 5 ordered categories. Here is the text of the questions.
      - View the model summary, composed of model units (MUs), which are human-corrected chunks of a type to be determined.
      - The evaluator steps through the MUs. For each MU s/he:
        - marks any/all PU(s) sharing content with the current MU
        - indicates whether the marked PUs, taken together, express about 0%, 20%, 40%, 60%, 80%, or 100% of the content in the current MU.
      - The evaluator reviews the unmarked PUs and indicates once, for the entire peer summary, that about 0%, 20%, 40%, 60%, 80%, or 100% of the unmarked PUs are related but needn't be included in the model summary.
- (Evaluators will be allowed to review and revise earlier peer summary judgments before moving to the next document set, to mitigate learning effects.)
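The procedure above can be sketched as code: a randomized iteration order over document sets and peer summaries, plus the ordered coverage scale used for per-MU judgments. SEE itself is an interactive evaluation tool, so all names and data shapes here (`COVERAGE_SCALE`, `evaluation_order`, `mean_mu_coverage`) are hypothetical illustrations of the protocol, not its implementation.

```python
import random

# The six ordered coverage categories the evaluator chooses from
# (about 0%, 20%, 40%, 60%, 80%, or 100%).
COVERAGE_SCALE = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

def snap_to_scale(fraction):
    """Snap a raw coverage estimate to the nearest allowed category."""
    return min(COVERAGE_SCALE, key=lambda c: abs(c - fraction))

def evaluation_order(doc_sets, peers_by_set, seed=0):
    """Yield (document set, peer summary) pairs, randomizing both the
    document-set order and the peer-summary order within each set,
    as the procedure requires."""
    rng = random.Random(seed)
    sets = list(doc_sets)
    rng.shuffle(sets)
    for ds in sets:
        peers = list(peers_by_set[ds])
        rng.shuffle(peers)
        for peer in peers:
            yield ds, peer

def mean_mu_coverage(judgments):
    """Average per-MU coverage over one peer summary: one snapped
    judgment per model unit (MU)."""
    if not judgments:
        return 0.0
    return sum(snap_to_scale(j) for j in judgments) / len(judgments)
```

For example, raw per-MU estimates of 0.55, 1.0, and 0.15 snap to the categories 0.6, 1.0, and 0.2, giving a mean coverage of 0.6 for that peer summary.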