Document Understanding Conferences
Procedure for human comparison of model (reference) and peer (system-generated and other) abstracts
- For each document set (in randomized order):
  - For each summary type (single-document or multi-document summary):
    - For each peer summary (in randomized order) - composed of peer units
      (PUs), which will be sentences:
      - If the peer target size is greater than 10, the evaluator reads the
        peer summary and then makes overall judgments as to the peer
        summary's quality, independent of the model. Questions 1-5 are
        within-sentence judgments; the rest are within- or across-sentence
        judgments. In every case the answer is chosen from the following set
        of 4 ordered categories: {0, 1-5, 6-10, more than 10}. (A sketch of
        how these categorical answers might be recorded appears after the
        procedure.)
        - About how many gross capitalization errors are there?
          Examples:
          - [new sentence] the new drugs proved beneficial.
          - PEOPLE WERE AMAZED WHEN THEY LEARNED THE DETAILS.
        - About how many sentences have incorrect word order?
          Examples:
          - John before Mary the park visited
        - About how many times does the subject fail to agree in number
          with the verb?
          Examples:
          - The student see a teacher with a telescope
          - It was clear, that the reporters agrees with the idea.
        - About how many of the sentences are missing important components
          (e.g. the subject, main verb, direct object, modifier) - causing
          the sentence to be ungrammatical, unclear, or misleading?
          Examples:
          - The agreement, signed in London yesterday, .
          - Mr. Smith to Washington where he met the senator.
          - The exchange rate is %.
          - Stewart, the builder.
        - About how many times are unrelated fragments joined into one
          sentence?
          Examples:
          - They run a refinery; two apples would be enough.
        - About how many times are articles (a, an, the) missing or used
          incorrectly?
          Examples:
          - Men saw woman with the telescope
          - He picked up a book. A book looked interesting.
          - El Paso owns and operates refinery.
        - About how many pronouns are there whose antecedents are incorrect,
          unclear, missing, or come only later?
          Examples:
          - [opening sentence] Their agreement was signed in Oslo in 1933.
          - Many Presidents were targets of assassins. Pres. Reagan was
            wounded. He was shot in the Ford's Theater in 1865.
        - For about how many nouns is it impossible to determine clearly who
          or what they refer to?
          Examples:
          - The company agreed to negotiate. [Which company?]
        - About how many times should a noun or noun phrase have been
          replaced with a pronoun?
          Examples:
          - Mr. John Smith went to DC. Mr. John Smith saw the senator.
        - About how many dangling conjunctions are there ("and", "however"...)?
          Examples:
          - [opening sentence] However, they came to a good agreement.
        - About how many instances of unnecessarily repeated information are
          there?
          Examples:
          - Yesterday's estimate does not include any projection for claims
            in Louisiana, which was also affected by the storm, although less
            severely than Florida. But on the Florida losses alone, Hurricane
            Andrew becomes the most costly insured catastrophe in the US.
            Louisiana was also affected by the storm. With Florida's Hurricane
            Andrew losses added in, the total rises to Dollars 11.2bn. This
            does not include claims in Louisiana.
        - About how many sentences strike you as being in the wrong place
          because they indicate a strange time sequence, suggest a wrong
          cause-effect relationship, or just don't fit in topically with
          neighboring sentences?
      - View the model summary - composed of model units (MUs), which are
        human-corrected chunks of a type to be determined.
      - Evaluator steps through the MUs. For each MU s/he:
        - marks any/all PU(s) sharing content with the current MU
        - indicates whether the marked PUs, taken together, express about
          0%, 20%, 40%, 60%, 80%, or 100% of the content in the current MU
          (see the coverage bookkeeping sketch after the procedure)
      - Evaluator reviews unmarked PUs and indicates once for the entire
        peer summary that:
        - about 0%, 20%, 40%, 60%, 80%, or 100% of the unmarked PUs are
          related but needn't be included in the model summary
- (Evaluators will be allowed to review and revise earlier peer summary
  judgments before moving to the next document set - to mitigate learning
  effects.)
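
All twelve quality questions share the same four ordered answer categories,
so the quality judgments for one peer summary reduce to a fixed-length
record. The following is a minimal sketch in Python of how such answers
might be captured and validated; the short question labels and function
names are hypothetical, since the guidelines do not specify any tooling.

    from enum import Enum

    class ErrorCount(Enum):
        """The four ordered answer categories shared by every question."""
        ZERO = "0"
        ONE_TO_FIVE = "1-5"
        SIX_TO_TEN = "6-10"
        MORE_THAN_TEN = "more than 10"

    # Hypothetical short labels for the twelve quality questions, in order.
    QUALITY_QUESTIONS = [
        "capitalization errors",
        "incorrect word order",
        "subject-verb disagreement",
        "missing sentence components",
        "unrelated fragments joined",
        "article errors",
        "bad or missing pronoun antecedents",
        "unresolvable noun references",
        "noun phrase should be a pronoun",
        "dangling conjunctions",
        "unnecessarily repeated information",
        "misplaced sentences",
    ]

    def record_quality_judgments(answers):
        """Pair one categorical answer with each quality question."""
        if len(answers) != len(QUALITY_QUESTIONS):
            raise ValueError("exactly one answer is required per question")
        return dict(zip(QUALITY_QUESTIONS, answers))

    # Example: a summary that is clean except for a few article errors.
    judgments = record_quality_judgments(
        [ErrorCount.ONE_TO_FIVE if q == "article errors" else ErrorCount.ZERO
         for q in QUALITY_QUESTIONS])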
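
The per-MU coverage step and the final unmarked-PU judgment can be modeled
as simple records as well. Below is a minimal sketch, again in Python and
with hypothetical names, assuming MUs and PUs are indexed from 0; it also
includes the randomized presentation order the procedure calls for.

    import random
    from dataclasses import dataclass, field

    # The ordered percentage categories the evaluator chooses from.
    PERCENT_CATEGORIES = (0, 20, 40, 60, 80, 100)

    @dataclass
    class MUJudgment:
        """Coverage judgment for one model unit (MU)."""
        mu_index: int
        marked_pus: set    # indices of PUs sharing content with this MU
        coverage_pct: int  # how much of the MU the marked PUs express together

        def __post_init__(self):
            if self.coverage_pct not in PERCENT_CATEGORIES:
                raise ValueError(f"coverage must be one of {PERCENT_CATEGORIES}")

    @dataclass
    class PeerSummaryJudgment:
        """All coverage judgments for one peer summary."""
        num_pus: int
        mu_judgments: list = field(default_factory=list)
        unmarked_related_pct: int = 0  # one judgment over the never-marked PUs

        def unmarked_pus(self):
            """PUs never marked against any MU; they get the final judgment."""
            marked = set().union(*(j.marked_pus for j in self.mu_judgments))
            return set(range(self.num_pus)) - marked

    def presentation_order(n_items, seed=None):
        """Randomized order in which document sets and peers are presented."""
        order = list(range(n_items))
        random.Random(seed).shuffle(order)
        return order

Nothing here reflects DUC's actual software; it only makes the bookkeeping
concrete: each MU carries a set of marked PUs plus one coverage percentage,
and the PUs left unmarked receive a single summary-wide relatedness
percentage.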