Operational definitions of the DUC 2003 baseline summaries

Note that unlike the procedure for DUC 2001, we did not truncate any
sentences. We allowed baselines to go as much as 15 words over the
target size. If adding another sentence's worth of words would have
made the baseline summary exceed the target by more than 15 words, we
did not include that sentence, and the summary was then shorter than
the targeted size.

For single-document summarization (very short summaries):

Baseline 1 (lead baseline)
   Take the "HEADLINE" element or its equivalent.

For multi-document summarization (short summaries):

Task 2,3 -----------------------------------------------------

Baseline 2 (lead baseline)
   Take the first 100 words* in the last document in the document
   set, where documents are assumed to be ordered chronologically.

   * whitespace-delimited non-tag tokens found in the TEXT, LEADPARA,
     LP, etc. portions of the document.

Baseline 3 (coverage baseline)
   Take the first sentence in the first document, the first sentence
   in the second document, the first sentence in the third document,
   ... until you have 100 words.

Task 4 ---------------------------------------------------------

Baseline 4 (lead baseline)
   Take the first 100 words* from the first n relevant sentences in
   the first document in the document set, where documents are assumed
   to be ordered by relevance ranking given with the topic.

   * whitespace-delimited non-tag tokens found in the TEXT, LEADPARA,
     LP, etc. portions of the document.

Baseline 5 (coverage baseline)
   Take the first relevant sentence in the first document, the first
   relevant sentence in the second document, the first relevant
   sentence in the third document, ... until you have 100 words.
   Documents are assumed to be ordered by relevance ranking given with
   the topic.
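
Illustrative sketch (not the original DUC scripts)

The following Python sketch shows one way to read the size rule and the lead and coverage baselines above. It is an interpretation under stated assumptions, not the code used for DUC 2003. The function names (count_words, fill_summary, lead_baseline, coverage_baseline) are hypothetical. It assumes that sentence splitting and tag removal have already been done, that selection stops at the first sentence that would overshoot the limit, and that the coverage baseline uses only the first sentence of each document. The definitions do not say what happens if the documents run out before 100 words are reached.

```python
from typing import List, Sequence

TARGET_WORDS = 100  # target summary size given in the definitions
MAX_OVERFLOW = 15   # a baseline may exceed the target by at most this much


def count_words(sentence: str) -> int:
    """Count whitespace-delimited tokens (tags are assumed already removed)."""
    return len(sentence.split())


def fill_summary(sentences: Sequence[str],
                 target: int = TARGET_WORDS,
                 overflow: int = MAX_OVERFLOW) -> List[str]:
    """Add whole sentences in order, never truncating one.

    Assumption: selection stops once the target is reached, or at the
    first sentence that would push the total past target + overflow,
    in which case the summary stays shorter than the target.
    """
    summary: List[str] = []
    total = 0
    for sentence in sentences:
        if total >= target:
            break  # target size reached
        n = count_words(sentence)
        if total + n > target + overflow:
            break  # would exceed the target by more than the allowance
        summary.append(sentence)
        total += n
    return summary


def lead_baseline(document_sentences: Sequence[str]) -> List[str]:
    """Baselines 2 and 4: leading sentences of a single document.

    The caller chooses the document: the last one in chronological
    order for Baseline 2, or the relevant sentences of the top-ranked
    document for Baseline 4.
    """
    return fill_summary(document_sentences)


def coverage_baseline(documents: Sequence[Sequence[str]]) -> List[str]:
    """Baselines 3 and 5: the first sentence of each document, in order.

    For Baseline 5 each inner sequence is assumed to hold only the
    relevant sentences, with documents ordered by relevance ranking.
    Documents with no sentences are skipped (an assumption).
    """
    first_sentences = [doc[0] for doc in documents if doc]
    return fill_summary(first_sentences)
```

Under these assumptions, a document whose first two sentences have 60 and 50 words yields a 110-word lead summary (within the 15-word allowance), whereas sentences of 60 and 60 words yield a 60-word summary, because the second sentence would bring the total to 120 words.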