Cross-Document Event-Keyed Summarization
- URL: http://arxiv.org/abs/2410.14795v1
- Date: Fri, 18 Oct 2024 18:09:45 GMT
- Title: Cross-Document Event-Keyed Summarization
- Authors: William Walden, Pavlo Kuchmiichuk, Alexander Martin, Chihsheng Jin, Angela Cao, Claire Sun, Curisia Allen, Aaron Steven White
- Abstract summary: We extend event-keyed summarization (EKS) to the cross-document setting (CDEKS).
We introduce SEAMUS, a high-quality dataset for CDEKS based on an expert reannotation of the FAMUS dataset for cross-document argument extraction.
We present a suite of baselines on SEAMUS, covering both smaller, fine-tuned models and zero- and few-shot prompted LLMs, along with detailed ablations and a human evaluation study.
- Score: 35.957271217461525
- Abstract: Event-keyed summarization (EKS) requires generating a summary about a specific event described in a document, given the document and an event representation extracted from it. In this work, we extend EKS to the cross-document setting (CDEKS), in which summaries must synthesize information from accounts of the same event given by multiple sources. We introduce SEAMUS (Summaries of Events Across Multiple Sources), a high-quality dataset for CDEKS based on an expert reannotation of the FAMUS dataset for cross-document argument extraction. We present a suite of baselines on SEAMUS, covering both smaller, fine-tuned models and zero- and few-shot prompted LLMs, along with detailed ablations and a human evaluation study, showing SEAMUS to be a valuable benchmark for this new task.
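The task setup above (a set of source documents plus an event representation, mapped to a single event-focused summary) can be made concrete with a zero-shot prompting baseline of the kind the paper evaluates. The sketch below is illustrative only: the `EventRepresentation` fields, the `build_cdeks_prompt` helper, the prompt template, and the example data are assumptions for demonstration, not the SEAMUS schema or the authors' baseline code.

```python
from dataclasses import dataclass, field


@dataclass
class EventRepresentation:
    """Hypothetical event representation: a trigger, an event type, and
    role-filler arguments. The actual SEAMUS/FAMUS schema may differ."""
    trigger: str
    event_type: str
    arguments: dict[str, list[str]] = field(default_factory=dict)


def build_cdeks_prompt(documents: list[str], event: EventRepresentation) -> str:
    """Assemble a zero-shot prompt asking an LLM to summarize one event
    by synthesizing information from all source documents."""
    doc_block = "\n\n".join(
        f"Document {i + 1}:\n{doc}" for i, doc in enumerate(documents)
    )
    arg_block = "\n".join(
        f"- {role}: {', '.join(fillers)}"
        for role, fillers in event.arguments.items()
    )
    return (
        "Summarize the following event, synthesizing information from all "
        "source documents. Describe only this event.\n\n"
        f"Event type: {event.event_type}\n"
        f"Trigger: {event.trigger}\n"
        f"Arguments:\n{arg_block}\n\n"
        f"{doc_block}\n\nSummary:"
    )


if __name__ == "__main__":
    # Toy example; not drawn from the SEAMUS data.
    event = EventRepresentation(
        trigger="erupted",
        event_type="Disaster.VolcanicEruption",
        arguments={"Place": ["Mount Example"], "Time": ["last Tuesday"]},
    )
    docs = [
        "Mount Example erupted last Tuesday, forcing nearby villages to evacuate.",
        "Officials reported ash clouds over the region after Tuesday's eruption.",
    ]
    print(build_cdeks_prompt(docs, event))  # Feed to any instruction-tuned LLM.
```

The resulting prompt string can be passed to any instruction-tuned LLM; a fine-tuned baseline would instead train a smaller sequence-to-sequence model on (documents, event, summary) triples.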
Related papers
- Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm [33.737981167605575]
This paper proposes the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events.
We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events.
Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, underscoring the difficulty of this task.
arXiv Detail & Related papers (2024-06-23T06:01:11Z)
- Write Summary Step-by-Step: A Pilot Study of Stepwise Summarization [48.57273563299046]
We propose the task of Stepwise Summarization, which aims to generate a new appended summary each time a new document is added.
The appended summary should not only summarize the newly added content but also be coherent with the previous summary.
We show that the proposed SSG model achieves state-of-the-art performance on both automatic metrics and human evaluations.
arXiv Detail & Related papers (2024-06-08T05:37:26Z)
- Event-Keyed Summarization [46.521305453350635]
Event-keyed summarization (EKS) is a novel task that marries traditional summarization and document-level event extraction.
We introduce a dataset for this task, MUCSUM, consisting of summaries of all events in the classic MUC-4 dataset.
We show that ablations that reduce EKS to traditional summarization or structure-to-text yield inferior summaries of target events.
arXiv Detail & Related papers (2024-02-10T15:32:53Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model using a novel cross-document question-answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective enables the model to perform tasks involving both short and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- UniSumm and SummZoo: Unified Model and Diverse Benchmark for Few-Shot Summarization [54.59104881168188]
UniSumm is a unified few-shot summarization model pre-trained with multiple summarization tasks.
SummZoo is a new benchmark to better evaluate few-shot summarizers.
arXiv Detail & Related papers (2022-11-17T18:54:47Z)
- Unsupervised Summarization with Customized Granularities [76.26899748972423]
We propose the first unsupervised multi-granularity summarization framework, GranuSum.
By inputting different numbers of events, GranuSum is capable of producing multi-granular summaries in an unsupervised manner.
arXiv Detail & Related papers (2022-01-29T05:56:35Z)
- HowSumm: A Multi-Document Summarization Dataset Derived from WikiHow Articles [8.53502615629675]
We present HowSumm, a novel large-scale dataset for the task of query-focused multi-document summarization (qMDS).
This use-case is different from the use-cases covered in existing multi-document summarization (MDS) datasets and is applicable to educational and industrial scenarios.
We describe the creation of the dataset and discuss the unique features that distinguish it from other summarization corpora.
arXiv Detail & Related papers (2021-10-07T04:44:32Z)
- WSL-DS: Weakly Supervised Learning with Distant Supervision for Query Focused Multi-Document Abstractive Summarization [16.048329028104643]
In the Query Focused Multi-Document Summarization (QF-MDS) task, a set of documents and a query are given where the goal is to generate a summary from these documents.
One major challenge for this task is the lack of availability of labeled training datasets.
We propose a novel weakly supervised learning approach that leverages distant supervision.
arXiv Detail & Related papers (2020-11-03T02:02:55Z)
- A Large-Scale Multi-Document Summarization Dataset from the Wikipedia Current Events Portal [10.553314461761968]
Multi-document summarization (MDS) aims to compress the content in large document collections into short summaries.
This work presents a new dataset for MDS that is large both in the total number of document clusters and in the size of individual clusters.
arXiv Detail & Related papers (2020-05-20T14:33:33Z)