Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm
- URL: http://arxiv.org/abs/2406.16021v1
- Date: Sun, 23 Jun 2024 06:01:11 GMT
- Title: Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm
- Authors: Qiang Gao, Zixiang Meng, Bobo Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji
- Abstract summary: This paper proposes the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events.
We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events.
Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting the challenge of this task.
- Score: 33.737981167605575
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Document-level event extraction aims to extract structured event information from unstructured text. However, a single document often contains limited event information and the roles of different event arguments may be biased due to the influence of the information source. This paper addresses the limitations of traditional document-level event extraction by proposing the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events, where over 70% of them are cross-document. To build a benchmark, we propose a CDEE pipeline that includes 5 steps, namely event extraction, coreference resolution, entity normalization, role normalization and entity-role resolution. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting the challenge of this task. Our work builds a new line of information extraction research and will attract new research attention.
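The five pipeline steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Mention` type, the alias tables, the trigger-matching coreference rule, and the majority-vote reading of entity-role resolution are all hypothetical stand-ins chosen for illustration, and step 1 (event extraction) is assumed to have already produced the mention-level events.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A mention-level event, assumed to be the output of step 1 (event extraction)."""
    doc_id: str
    event_type: str
    trigger: str
    arguments: tuple  # ((role, entity_text), ...)


# Hypothetical alias tables; a real system would learn or curate these.
ENTITY_ALIASES = {"nyc": "New York City", "new york": "New York City"}
ROLE_ALIASES = {"location": "Place", "venue": "Place", "assailant": "Attacker"}


def cluster_coreferent(mentions):
    """Step 2 (toy rule): mentions with the same type and trigger are coreferent."""
    clusters = defaultdict(list)
    for m in mentions:
        clusters[(m.event_type, m.trigger.lower())].append(m)
    return list(clusters.values())


def normalize_entity(text):
    """Step 3: map surface forms of an entity to one canonical name."""
    return ENTITY_ALIASES.get(text.strip().lower(), text.strip())


def normalize_role(role):
    """Step 4: map source-specific role labels to one canonical label."""
    return ROLE_ALIASES.get(role.strip().lower(), role.strip().title())


def resolve_entity_roles(cluster):
    """Step 5 (assumed reading): keep the most frequent role for each entity."""
    votes = defaultdict(lambda: defaultdict(int))
    for m in cluster:
        for role, entity in m.arguments:
            votes[normalize_entity(entity)][normalize_role(role)] += 1
    return {entity: max(roles, key=roles.get) for entity, roles in votes.items()}


def cross_document_events(mentions):
    """Merge mention-level events into one record per cross-document event."""
    return [
        {
            "event_type": cluster[0].event_type,
            "documents": sorted({m.doc_id for m in cluster}),
            "arguments": resolve_entity_roles(cluster),
        }
        for cluster in cluster_coreferent(mentions)
    ]


# Made-up example: three documents describing the same event.
mentions = [
    Mention("d1", "Attack", "bombing", (("Location", "NYC"), ("Assailant", "John Doe"))),
    Mention("d2", "Attack", "Bombing", (("Venue", "New York"), ("Attacker", "John Doe"))),
    Mention("d3", "Attack", "bombing", (("Victim", "John Doe"),)),
]
events = cross_document_events(mentions)
print(events)
```

In this toy input the third document assigns a conflicting role to the same entity, which is the kind of source-dependent bias the abstract mentions; the majority vote is only one possible way to settle such conflicts.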
Related papers
- Cross-Document Event-Keyed Summarization [35.957271217461525]
We extend event-keyed summarization (EKS) to the cross-document setting (CDEKS).
We introduce SEAMUS, a high-quality dataset for CDEKS based on an expert reannotation of the FAMUS dataset for cross-document argument extraction.
We present a suite of baselines on SEAMUS, covering both smaller, fine-tuned models, as well as zero- and few-shot prompted LLMs, along with detailed ablations, and a human evaluation study.
arXiv Detail & Related papers (2024-10-18T18:09:45Z)
- Grounding Partially-Defined Events in Multimodal Data [61.0063273919745]
We introduce a multimodal formulation for partially-defined events and cast the extraction of these events as a three-stage span retrieval task.
We propose a benchmark for this task, MultiVENT-G, that consists of 14.5 hours of densely annotated current event videos and 1,168 text documents, containing 22.8K labeled event-centric entities.
Results illustrate the challenges that abstract event understanding poses and demonstrate promise in event-centric video-language systems.
arXiv Detail & Related papers (2024-10-07T17:59:48Z)
- Event GDR: Event-Centric Generative Document Retrieval [37.53593254200252]
We propose Event GDR, an event-centric generative document retrieval model.
We model each document with events and relations to guarantee comprehensiveness and inner-content correlation.
For identifier construction, we map the events to a well-defined event taxonomy, yielding identifiers with explicit semantic structure.
arXiv Detail & Related papers (2024-05-11T02:55:11Z)
- FAMuS: Frames Across Multiple Sources [74.03795560933612]
FAMuS is a new corpus of Wikipedia passages that report on some event, paired with underlying, genre-diverse (non-Wikipedia) source articles for the same event.
We present results on two key event understanding tasks enabled by FAMuS.
arXiv Detail & Related papers (2023-11-09T18:57:39Z)
- MEE: A Novel Multilingual Event Extraction Dataset [62.80569691825534]
Event Extraction aims to recognize event mentions and their arguments from text.
The lack of high-quality multilingual EE datasets for model training and evaluation has been the main hindrance.
We propose a novel Multilingual Event Extraction dataset (MEE) that provides annotation for more than 50K event mentions in 8 typologically different languages.
arXiv Detail & Related papers (2022-11-11T02:01:41Z)
- Joint Multimedia Event Extraction from Video and Article [51.159034070824056]
We propose the first approach to jointly extract events from video and text articles.
First, we propose the first self-supervised multimodal event coreference model.
Second, we introduce the first multimodal transformer which extracts structured event information jointly from both videos and text documents.
arXiv Detail & Related papers (2021-09-27T03:22:12Z)
- Cross-document Event Identity via Dense Annotation [9.163142877146512]
We study the identity of textual events from different documents.
We propose a dense annotation approach for cross-document event coreference.
We present an open-access dataset for cross-document event coreference.
arXiv Detail & Related papers (2021-09-14T03:57:58Z)
- COfEE: A Comprehensive Ontology for Event Extraction from text, with an online annotation tool [3.8995911009078816]
Event Extraction (EE) seeks to derive information about specific incidents and their actors from the text.
EE is useful in many domains such as building a knowledge base, information retrieval, summarization and online monitoring systems.
COfEE consists of two hierarchy levels (event types and event sub-types) that include new categories relating to environmental issues, cyberspace, criminal activity and natural disasters.
arXiv Detail & Related papers (2021-07-21T19:43:22Z)
- Document-level Event Extraction with Efficient End-to-end Learning of Cross-event Dependencies [37.96254956540803]
We propose an end-to-end model leveraging Deep Value Networks (DVN), a structured prediction algorithm, to efficiently capture cross-event dependencies for document-level event extraction.
Our approach achieves performance comparable to CRF-based models on ACE05 while enjoying significantly higher computational efficiency.
arXiv Detail & Related papers (2020-10-24T05:28:16Z)
- Detecting Ongoing Events Using Contextual Word and Sentence Embeddings [110.83289076967895]
This paper introduces the Ongoing Event Detection (OED) task.
The goal is to detect ongoing event mentions only, as opposed to historical, future, hypothetical, or other forms of events that are neither fresh nor current.
Any application that needs to extract structured information about ongoing events from unstructured texts can take advantage of an OED system.
arXiv Detail & Related papers (2020-07-02T20:44:05Z)