Related papers: Cross-document Event Identity via Dense Annotation

Cross-document Event Identity via Dense Annotation

URL: http://arxiv.org/abs/2109.06417v1
Date: Tue, 14 Sep 2021 03:57:58 GMT
Title: Cross-document Event Identity via Dense Annotation
Authors: Adithya Pratapa, Zhengzhong Liu, Kimihiro Hasegawa, Linwei Li, Yukari Yamakawa, Shikun Zhang, Teruko Mitamura
Abstract summary: We study the identity of textual events from different documents. We propose a dense annotation approach for cross-document event coreference. We present an open-access dataset for cross-document event coreference.
Score: 9.163142877146512
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: In this paper, we study the identity of textual events from different documents. While the complex nature of event identity is previously studied (Hovy et al., 2013), the case of events across documents is unclear. Prior work on cross-document event coreference has two main drawbacks. First, they restrict the annotations to a limited set of event types. Second, they insufficiently tackle the concept of event identity. Such annotation setup reduces the pool of event mentions and prevents one from considering the possibility of quasi-identity relations. We propose a dense annotation approach for cross-document event coreference, comprising a rich source of event mentions and a dense annotation effort between related document pairs. To this end, we design a new annotation workflow with careful quality control and an easy-to-use annotation interface. In addition to the links, we further collect overlapping event contexts, including time, location, and participants, to shed some light on the relation between identity decisions and context. We present an open-access dataset for cross-document event coreference, CDEC-WN, collected from English Wikinews and open-source our annotation toolkit to encourage further research on cross-document tasks.

Related papers

LegalCore: A Dataset for Event Coreference Resolution in Legal Documents [21.113915852038552]
We present the first dataset for the legal domain, LegalCore, annotated with comprehensive event and event coreference information. The legal contract documents we annotated in this dataset are several times longer than news articles, with an average length of around 25k tokens per document. We benchmark mainstream Large Language Models on this dataset for both event detection and event coreference resolution tasks.
arXiv Detail & Related papers (2025-02-18T03:47:53Z)
Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities. Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
Generative Retrieval Meets Multi-Graded Relevance [104.75244721442756]
We introduce a framework called GRaded Generative Retrieval (GR$2$) GR$2$ focuses on two key components: ensuring relevant and distinct identifiers, and implementing multi-graded constrained contrastive training. Experiments on datasets with both multi-graded and binary relevance demonstrate the effectiveness of GR$2$.
arXiv Detail & Related papers (2024-09-27T02:55:53Z)
Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm [33.737981167605575]
This paper proposes the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events. We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events. Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting the challenge of this task.
arXiv Detail & Related papers (2024-06-23T06:01:11Z)
Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information [33.21818213257603]
Cross-document event coreference resolution models can only compute mention similarity directly or enhance mention representation by extracting event arguments. We propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. We have developed a large-scale Chinese cross-document event coreference dataset to fill this gap.
arXiv Detail & Related papers (2024-06-23T02:54:48Z)
Event GDR: Event-Centric Generative Document Retrieval [37.53593254200252]
We propose Event GDR, an event-centric generative document retrieval model. We employ events and relations to model the document to guarantee the comprehensiveness and inner-content correlation. For identifier construction, we map the events to well-defined event taxonomy to construct the identifiers with explicit semantic structure.
arXiv Detail & Related papers (2024-05-11T02:55:11Z)
FAMuS: Frames Across Multiple Sources [74.03795560933612]
FAMuS is a new corpus of Wikipedia passages that emphreport on some event, paired with underlying, genre-diverse (non-Wikipedia) emphsource articles for the same event. We present results on two key event understanding tasks enabled by FAMuS.
arXiv Detail & Related papers (2023-11-09T18:57:39Z)
Cross-document Event Coreference Search: Task, Dataset and Modeling [26.36068336169796]
We propose an appealing, and often more applicable, complementary set up for the task - Cross-document Coreference Search. To support research on this task, we create a corresponding dataset, which is derived from Wikipedia. We present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.
arXiv Detail & Related papers (2022-10-23T08:21:25Z)
Dynamic Global Memory for Document-level Argument Extraction [63.314514124716936]
We introduce a new global neural generation-based framework for document-level event argument extraction. We use a document memory store to record the contextual event information and leverage it to implicitly and explicitly help with decoding of arguments for later events. Empirical results show that our framework outperforms prior methods substantially.
arXiv Detail & Related papers (2022-09-18T23:45:25Z)
EA$^2$E: Improving Consistency with Event Awareness for Document-Level Argument Extraction [52.43978926985928]
We introduce the Event-Aware Argument Extraction (EA$2$E) model with augmented context for training and inference. Experiment results on WIKIEVENTS and ACE2005 datasets demonstrate the effectiveness of EA$2$E.
arXiv Detail & Related papers (2022-05-30T04:33:51Z)
Pairwise Representation Learning for Event Coreference [73.10563168692667]
We develop a Pairwise Representation Learning (PairwiseRL) scheme for the event mention pairs. Our representation supports a finer, structured representation of the text snippet to facilitate encoding events and their arguments. We show that PairwiseRL, despite its simplicity, outperforms the prior state-of-the-art event coreference systems on both cross-document and within-document event coreference benchmarks.
arXiv Detail & Related papers (2020-10-24T06:55:52Z)
Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes [3.8073142980733]
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events. The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators. I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
arXiv Detail & Related papers (2020-05-06T17:20:14Z)

This list is automatically generated from the titles and abstracts of the papers in this site.