Cross-document Event Identity via Dense Annotation
- URL: http://arxiv.org/abs/2109.06417v1
- Date: Tue, 14 Sep 2021 03:57:58 GMT
- Title: Cross-document Event Identity via Dense Annotation
- Authors: Adithya Pratapa, Zhengzhong Liu, Kimihiro Hasegawa, Linwei Li, Yukari
Yamakawa, Shikun Zhang, Teruko Mitamura
- Abstract summary: We study the identity of textual events from different documents.
We propose a dense annotation approach for cross-document event coreference.
We present an open-access dataset for cross-document event coreference.
- Score: 9.163142877146512
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we study the identity of textual events from different
documents. While the complex nature of event identity has been studied previously
(Hovy et al., 2013), the case of events across documents is unclear. Prior work
on cross-document event coreference has two main drawbacks. First, it
restricts the annotations to a limited set of event types. Second, it
insufficiently tackles the concept of event identity. Such an annotation setup
reduces the pool of event mentions and prevents one from considering the
possibility of quasi-identity relations. We propose a dense annotation approach
for cross-document event coreference, comprising a rich source of event
mentions and a dense annotation effort between related document pairs. To this
end, we design a new annotation workflow with careful quality control and an
easy-to-use annotation interface. In addition to the links, we further collect
overlapping event contexts, including time, location, and participants, to shed
some light on the relation between identity decisions and context. We present
an open-access dataset for cross-document event coreference, CDEC-WN, collected
from English Wikinews and open-source our annotation toolkit to encourage
further research on cross-document tasks.
Related papers
- Unified Multi-Modal Interleaved Document Representation for Information Retrieval [57.65409208879344]
We produce more comprehensive and nuanced document representations by holistically embedding documents interleaved with different modalities.
Specifically, we achieve this by leveraging the capability of recent vision-language models that enable the processing and integration of text, images, and tables into a unified format and representation.
arXiv Detail & Related papers (2024-10-03T17:49:09Z)
- Harvesting Events from Multiple Sources: Towards a Cross-Document Event Extraction Paradigm [33.737981167605575]
This paper proposes the task of cross-document event extraction (CDEE) to integrate event information from multiple documents and provide a comprehensive perspective on events.
We construct a novel cross-document event extraction dataset, namely CLES, which contains 20,059 documents and 37,688 mention-level events.
Our CDEE pipeline achieves about 72% F1 in end-to-end cross-document event extraction, suggesting that the task remains challenging.
arXiv Detail & Related papers (2024-06-23T06:01:11Z)
- Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information [33.21818213257603]
Cross-document event coreference resolution models can only compute mention similarity directly or enhance mention representation by extracting event arguments.
We propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents.
We have developed a large-scale Chinese cross-document event coreference dataset to fill this gap.
arXiv Detail & Related papers (2024-06-23T02:54:48Z)
- Event GDR: Event-Centric Generative Document Retrieval [37.53593254200252]
We propose Event GDR, an event-centric generative document retrieval model.
We model each document with events and relations to ensure comprehensiveness and inner-content correlation.
For identifier construction, we map the events to a well-defined event taxonomy to construct identifiers with an explicit semantic structure.
arXiv Detail & Related papers (2024-05-11T02:55:11Z)
- FAMuS: Frames Across Multiple Sources [74.03795560933612]
FAMuS is a new corpus of Wikipedia passages that report on some event, paired with underlying, genre-diverse (non-Wikipedia) source articles for the same event.
We present results on two key event understanding tasks enabled by FAMuS.
arXiv Detail & Related papers (2023-11-09T18:57:39Z)
- Cross-document Event Coreference Search: Task, Dataset and Modeling [26.36068336169796]
We propose an appealing, and often more applicable, complementary setup for the task: Cross-document Coreference Search.
To support research on this task, we create a corresponding dataset, which is derived from Wikipedia.
We present a novel model that integrates a powerful coreference scoring scheme into the DPR architecture, yielding improved performance.
arXiv Detail & Related papers (2022-10-23T08:21:25Z)
- Dynamic Global Memory for Document-level Argument Extraction [63.314514124716936]
We introduce a new global neural generation-based framework for document-level event argument extraction.
We use a document memory store to record the contextual event information and leverage it to implicitly and explicitly help with decoding of arguments for later events.
Empirical results show that our framework outperforms prior methods substantially.
arXiv Detail & Related papers (2022-09-18T23:45:25Z)
- EA$^2$E: Improving Consistency with Event Awareness for Document-Level Argument Extraction [52.43978926985928]
We introduce the Event-Aware Argument Extraction (EA$^2$E) model with augmented context for training and inference.
Experiment results on WIKIEVENTS and ACE2005 datasets demonstrate the effectiveness of EA$^2$E.
arXiv Detail & Related papers (2022-05-30T04:33:51Z)
- Unified Pretraining Framework for Document Understanding [52.224359498792836]
We present UDoc, a new unified pretraining framework for document understanding.
UDoc is designed to support most document understanding tasks, extending the Transformer to take multimodal embeddings as input.
An important feature of UDoc is that it learns a generic representation by making use of three self-supervised losses.
arXiv Detail & Related papers (2022-04-22T21:47:04Z)
- Pairwise Representation Learning for Event Coreference [73.10563168692667]
We develop a Pairwise Representation Learning (PairwiseRL) scheme for the event mention pairs.
Our representation supports a finer, structured representation of the text snippet to facilitate encoding events and their arguments.
We show that PairwiseRL, despite its simplicity, outperforms the prior state-of-the-art event coreference systems on both cross-document and within-document event coreference benchmarks.
arXiv Detail & Related papers (2020-10-24T06:55:52Z)
- Seeing the Forest and the Trees: Detection and Cross-Document Coreference Resolution of Militarized Interstate Disputes [3.8073142980733]
I provide a data set for evaluating methods to identify certain political events in text and to link related texts to one another based on shared events.
The data set, Headlines of War, is built on the Militarized Interstate Disputes data set and offers headlines classified by dispute status and headline pairs labeled with coreference indicators.
I introduce a model capable of accomplishing both tasks. The multi-task convolutional neural network is shown to be capable of recognizing events and event coreferences given the headlines' texts and publication dates.
arXiv Detail & Related papers (2020-05-06T17:20:14Z)
This list is automatically generated from the titles and abstracts of the papers on this site.