Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
- URL: http://arxiv.org/abs/2406.15990v1
- Date: Sun, 23 Jun 2024 02:54:48 GMT
- Title: Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information
- Authors: Qiang Gao, Bobo Li, Zixiang Meng, Yunlong Li, Jun Zhou, Fei Li, Chong Teng, Donghong Ji
- Abstract summary: Cross-document event coreference resolution models can only compute mention similarity directly or enhance mention representation by extracting event arguments.
We propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents.
Because existing cross-document event coreference datasets are limited to English, we have also developed a large-scale Chinese cross-document event coreference dataset.
- Score: 33.21818213257603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing cross-document event coreference resolution models either compute mention similarity directly or enhance mention representations by extracting event arguments (such as location, time, agent, and patient), and they lack the ability to utilize document-level information. As a result, they struggle to capture long-distance dependencies, which leads to underwhelming performance when determining coreference for events whose argument information relies on such dependencies. In light of these limitations, we propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents. Subsequently, cross-document heterogeneous graphs are constructed and a GAT is utilized to learn the representations of events. Finally, a pair scorer calculates the similarity between each pair of events, and co-referring events are recognized using a standard clustering algorithm. Additionally, as the existing cross-document event coreference datasets are limited to English, we have developed a large-scale Chinese cross-document event coreference dataset to fill this gap, which comprises 53,066 event mentions and 4,476 clusters. When applied to the English and Chinese datasets respectively, our model outperforms all baselines by large margins.
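The last two steps of the abstract (scoring each pair of events, then clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: cosine similarity stands in for the learned pair scorer, a threshold plus union-find (connected components) stands in for the unspecified "standard clustering algorithm", and the function names, the toy vectors, and the 0.8 threshold are all assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity; a stand-in for a learned pair scorer (assumption)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def cluster_mentions(embeddings, threshold=0.8):
    """Link every pair scoring at or above the threshold, then return
    the connected components as coreference clusters (lists of indices)."""
    parent = list(range(len(embeddings)))

    def find(i):
        # Follow parent pointers to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Score all unordered pairs of mentions.
    for i, j in combinations(range(len(embeddings)), 2):
        if cosine(embeddings[i], embeddings[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(embeddings)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Toy event representations (hypothetical; a real system would use the
# representations learned over the cross-document graph).
mentions = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.1, 0.9, 0.1],
]
clusters = cluster_mentions(mentions, threshold=0.8)
print(clusters)  # [[0, 1], [2, 3]]
```

Linking by threshold and taking connected components is only one simple choice; agglomerative clustering over the pair scores is a common alternative and would slot into the same place.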
Related papers
- TacoERE: Cluster-aware Compression for Event Relation Extraction [47.89154684352463]
Event relation extraction is a critical and fundamental challenge for natural language processing.
We propose a cluster-aware compression method for improving event relation extraction (TacoERE).
arXiv Detail & Related papers (2024-05-11T03:06:08Z)
- REXEL: An End-to-end Model for Document-Level Relation Extraction and Entity Linking [11.374031643273941]
REXEL is a highly efficient and accurate model for the joint task of document-level closed information extraction (cIE), referred to as DocIE.
It is on average 11 times faster than competitive existing approaches in a similar setting.
The combination of speed and accuracy makes REXEL an accurate cost-efficient system for extracting structured information at web-scale.
arXiv Detail & Related papers (2024-04-19T11:04:27Z)
- Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering [49.85790367128085]
We pre-train a generic multi-document model using a novel cross-document question answering pre-training objective.
This novel multi-document QA formulation directs the model to better recover cross-text informational relations.
Unlike prior multi-document models that focus on either classification or summarization tasks, our pre-training objective formulation enables the model to perform tasks that involve both short text generation and long text generation.
arXiv Detail & Related papers (2023-05-24T17:48:40Z)
- Semantic Structure Enhanced Event Causality Identification [57.26259734944247]
Event Causality Identification (ECI) aims to identify causal relations between events in unstructured texts.
Existing methods underestimate two kinds of semantic structures vital to the ECI task, namely, event-centric structure and event-associated structure.
arXiv Detail & Related papers (2023-05-22T07:42:35Z)
- Document-level Relation Extraction with Cross-sentence Reasoning Graph [14.106582119686635]
Relation extraction (RE) has recently moved from the sentence level to the document level.
We propose a novel document-level RE model with a GRaph information Aggregation and Cross-sentence Reasoning network (GRACR).
Experimental results show GRACR achieves excellent performance on two public datasets of document-level RE.
arXiv Detail & Related papers (2023-03-07T14:14:12Z)
- Long Document Summarization with Top-down and Bottom-up Inference [113.29319668246407]
We propose a principled inference framework to improve summarization models on two aspects.
Our framework assumes a hierarchical latent structure of a document where the top-level captures the long range dependency.
We demonstrate the effectiveness of the proposed framework on a diverse set of summarization datasets.
arXiv Detail & Related papers (2022-03-15T01:24:51Z)
- Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference [22.497877069528087]
Event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full $n^2$ pairwise comparisons.
Existing approaches simplify by considering coreference only within document clusters, but this fails to handle inter-cluster coreference.
We draw on an insight from discourse coherence theory: potential coreferences are constrained by the reader's discourse focus.
Our approach achieves state-of-the-art results for both events and entities on the ECB+, Gun Violence, Football Coreference, and Cross-Domain Cross-Document Coreference corpora.
arXiv Detail & Related papers (2021-10-11T15:41:47Z)
- Exploiting Global Contextual Information for Document-level Named Entity Recognition [46.99922251839363]
We propose a model called Global Context enhanced Document-level NER (GCDoc).
At the word level, a document graph is constructed to model a wider range of dependencies between words.
At the sentence level, to appropriately model wider context beyond a single sentence, we employ a cross-sentence module.
Our model reaches an F1 score of 92.22 (93.40 with BERT) on the CoNLL 2003 dataset and 88.32 (90.49 with BERT) on the OntoNotes 5.0 dataset.
arXiv Detail & Related papers (2021-06-02T01:52:07Z)
- Sequential Cross-Document Coreference Resolution [14.099694053823765]
Cross-document coreference resolution is important for the growing interest in multi-document analysis tasks.
We propose a new model that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings.
Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters.
arXiv Detail & Related papers (2021-04-17T00:46:57Z)
- WEC: Deriving a Large-scale Cross-document Event Coreference dataset from Wikipedia [14.324743524196874]
We present Wikipedia Event Coreference (WEC), an efficient methodology for gathering a large-scale dataset for cross-document event coreference from Wikipedia.
We apply this methodology to the English Wikipedia and extract our large-scale WEC-Eng dataset.
We develop an algorithm that adapts components of state-of-the-art models for within-document coreference resolution to the cross-document setting.
arXiv Detail & Related papers (2021-04-11T14:54:35Z)
- Pairwise Representation Learning for Event Coreference [73.10563168692667]
We develop a Pairwise Representation Learning (PairwiseRL) scheme for the event mention pairs.
Our representation supports a finer, structured representation of the text snippet to facilitate encoding events and their arguments.
We show that PairwiseRL, despite its simplicity, outperforms the prior state-of-the-art event coreference systems on both cross-document and within-document event coreference benchmarks.
arXiv Detail & Related papers (2020-10-24T06:55:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.