XCoref: Cross-document Coreference Resolution in the Wild
- URL: http://arxiv.org/abs/2109.05252v1
- Date: Sat, 11 Sep 2021 10:41:09 GMT
- Title: XCoref: Cross-document Coreference Resolution in the Wild
- Authors: Anastasia Zhukova, Felix Hamborg, Karsten Donnay, and Bela Gipp
- Abstract summary: Bridging and loose coreference relations trigger associations that may expose news readers to bias by word choice and labeling.
A step towards bringing awareness of bias by word choice and labeling is the reliable resolution of coreferences with high lexical diversity.
We propose an unsupervised method named XCoref, a CDCR method that resolves not only entities such as persons, e.g., "Donald Trump," but also abstractly defined concepts, such as groups of persons, events, and actions.
In an extensive evaluation, we compare the proposed XCoref to a state-of-the-art CDCR method and a previous method TCA that resolves such complex coreference relations.
- Score: 8.586057042714698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Datasets and methods for cross-document coreference resolution (CDCR) focus
on events or entities with strict coreference relations. They do not, however,
annotate or resolve coreference mentions with more abstract or loose
relations that may occur when news articles report about controversial and
polarized events. Bridging and loose coreference relations trigger associations
that may lead to exposing news readers to bias by word choice and labeling. For
example, coreferential mentions of "direct talks between U.S. President Donald
Trump and Kim" such as "an extraordinary meeting following months of heated
rhetoric" or "great chance to solve a world problem" form a more positive
perception of this event. A step towards bringing awareness of bias by word
choice and labeling is the reliable resolution of coreferences with high
lexical diversity. We propose an unsupervised method named XCoref, which is a
CDCR method that capably resolves not only previously prevalent entities, such
as persons, e.g., "Donald Trump," but also abstractly defined concepts, such as
groups of persons, "caravan of immigrants," events and actions, e.g., "marching
to the U.S. border." In an extensive evaluation, we compare the proposed XCoref
to a state-of-the-art CDCR method and a previous method TCA that resolves such
complex coreference relations and find that XCoref outperforms these methods.
Outperforming an established CDCR model shows that new CDCR models need to
be evaluated on semantically complex mentions with looser coreference
relations to indicate their applicability to resolving mentions in the
"wild" of political news articles.
Related papers
- Diverse Word Choices, Same Reference: Annotating Lexically-Rich Cross-Document Coreference [6.567749530541648]
Cross-document coreference resolution (CDCR) identifies and links mentions of the same entities and events across related documents.
This paper proposes a revised CDCR annotation scheme of the NewsWCL50 dataset, treating coreference chains as discourse elements (DEs) and conceptual units of analysis.
arXiv Detail & Related papers (2026-02-19T14:56:01Z)
- ABCD-LINK: Annotation Bootstrapping for Cross-Document Fine-Grained Links [57.514511353084565]
We introduce a new domain-agnostic framework for selecting a best-performing approach and annotating cross-document links.
We apply our framework in two distinct domains -- peer review and news.
The resulting novel datasets lay foundation for numerous cross-document tasks like media framing and peer review.
arXiv Detail & Related papers (2025-09-01T11:32:24Z)
- Argument-Centric Causal Intervention Method for Mitigating Bias in Cross-Document Event Coreference Resolution [12.185497507437555]
Cross-document Event Coreference Resolution (CD-ECR) seeks to determine whether event mentions across multiple documents refer to the same real-world occurrence.
We propose a novel method based on Argument-Centric Causal Intervention (ACCI).
ACCI integrates a counterfactual reasoning module that quantifies the causal influence of trigger word perturbations, and an argument-aware enhancement module to promote greater sensitivity to semantically grounded information.
arXiv Detail & Related papers (2025-06-02T09:46:59Z)
- Noisy-Correspondence Learning for Text-to-Image Person Re-identification [50.07634676709067]
We propose a novel Robust Dual Embedding method (RDE) to learn robust visual-semantic associations even with noisy correspondences.
Our method achieves state-of-the-art results both with and without synthetic noisy correspondences on three datasets.
arXiv Detail & Related papers (2023-08-19T05:34:13Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- Focus on what matters: Applying Discourse Coherence Theory to Cross Document Coreference [22.497877069528087]
Event and entity coreference resolution across documents vastly increases the number of candidate mentions, making it intractable to do the full $n^2$ pairwise comparisons.
Existing approaches simplify by considering coreference only within document clusters, but this fails to handle inter-cluster coreference.
We draw on an insight from discourse coherence theory: potential coreferences are constrained by the reader's discourse focus.
Our approach achieves state-of-the-art results for both events and entities on the ECB+, Gun Violence, Football Coreference, and Cross-Domain Cross-Document Coreference corpora.
arXiv Detail & Related papers (2021-10-11T15:41:47Z)
- Cross-document Coreference Resolution over Predicted Mentions [19.95214898312209]
We introduce the first end-to-end model for CD coreference resolution from raw text.
Our model achieves competitive results for event and entity coreference resolution on gold mentions.
arXiv Detail & Related papers (2021-06-02T14:56:28Z)
- SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts [28.96683772139377]
We present a new task of hierarchical CDCR for concepts in scientific papers.
The goal is to jointly infer coreference clusters and the hierarchy between them.
We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
arXiv Detail & Related papers (2021-04-18T10:42:20Z)
- CD2CR: Co-reference Resolution Across Documents and Domains [20.30046972135548]
Cross-document co-reference resolution (CDCR) is the task of identifying and linking mentions to entities and concepts across many text documents.
We propose a new task and English language dataset for cross-document cross-domain co-reference resolution (CD$^2$CR).
We show that in this cross-domain, cross-document setting, existing CDCR models do not perform well and we provide a baseline model that outperforms current state-of-the-art CDCR models on CD$^2$CR.
arXiv Detail & Related papers (2021-01-29T15:18:30Z)
- Coreference Reasoning in Machine Reading Comprehension [100.75624364257429]
We show that coreference reasoning in machine reading comprehension is a greater challenge than was earlier thought.
We propose a methodology for creating reading comprehension datasets that better reflect the challenges of coreference reasoning.
This allows us to show an improvement in the reasoning abilities of state-of-the-art models across various MRC datasets.
arXiv Detail & Related papers (2020-12-31T12:18:41Z)
- Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet.
We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)
- A Brief Survey and Comparative Study of Recent Development of Pronoun Coreference Resolution [55.39835612617972]
Pronoun Coreference Resolution (PCR) is the task of resolving pronominal expressions to all mentions they refer to.
As one important natural language understanding (NLU) component, pronoun resolution is crucial for many downstream tasks and still challenging for existing models.
We conduct extensive experiments to show that even though current models are achieving good performance on the standard evaluation set, they are still not ready to be used in real applications.
arXiv Detail & Related papers (2020-09-27T01:40:01Z)
- Streamlining Cross-Document Coreference Resolution: Evaluation and Modeling [25.94435242086499]
Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient.
Our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text.
Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting.
arXiv Detail & Related papers (2020-09-23T10:02:10Z)
- Query Resolution for Conversational Search with Limited Supervision [63.131221660019776]
We propose QuReTeC (Query Resolution by Term Classification), a neural query resolution model based on bidirectional transformers.
We show that QuReTeC outperforms state-of-the-art models, and furthermore, that our distant supervision method can be used to substantially reduce the amount of human-curated data required to train QuReTeC.
arXiv Detail & Related papers (2020-05-24T11:37:22Z)