Qualitative and Quantitative Analysis of Diversity in Cross-document
Coreference Resolution Datasets
- URL: http://arxiv.org/abs/2109.05250v1
- Date: Sat, 11 Sep 2021 10:33:17 GMT
- Title: Qualitative and Quantitative Analysis of Diversity in Cross-document
Coreference Resolution Datasets
- Authors: Anastasia Zhukova, Felix Hamborg, and Bela Gipp
- Abstract summary: Cross-document coreference resolution (CDCR) datasets contain manually annotated event-centric mentions of events and entities that form coreference chains with identity relations.
ECB+ is a state-of-the-art CDCR dataset that focuses on the resolution of events and their descriptive attributes.
NewsWCL50 is a dataset that annotates coreference chains of both events and entities with a strong variance of word choice.
- Score: 9.379650501033465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cross-document coreference resolution (CDCR) datasets, such as ECB+, contain
manually annotated event-centric mentions of events and entities that form
coreference chains with identity relations. ECB+ is a state-of-the-art CDCR
dataset that focuses on the resolution of events and their descriptive
attributes, i.e., actors, location, and date-time. NewsWCL50 is a dataset that
annotates coreference chains of both events and entities with a strong variance
of word choice and more loosely-related coreference anaphora, e.g., bridging or
near-identity relations. In this paper, we qualitatively and quantitatively
compare annotation schemes of ECB+ and NewsWCL50 with multiple criteria. We
propose a phrasing diversity metric (PD) that compares lexical diversity within
coreference chains on a more detailed level than previously proposed metrics,
e.g., the number of unique lemmas. We discuss the different tasks that both CDCR
datasets create, i.e., lexical disambiguation and lexical diversity challenges,
and propose a direction for further CDCR evaluation.
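The abstract names "number of unique lemmas" as an earlier, coarser measure of lexical diversity within a coreference chain. The sketch below illustrates only that baseline, not the proposed PD metric, whose definition is not given here. The function name `unique_lemma_counts`, the `(chain_id, phrase)` input layout, and the toy mentions are all assumptions made for illustration, and lowercasing is a stand-in for a real lemmatizer.

```python
from collections import defaultdict


def unique_lemma_counts(mentions, lemmatize=str.lower):
    """Return {chain_id: number of distinct lemmas across the chain's mentions}.

    `mentions` is an iterable of (chain_id, phrase) pairs (assumed layout).
    `lemmatize` defaults to lowercasing, which only approximates lemmatization.
    """
    lemmas_by_chain = defaultdict(set)
    for chain_id, phrase in mentions:
        # Whitespace tokenization keeps the sketch dependency-free.
        for token in phrase.split():
            lemmas_by_chain[chain_id].add(lemmatize(token))
    return {chain_id: len(lemmas) for chain_id, lemmas in lemmas_by_chain.items()}


# Hypothetical toy data, not taken from ECB+ or NewsWCL50.
toy_mentions = [
    ("chain_a", "the earthquake"),
    ("chain_a", "The earthquake"),
    ("chain_b", "the immigrants"),
    ("chain_b", "undocumented migrants"),
    ("chain_b", "the caravan"),
]

counts = unique_lemma_counts(toy_mentions)
print(counts)
```

A chain whose mentions repeat the same wording gets a low count, while a chain with varied word choice gets a higher one. A plain count of this kind ignores how the lemmas are distributed across phrases, which is presumably the kind of detail a finer-grained metric would capture.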
Related papers
- Enhancing Cross-Document Event Coreference Resolution by Discourse Structure and Semantic Information [33.21818213257603]
Cross-document event coreference resolution models can only compute mention similarity directly or enhance mention representation by extracting event arguments.
We propose the construction of document-level Rhetorical Structure Theory (RST) trees and cross-document Lexical Chains to model the structural and semantic information of documents.
We have developed a large-scale Chinese cross-document event coreference dataset to fill this gap.
arXiv Detail & Related papers (2024-06-23T02:54:48Z)
- Multimodal Cross-Document Event Coreference Resolution Using Linear Semantic Transfer and Mixed-Modality Ensembles [8.233126457964834]
Event coreference resolution (ECR) is the task of determining whether distinct mentions of events are actually linked to the same underlying occurrence.
Here, we propose a multimodal cross-document event coreference resolution method that integrates visual and textual cues with a simple linear map between vision and language models.
Our results demonstrate the utility of multimodal information in ECR for certain challenging coreference problems.
arXiv Detail & Related papers (2024-04-13T10:01:58Z)
- mCL-NER: Cross-Lingual Named Entity Recognition via Multi-view Contrastive Learning [54.523172171533645]
Cross-lingual named entity recognition (CrossNER) faces challenges stemming from uneven performance due to the scarcity of multilingual corpora.
We propose Multi-view Contrastive Learning for Cross-lingual Named Entity Recognition (mCL-NER).
Our experiments on the XTREME benchmark, spanning 40 languages, demonstrate the superiority of mCL-NER over prior data-driven and model-based approaches.
arXiv Detail & Related papers (2023-08-17T16:02:29Z)
- Query Expansion Using Contextual Clue Sampling with Language Models [69.51976926838232]
We propose a combination of an effective filtering strategy and fusion of the retrieved documents based on the generation probability of each context.
Our lexical matching based approach achieves a similar top-5/top-20 retrieval accuracy and higher top-100 accuracy compared with the well-established dense retrieval model DPR.
For end-to-end QA, the reader model also benefits from our method and achieves the highest Exact-Match score against several competitive baselines.
arXiv Detail & Related papers (2022-10-13T15:18:04Z)
- InfoCSE: Information-aggregated Contrastive Learning of Sentence Embeddings [61.77760317554826]
This paper proposes an information-aggregated contrastive learning framework for learning unsupervised sentence embeddings, termed InfoCSE.
We evaluate the proposed InfoCSE on several benchmark datasets w.r.t. the semantic textual similarity (STS) task.
Experimental results show that InfoCSE outperforms SimCSE by an average Spearman correlation of 2.60% on BERT-base, and 1.77% on BERT-large.
arXiv Detail & Related papers (2022-10-08T15:53:19Z)
- Learning Semantic Segmentation from Multiple Datasets with Label Shifts [101.24334184653355]
This paper proposes UniSeg, an effective approach to automatically train models across multiple datasets with differing label spaces.
Specifically, we propose two losses that account for conflicting and co-occurring labels to achieve better generalization performance in unseen domains.
arXiv Detail & Related papers (2022-02-28T18:55:19Z)
- Author Clustering and Topic Estimation for Short Texts [69.54017251622211]
We propose a novel model that expands on the Latent Dirichlet Allocation by modeling strong dependence among the words in the same document.
We also simultaneously cluster users, removing the need for post-hoc cluster estimation.
Our method performs as well as, or better than, traditional approaches to problems arising in short text.
arXiv Detail & Related papers (2021-06-15T20:55:55Z)
- SciCo: Hierarchical Cross-Document Coreference for Scientific Concepts [28.96683772139377]
We present a new task of hierarchical CDCR for concepts in scientific papers.
The goal is to jointly infer coreference clusters and the hierarchy between them.
We create SciCo, an expert-annotated dataset for this task, which is 3X larger than the prominent ECB+ resource.
arXiv Detail & Related papers (2021-04-18T10:42:20Z)
- Sequential Cross-Document Coreference Resolution [14.099694053823765]
Cross-document coreference resolution is important for the growing interest in multi-document analysis tasks.
We propose a new model that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings.
Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters.
arXiv Detail & Related papers (2021-04-17T00:46:57Z)
- Generalizing Cross-Document Event Coreference Resolution Across Multiple Corpora [63.429307282665704]
Cross-document event coreference resolution (CDCR) is an NLP task in which mentions of events need to be identified and clustered throughout a collection of documents.
CDCR aims to benefit downstream multi-document applications, but improvements from applying CDCR have not been shown yet.
We make the observation that every CDCR system to date was developed, trained, and tested only on a single respective corpus.
arXiv Detail & Related papers (2020-11-24T17:45:03Z)