Streamlining Cross-Document Coreference Resolution: Evaluation and
Modeling
- URL: http://arxiv.org/abs/2009.11032v3
- Date: Fri, 23 Oct 2020 13:40:30 GMT
- Title: Streamlining Cross-Document Coreference Resolution: Evaluation and
Modeling
- Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, and Ido
Dagan
- Abstract summary: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient.
Our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text.
Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting.
- Score: 25.94435242086499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution
have often been inconsistent or lenient, leading to incomparable results across
works and overestimation of performance. To facilitate proper future research
on this task, our primary contribution is proposing a pragmatic evaluation
methodology that assumes access only to raw text rather than gold mentions,
disregards singleton prediction, and addresses typical targeted settings in
CD coreference resolution. Aiming to set baseline results for
future research that would follow our evaluation methodology, we build the
first end-to-end model for this task. Our model adapts and extends recent
neural models for within-document coreference resolution to address the CD
coreference setting, and it outperforms state-of-the-art results by a
significant margin.
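The "disregards singleton prediction" point can be made concrete with a small scoring sketch. This is an illustrative example, not the authors' scorer: a plain B-cubed computation in which singleton clusters are dropped from both gold and predicted clusterings before scoring, so that trivially predicting every mention as its own cluster cannot inflate results.

```python
def b_cubed(gold_clusters, pred_clusters):
    """B-cubed P/R/F1 over clusterings, ignoring singletons.

    gold_clusters / pred_clusters: lists of sets of mention ids.
    """
    # Drop singletons so single-mention clusters cannot inflate the score.
    gold = [c for c in gold_clusters if len(c) > 1]
    pred = [c for c in pred_clusters if len(c) > 1]

    def score(clusters, reference):
        # Average, over mentions, of the overlap between a mention's
        # cluster and the reference cluster containing that mention.
        num, den = 0.0, 0
        for c in clusters:
            for m in c:
                ref = next((r for r in reference if m in r), set())
                num += len(c & ref) / len(c)
                den += 1
        return num / den if den else 0.0

    precision = score(pred, gold)
    recall = score(gold, pred)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1
```

For example, with gold `[{1, 2, 3}]` and prediction `[{1, 2}, {3}]`, the singleton `{3}` is discarded, yielding perfect precision but penalized recall.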
Related papers
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
We also establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
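The Best-Worst Scaling protocol compared here has a standard counting-based aggregation: an item's score is the fraction of times it was chosen best minus the fraction it was chosen worst. A minimal sketch, where the `(best, worst, candidates)` tuple format is an assumption for illustration:

```python
from collections import Counter

def bws_scores(judgments):
    """Best-Worst Scaling aggregation.

    judgments: iterable of (best, worst, candidates) tuples, where
    `candidates` is the full tuple shown to the annotator and
    includes both `best` and `worst`.
    """
    best, worst, seen = Counter(), Counter(), Counter()
    for b, w, cands in judgments:
        best[b] += 1
        worst[w] += 1
        for c in cands:
            seen[c] += 1  # count every appearance, chosen or not
    # Score in [-1, 1]: +1 if always best, -1 if always worst.
    return {c: (best[c] - worst[c]) / seen[c] for c in seen}
```

An item shown twice and picked best both times scores 1.0; one picked worst once in two appearances scores -0.5.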
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
- Realistic Evaluation Principles for Cross-document Coreference Resolution [19.95214898312209]
We argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset.
We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model.
arXiv Detail & Related papers (2021-06-08T09:05:21Z)
- Cross-document Coreference Resolution over Predicted Mentions [19.95214898312209]
We introduce the first end-to-end model for CD coreference resolution from raw text.
Our model achieves competitive results for event and entity coreference resolution on gold mentions.
arXiv Detail & Related papers (2021-06-02T14:56:28Z)
- Sequential Cross-Document Coreference Resolution [14.099694053823765]
Cross-document coreference resolution is important for the growing interest in multi-document analysis tasks.
We propose a new model that extends the efficient sequential prediction paradigm for coreference resolution to cross-document settings.
Our model incrementally composes mentions into cluster representations and predicts links between a mention and the already constructed clusters.
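The incremental mention-to-cluster paradigm described above can be sketched as a greedy loop: each mention in document order either links to the highest-scoring existing cluster or starts a new one. This is an illustrative skeleton, with the `score` function standing in for the paper's learned pairwise scorer and the threshold value chosen arbitrarily:

```python
def sequential_cluster(mentions, score, threshold=0.5):
    """Greedy sequential coreference clustering.

    mentions: ordered list of mentions.
    score(mention, cluster) -> float, higher means more coreferent.
    """
    clusters = []  # each cluster is a list of mentions, oldest first
    for m in mentions:
        scored = [(score(m, c), c) for c in clusters]
        best = max(scored, default=(float("-inf"), None))
        if best[0] >= threshold:
            best[1].append(m)      # link to the best existing cluster
        else:
            clusters.append([m])   # no cluster is good enough: start a new one
    return clusters
```

With a toy token-overlap scorer, "Barack Obama" would join an earlier "Obama" cluster while "Paris" opens a new one.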
arXiv Detail & Related papers (2021-04-17T00:46:57Z)
- Evaluation of Unsupervised Entity and Event Salience Estimation [17.74208462902158]
Salience Estimation aims to predict term importance in documents.
Previous studies typically generate pseudo-ground truth for evaluation.
In this work, we propose a light yet practical entity and event salience estimation evaluation protocol.
arXiv Detail & Related papers (2021-04-14T15:23:08Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-dataset benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- Evaluating Text Coherence at Sentence and Paragraph Levels [17.99797111176988]
We investigate the adaptation of existing sentence ordering methods to a paragraph ordering task.
We also compare the learnability and robustness of existing models by artificially creating mini datasets and noisy datasets.
We conclude that the recurrent graph neural network-based model is an optimal choice for coherence modeling.
arXiv Detail & Related papers (2020-06-05T03:31:49Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.