Streamlining Cross-Document Coreference Resolution: Evaluation and
Modeling
- URL: http://arxiv.org/abs/2009.11032v3
- Date: Fri, 23 Oct 2020 13:40:30 GMT
- Authors: Arie Cattan, Alon Eirew, Gabriel Stanovsky, Mandar Joshi, and Ido
Dagan
- Abstract summary: Recent evaluation protocols for Cross-document (CD) coreference resolution have often been inconsistent or lenient.
Our primary contribution is proposing a pragmatic evaluation methodology which assumes access to only raw text.
Our model adapts and extends recent neural models for within-document coreference resolution to address the CD coreference setting.
- Score: 25.94435242086499
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent evaluation protocols for Cross-document (CD) coreference resolution
have often been inconsistent or lenient, leading to incomparable results across
works and overestimation of performance. To facilitate proper future research
on this task, our primary contribution is a pragmatic evaluation methodology
that assumes access only to raw text rather than gold mentions, disregards
singleton prediction, and addresses the typical targeted settings of CD
coreference resolution. Aiming to set baseline results for future research
that follows our evaluation methodology, we build the first end-to-end model
for this task. Our model adapts and extends recent neural models for
within-document coreference resolution to the CD coreference setting, and it
outperforms state-of-the-art results by a significant margin.
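As a concrete illustration of the pipeline the abstract describes, the minimal
sketch below clusters mentions from all documents by pairwise similarity and
drops singleton clusters before evaluation. The embeddings, cosine distance,
and merge threshold are stand-in assumptions, not the paper's learned components.

```python
# A minimal sketch of end-to-end CD coreference, assuming mention spans have
# already been detected and encoded; the paper learns these components, while
# hypothetical random embeddings stand in here for illustration.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def cluster_mentions(mention_vecs, threshold=0.5):
    """Agglomerative clustering of mention embeddings across documents."""
    dists = pdist(mention_vecs, metric="cosine")  # condensed distance matrix
    tree = linkage(dists, method="average")       # hierarchical merge tree
    return fcluster(tree, t=threshold, criterion="distance")

def drop_singletons(cluster_ids):
    """Per the evaluation methodology above, singleton clusters are discarded
    so that unlinked mentions do not inflate coreference metrics."""
    ids, counts = np.unique(cluster_ids, return_counts=True)
    keep = set(ids[counts > 1])
    return [(i, c) for i, c in enumerate(cluster_ids) if c in keep]

rng = np.random.default_rng(0)
vecs = rng.normal(size=(8, 16))   # 8 hypothetical mention embeddings
print(drop_singletons(cluster_mentions(vecs)))
```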
Related papers
- Deep Model Interpretation with Limited Data: A Coreset-based Approach [0.810304644344495]
We propose a coreset-based interpretation framework that utilizes coreset selection methods to sample a representative subset of the large dataset for the interpretation task.
We propose a similarity-based evaluation protocol to assess the robustness of model interpretation methods toward the amount of data they take as input.
arXiv Detail & Related papers (2024-10-01T09:07:24Z)
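The summary above does not name the selection methods used, so the sketch
below illustrates one common choice, k-center greedy, with hypothetical
features and parameters standing in for the paper's setup.

```python
# A minimal sketch of k-center greedy coreset selection: repeatedly add the
# point farthest from the current coreset, so the selected samples cover the
# feature space. Features and budget are hypothetical.
import numpy as np

def k_center_greedy(features, k, seed=0):
    """Pick k indices that greedily minimize max distance to the coreset."""
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(features)))]  # arbitrary starting point
    dists = np.linalg.norm(features - features[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dists))              # farthest remaining point
        chosen.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(features - features[nxt], axis=1))
    return chosen

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 32))  # hypothetical sample features
print(k_center_greedy(data, k=10))  # indices of the representative subset
```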
- Top-K Pairwise Ranking: Bridging the Gap Among Ranking-Based Measures for Multi-Label Classification [120.37051160567277]
This paper proposes a novel measure named Top-K Pairwise Ranking (TKPR).
A series of analyses show that TKPR is compatible with existing ranking-based measures.
On the other hand, we establish a sharp generalization bound for the proposed framework based on a novel technique named data-dependent contraction.
arXiv Detail & Related papers (2024-07-09T09:36:37Z)
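The exact TKPR definition is not given in this summary, so the sketch below
only illustrates the general flavor of a top-K pairwise ranking measure:
relevant labels should be ranked above irrelevant ones within the top-K
positions.

```python
# A hedged sketch of a top-K pairwise ranking measure for multi-label
# classification; the paper's actual TKPR formula may differ.
import numpy as np

def topk_pairwise_ranking(scores, labels, k):
    """Fraction of (relevant, irrelevant) pairs ordered correctly in the top-K."""
    order = np.argsort(-scores)[:k]       # label indices of the top-K scores
    correct = total = 0
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if labels[a] != labels[b]:    # exactly one of the pair is relevant
                total += 1
                correct += int(labels[a] > labels[b])  # relevant ranked higher
    return correct / total if total else 1.0

scores = np.array([0.9, 0.1, 0.7, 0.3, 0.8])  # model scores for 5 labels
labels = np.array([1, 0, 0, 0, 1])            # ground-truth relevance
print(topk_pairwise_ranking(scores, labels, k=3))  # -> 1.0
```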
- Dense Retrieval with Continuous Explicit Feedback for Systematic Review Screening Prioritisation [28.80089773616623]
The goal of screening prioritisation in systematic reviews is to identify relevant documents with high recall and rank them in early positions for review.
Recent studies have shown that neural models have good potential on this task, but their time-consuming fine-tuning and inference discourage their widespread use for screening prioritisation.
We propose an alternative approach that still relies on neural models, but leverages dense representations and relevance feedback to enhance screening prioritisation.
arXiv Detail & Related papers (2024-06-30T09:25:42Z)
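A minimal sketch of how continuous explicit feedback can steer a dense
retriever: after each screening judgment, the query embedding is nudged
toward documents judged relevant, in a Rocchio-style update. The encoder,
update rule, and weights below are assumptions, not the paper's exact method.

```python
# A hedged sketch of relevance feedback over dense representations for
# screening prioritisation; embeddings and update weights are hypothetical.
import numpy as np

def update_query(query_vec, doc_vec, relevant, alpha=0.8, beta=0.2):
    """Move the query toward relevant documents, away from irrelevant ones."""
    sign = 1.0 if relevant else -1.0
    return alpha * query_vec + sign * beta * doc_vec

def rank(query_vec, doc_vecs):
    """Order documents by cosine similarity to the current query vector."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return np.argsort(-sims)

rng = np.random.default_rng(2)
docs = rng.normal(size=(100, 64))  # hypothetical document embeddings
q = rng.normal(size=64)            # initial review-topic embedding
top = int(rank(q, docs)[0])        # screen the highest-ranked document first
q = update_query(q, docs[top], relevant=True)
print(rank(q, docs)[:5])           # re-ranked screening queue
```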
- Coherent Entity Disambiguation via Modeling Topic and Categorical Dependency [87.16283281290053]
Previous entity disambiguation (ED) methods adopt a discriminative paradigm, where prediction is made based on matching scores between mention context and candidate entities.
We propose CoherentED, an ED system equipped with novel designs aimed at enhancing the coherence of entity predictions.
We achieve new state-of-the-art results on popular ED benchmarks, with an average improvement of 1.3 F1 points.
arXiv Detail & Related papers (2023-11-06T16:40:13Z)
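For context, the discriminative paradigm described above reduces to scoring
each candidate entity against the mention context and taking the argmax, as
in the minimal sketch below; CoherentED's coherence-oriented additions are
not reproduced here, and all vectors are hypothetical.

```python
# A minimal sketch of discriminative entity disambiguation via matching
# scores; real systems use learned encoders rather than random vectors.
import numpy as np

def disambiguate(context_vec, candidate_vecs):
    """Return the index of the candidate entity with the best matching score."""
    scores = candidate_vecs @ context_vec  # dot-product matching
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(3)
context = rng.normal(size=128)          # hypothetical mention-context encoding
candidates = rng.normal(size=(5, 128))  # 5 hypothetical candidate entities
best, scores = disambiguate(context, candidates)
print(best, scores.round(2))
```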
- Investigating Crowdsourcing Protocols for Evaluating the Factual Consistency of Summaries [59.27273928454995]
Current pre-trained models applied to summarization are prone to factual inconsistencies which misrepresent the source text or introduce extraneous information.
We create a crowdsourcing evaluation framework for factual consistency using the rating-based Likert scale and ranking-based Best-Worst Scaling protocols.
We find that ranking-based protocols offer a more reliable measure of summary quality across datasets, while the reliability of Likert ratings depends on the target dataset and the evaluation design.
arXiv Detail & Related papers (2021-09-19T19:05:00Z)
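As an illustration of the ranking-based protocol, the sketch below aggregates
Best-Worst Scaling judgments in the standard way: each summary is scored by
how often it is chosen best minus how often it is chosen worst, normalized by
its number of appearances. The item names are hypothetical.

```python
# A minimal sketch of Best-Worst Scaling aggregation for summary quality.
from collections import Counter

def bws_scores(judgments):
    """judgments: list of (tuple_of_items, best_item, worst_item)."""
    best, worst, seen = Counter(), Counter(), Counter()
    for items, b, w in judgments:
        seen.update(items)   # every shown item counts as an appearance
        best[b] += 1
        worst[w] += 1
    return {i: (best[i] - worst[i]) / seen[i] for i in seen}

judgments = [  # hypothetical annotator choices over summary tuples
    (("sum_a", "sum_b", "sum_c"), "sum_a", "sum_c"),
    (("sum_a", "sum_b", "sum_d"), "sum_b", "sum_d"),
    (("sum_b", "sum_c", "sum_d"), "sum_b", "sum_c"),
]
print(bws_scores(judgments))  # higher score = more factually consistent
```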
- Realistic Evaluation Principles for Cross-document Coreference Resolution [19.95214898312209]
We argue that models should not exploit the synthetic topic structure of the standard ECB+ dataset.
We demonstrate empirically the drastic impact of our more realistic evaluation principles on a competitive model.
arXiv Detail & Related papers (2021-06-08T09:05:21Z)
- Cross-document Coreference Resolution over Predicted Mentions [19.95214898312209]
We introduce the first end-to-end model for CD coreference resolution from raw text.
Our model achieves competitive results for event and entity coreference resolution on gold mentions.
arXiv Detail & Related papers (2021-06-02T14:56:28Z)
- Reliable Evaluations for Natural Language Inference based on a Unified Cross-dataset Benchmark [54.782397511033345]
Crowd-sourced Natural Language Inference (NLI) datasets may suffer from significant biases like annotation artifacts.
We present a new unified cross-dataset benchmark with 14 NLI datasets and re-evaluate 9 widely-used neural network-based NLI models.
Our proposed evaluation scheme and experimental baselines could provide a basis to inspire future reliable NLI research.
arXiv Detail & Related papers (2020-10-15T11:50:12Z)
- Evaluating Text Coherence at Sentence and Paragraph Levels [17.99797111176988]
We investigate the adaptation of existing sentence ordering methods to a paragraph ordering task.
We also compare the learnability and robustness of existing models by artificially creating mini datasets and noisy datasets.
We conclude that the recurrent graph neural network-based model is an optimal choice for coherence modeling.
arXiv Detail & Related papers (2020-06-05T03:31:49Z)
- Pre-training Is (Almost) All You Need: An Application to Commonsense Reasoning [61.32992639292889]
Fine-tuning of pre-trained transformer models has become the standard approach for solving common NLP tasks.
We introduce a new scoring method that casts a plausibility ranking task in a full-text format.
We show that our method provides a much more stable training phase across random restarts.
arXiv Detail & Related papers (2020-04-29T10:54:40Z)
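A minimal sketch of plausibility ranking in a full-text format: each candidate
completion is scored by the language-model likelihood of the entire sequence,
and the most probable one wins. The paper's actual scoring function is more
refined; the causal "gpt2" model below is only a stand-in assumption.

```python
# A hedged sketch of full-text plausibility scoring; "gpt2" is a stand-in
# model, not the one used in the paper.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def plausibility(text):
    """Average token log-likelihood of the full text under the LM."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(input_ids=ids, labels=ids).loss  # mean NLL per token
    return -loss.item()

premise = "He dropped the glass on the stone floor, so"
candidates = ["it shattered.", "it started to rain."]
scores = {c: plausibility(f"{premise} {c}") for c in candidates}
print(max(scores, key=scores.get))  # the more plausible completion
```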
- Document Ranking with a Pretrained Sequence-to-Sequence Model [56.44269917346376]
We show how a sequence-to-sequence model can be trained to generate relevance labels as "target words".
Our approach significantly outperforms an encoder-only model in a data-poor regime.
arXiv Detail & Related papers (2020-03-14T22:29:50Z)
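The sketch below shows the mechanics of scoring relevance through generated
target words: the model reads "Query: ... Document: ... Relevant:" and the
score is the probability of producing "true" rather than "false" as the first
output token. In practice a checkpoint fine-tuned for this prompt is required;
vanilla "t5-small" here is only a stand-in.

```python
# A minimal sketch of seq2seq document ranking with "true"/"false" target
# words; "t5-small" is a stand-in for a checkpoint fine-tuned on this prompt.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")
model.eval()

def relevance(query, document):
    """P('true') among {'true', 'false'} for the first generated token."""
    prompt = f"Query: {query} Document: {document} Relevant:"
    ids = tok(prompt, return_tensors="pt").input_ids
    start = torch.tensor([[model.config.decoder_start_token_id]])
    with torch.no_grad():
        logits = model(input_ids=ids, decoder_input_ids=start).logits[0, -1]
    true_id = tok("true", add_special_tokens=False).input_ids[0]
    false_id = tok("false", add_special_tokens=False).input_ids[0]
    return torch.softmax(logits[[true_id, false_id]], dim=0)[0].item()

print(relevance("coreference evaluation",
                "We propose a new evaluation for CD coreference."))
```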
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.