Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing
- URL: http://arxiv.org/abs/2108.10949v1
- Date: Tue, 24 Aug 2021 20:54:56 GMT
- Title: Robustness Evaluation of Entity Disambiguation Using Prior Probes: the Case of Entity Overshadowing
- Authors: Vera Provatorova, Svitlana Vakulenko, Samarth Bhargav, Evangelos
Kanoulas
- Abstract summary: We evaluate and report the performance of popular entity linking systems on the ShadowLink benchmark.
Results show a considerable difference in accuracy between more and less common entities for all of the EL systems under evaluation.
- Score: 11.513083693564466
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Entity disambiguation (ED) is the last step of entity linking (EL), when
candidate entities are reranked according to the context they appear in. All
datasets for training and evaluating models for EL consist of convenience
samples, such as news articles and tweets, that propagate the prior probability
bias of the entity distribution towards more frequently occurring entities. It
was previously shown that the performance of the EL systems on such datasets is
overestimated since it is possible to obtain higher accuracy scores by merely
learning the prior. To provide a more adequate evaluation benchmark, we
introduce the ShadowLink dataset, which includes 16K short text snippets
annotated with entity mentions. We evaluate and report the performance of
popular EL systems on the ShadowLink benchmark. The results show a considerable
difference in accuracy between more and less common entities for all of the EL
systems under evaluation, demonstrating the effects of prior probability bias
and entity overshadowing.
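To make the "learning the prior" pitfall concrete, here is a minimal sketch of a prior-only disambiguation baseline. It is illustrative only: the PRIOR_COUNTS table, the mention string, and the function name are hypothetical stand-ins for anchor-text statistics that real systems estimate from large hyperlinked corpora such as Wikipedia.

```python
from collections import Counter

# Hypothetical anchor-text statistics: how often each surface form links to
# each candidate entity in some training corpus. Real EL systems estimate
# p(entity | mention) from counts like these over Wikipedia anchors.
PRIOR_COUNTS = {
    "jaguar": Counter({
        "Jaguar_Cars": 900,          # head entity: overshadows the others
        "Jaguar_(animal)": 90,       # overshadowed entity
        "Jacksonville_Jaguars": 10,  # long-tail entity
    }),
}

def prior_only_disambiguate(mention: str) -> str | None:
    """Ignore context entirely and return the argmax of p(entity | mention)."""
    counts = PRIOR_COUNTS.get(mention.lower())
    if counts is None:
        return None  # surface form never observed as an anchor
    return counts.most_common(1)[0][0]

# Always picks the head entity, whatever the snippet says: right most of the
# time on convenience-sampled benchmarks, always wrong whenever an
# overshadowed entity (e.g., the animal) is the intended referent.
print(prior_only_disambiguate("Jaguar"))  # -> "Jaguar_Cars"
```

Because convenience samples are dominated by head entities, a context-free baseline like this can score deceptively well, which is exactly the overestimation that ShadowLink is designed to expose.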
Related papers
- Benchmark Inflation: Revealing LLM Performance Gaps Using Retro-Holdouts [0.6282171844772422]
Training data for many Large Language Models (LLMs) is contaminated with test data.
Public benchmark scores do not always accurately assess model properties.
arXiv Detail & Related papers (2024-10-11T20:46:56Z)
- Real World Conversational Entity Linking Requires More Than Zeroshots [50.5691094768954]
We design targeted evaluation scenarios to measure the efficacy of EL models under resource constraints.
We assess EL models' ability to generalize to a new unfamiliar KB using Fandom and a novel zero-shot conversational entity linking dataset.
Results indicate that current zero-shot EL models falter when introduced to new, domain-specific KBs without prior training.
arXiv Detail & Related papers (2024-09-02T10:37:53Z)
- VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)
- Entity Disambiguation via Fusion Entity Decoding [68.77265315142296]
We propose an encoder-decoder model to disambiguate entities with more detailed entity descriptions.
We observe a +1.5% improvement in end-to-end entity linking on the GERBIL benchmark compared with EntQA.
arXiv Detail & Related papers (2024-04-02T04:27:54Z)
- A Fair and In-Depth Evaluation of Existing End-to-End Entity Linking Systems [4.4351901934764975]
Evaluations of entity linking systems often say little about how a system will perform in a particular application.
We provide a more meaningful and fair in-depth evaluation of a variety of existing end-to-end entity linkers.
Our evaluation is based on several widely used benchmarks, which exhibit the problems mentioned above to various degrees, as well as on two new benchmarks.
arXiv Detail & Related papers (2023-05-24T09:20:15Z)
- Focusing on Context is NICE: Improving Overshadowed Entity Disambiguation [43.82625203429496]
NICE uses entity type information to leverage context and avoid over-relying on the frequency-based prior.
Our experiments show that NICE achieves the best performance results on the overshadowed entities while still performing competitively on the frequent entities.
arXiv Detail & Related papers (2022-10-12T13:05:37Z)
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets (a hedged sketch of this metric appears after this list).
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- Open-Set Recognition: A Good Closed-Set Classifier is All You Need [146.6814176602689]
We show that the ability of a classifier to make the 'none-of-above' decision is highly correlated with its accuracy on the closed-set classes.
We use this correlation to boost the performance of the cross-entropy OSR 'baseline' by improving its closed-set accuracy.
We also construct new benchmarks which better respect the task of detecting semantic novelty.
arXiv Detail & Related papers (2021-10-12T17:58:59Z)
- Doing Great at Estimating CATE? On the Neglected Assumptions in Benchmark Comparisons of Treatment Effect Estimators [91.3755431537592]
We show that even in arguably the simplest setting, estimation under ignorability assumptions can be misleading.
We consider two popular machine learning benchmark datasets for evaluation of heterogeneous treatment effect estimators.
We highlight that the inherent characteristics of the benchmark datasets favor some algorithms over others.
arXiv Detail & Related papers (2021-07-28T13:21:27Z)
- A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which render the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z)
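For the "dR@n,IoU@m" metric mentioned in the temporal sentence grounding entry above, the discounting idea can be written out as follows. This is a hedged reconstruction from the one-line summary alone, assuming moment boundaries normalized to [0, 1]; the paper's exact definition may differ.

```latex
% Sketch: the standard hit criterion R@n,IoU@m (a top-n prediction with
% IoU >= m against the ground-truth moment) is scaled down by how far the
% predicted boundaries (p_s, p_e) drift from the ground truth (g_s, g_e),
% with all timestamps normalized to [0, 1].
\[
  \mathrm{dR}@n,\mathrm{IoU}@m \;=\;
    \alpha_s \,\alpha_e \cdot \mathrm{R}@n,\mathrm{IoU}@m,
  \qquad
  \alpha_s = 1 - \lvert p_s - g_s \rvert,\quad
  \alpha_e = 1 - \lvert p_e - g_e \rvert .
\]
```

Under this reading, a prediction that merely exploits annotation bias can still clear the IoU threshold, but its score is damped in proportion to its boundary error, which counteracts the inflation the authors observe on biased datasets.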
This list is automatically generated from the titles and abstracts of the papers on this site.