On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link
Prediction Methods
- URL: http://arxiv.org/abs/2002.06914v5
- Date: Tue, 19 Sep 2023 18:14:06 GMT
- Title: On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link
Prediction Methods
- Authors: Max Berrendorf and Evgeniy Faerman and Laurent Vermue and Volker Tresp
- Abstract summary: We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that all existing scores can hardly be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions.
- Score: 27.27230441498167
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we take a closer look at the evaluation of two families of
methods for enriching information from knowledge graphs: Link Prediction and
Entity Alignment. In the current experimental setting, multiple different
scores are employed to assess different aspects of model performance. We
analyze the informativeness of these evaluation measures and identify several
shortcomings. In particular, we demonstrate that all existing scores can hardly
be used to compare results across different datasets. Moreover, we demonstrate
that varying the size of the test set automatically has an impact on the performance
of the same model as measured by commonly used metrics for the Entity Alignment task.
We show that this leads to various problems in the interpretation of results,
which may support misleading conclusions. Therefore, we propose adjustments to
the evaluation and demonstrate empirically how this supports a fair,
comparable, and interpretable assessment of model performance. Our code is
available at https://github.com/mberr/rank-based-evaluation.
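To make the test-size dependence concrete, here is a minimal sketch in Python. The function names, the simulated random model, and the particular normalization are illustrative assumptions rather than the paper's exact definitions (the repository linked above contains the authors' implementation): raw mean rank and Hits@k shift purely because the number of ranking candidates changes, whereas a mean rank normalized by the expectation of a uniformly random scorer stays comparable across candidate-set sizes.
```python
import numpy as np


def mean_rank(ranks: np.ndarray) -> float:
    """Arithmetic mean of the (1-based) ranks of the true entities."""
    return float(np.mean(ranks))


def hits_at_k(ranks: np.ndarray, k: int = 10) -> float:
    """Fraction of test cases whose true entity is ranked within the top k."""
    return float(np.mean(ranks <= k))


def adjusted_mean_rank(ranks: np.ndarray, num_candidates: np.ndarray) -> float:
    """Mean rank divided by the expected mean rank of a uniformly random scorer.

    For a candidate set of size n, a random scorer places the true entity at
    expected rank (n + 1) / 2, so values below 1 indicate better-than-random
    performance regardless of the candidate-set size.
    """
    expected = (num_candidates + 1) / 2.0
    return float(np.mean(ranks) / np.mean(expected))


# Toy illustration: a purely random model, evaluated once with 100 and once
# with 10,000 candidates per test case. Raw MR and Hits@10 swing wildly with
# the candidate-set size, while the adjusted score stays near 1 in both cases.
rng = np.random.default_rng(seed=0)
for n in (100, 10_000):
    num_candidates = np.full(500, n)
    ranks = rng.integers(low=1, high=n + 1, size=500)  # uniform random ranks
    print(
        f"n={n:>6}  MR={mean_rank(ranks):8.1f}  "
        f"Hits@10={hits_at_k(ranks):.3f}  "
        f"AMR={adjusted_mean_rank(ranks, num_candidates):.2f}"
    )
```
Normalizing by the random-scorer expectation is one way to make the number candidate-size-invariant; the broader point of the paper is that any rank-based score reported without reference to the candidate-set size is hard to compare across datasets.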
Related papers
- Better Understanding Differences in Attribution Methods via Systematic Evaluations [57.35035463793008]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We use these evaluation schemes to study strengths and shortcomings of some widely used attribution methods over a wide range of models.
arXiv Detail & Related papers (2023-03-21T14:24:58Z) - Towards Better Understanding Attribution Methods [77.1487219861185]
Post-hoc attribution methods have been proposed to identify image regions most influential to the models' decisions.
We propose three novel evaluation schemes to more reliably measure the faithfulness of those methods.
We also propose a post-processing smoothing step that significantly improves the performance of some attribution methods.
arXiv Detail & Related papers (2022-05-20T20:50:17Z) - On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z) - A Fair and Comprehensive Comparison of Multimodal Tweet Sentiment
Analysis Methods [3.8142537449670963]
We present a comprehensive experimental evaluation and comparison with six state-of-the-art methods.
Results are presented for two different publicly available benchmark datasets of tweets and corresponding images.
arXiv Detail & Related papers (2021-06-16T14:44:48Z) - A Statistical Analysis of Summarization Evaluation Metrics using
Resampling Methods [60.04142561088524]
We find that the confidence intervals are rather wide, demonstrating high uncertainty in how reliable automatic metrics truly are (a bootstrap-style sketch of such an interval follows after this list).
Although many metrics fail to show statistically significant improvements over ROUGE, two recent metrics, QAEval and BERTScore, do in some evaluation settings.
arXiv Detail & Related papers (2021-03-31T18:28:14Z) - Interpretable Multi-dataset Evaluation for Named Entity Recognition [110.64368106131062]
We present a general methodology for interpretable evaluation for the named entity recognition (NER) task.
The proposed evaluation method enables us to interpret the differences in models and datasets, as well as the interplay between them.
By making our analysis tool available, we make it easy for future researchers to run similar analyses and drive progress in this area.
arXiv Detail & Related papers (2020-11-13T10:53:27Z) - A Critical Assessment of State-of-the-Art in Entity Alignment [1.7725414095035827]
We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which make the results reported in the original works not always comparable.
arXiv Detail & Related papers (2020-10-30T15:09:19Z) - Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking
Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
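The resampling idea referenced in the summarization-metrics entry above carries over directly to rank-based scores. As a minimal sketch (the percentile-bootstrap procedure, the function name, and the simulated ranks below are illustrative assumptions, not the cited paper's protocol), one can attach a confidence interval to Hits@10 by resampling the per-test-case ranks with replacement:
```python
import numpy as np


def bootstrap_hits_ci(ranks: np.ndarray, k: int = 10, n_boot: int = 2000,
                      alpha: float = 0.05, seed: int = 0) -> tuple[float, float]:
    """Percentile-bootstrap confidence interval for Hits@k.

    Resamples the per-test-case ranks with replacement and returns the
    (alpha/2, 1 - alpha/2) quantiles of the resampled Hits@k values.
    """
    rng = np.random.default_rng(seed)
    n = len(ranks)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        resample = ranks[rng.integers(0, n, size=n)]
        stats[b] = np.mean(resample <= k)
    low, high = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(low), float(high)


# Hypothetical example: 300 test triples with ranks skewed toward small values.
rng = np.random.default_rng(seed=1)
ranks = rng.geometric(p=0.08, size=300)
point_estimate = float(np.mean(ranks <= 10))
low, high = bootstrap_hits_ci(ranks)
print(f"Hits@10 = {point_estimate:.3f}  (95% bootstrap CI: [{low:.3f}, {high:.3f}])")
```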