A Critical Assessment of State-of-the-Art in Entity Alignment
- URL: http://arxiv.org/abs/2010.16314v2
- Date: Wed, 17 Mar 2021 14:54:05 GMT
- Title: A Critical Assessment of State-of-the-Art in Entity Alignment
- Authors: Max Berrendorf and Ludwig Wacker and Evgeniy Faerman
- Abstract summary: We investigate two state-of-the-art (SotA) methods for the task of Entity Alignment in Knowledge Graphs.
We first carefully examine the benchmarking process and identify several shortcomings, which mean that the results reported in the original works are not always comparable.
- Score: 1.7725414095035827
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this work, we perform an extensive investigation of two state-of-the-art
(SotA) methods for the task of Entity Alignment in Knowledge Graphs. To this end,
we first carefully examine the benchmarking process and identify several
shortcomings, which mean that the results reported in the original works are not
always comparable. Furthermore, we suspect that it is common practice in the
community to perform hyperparameter optimization directly on the test set,
which reduces the informative value of the reported performance. We therefore
select a representative sample of benchmarking datasets and describe their
properties. We also examine different initializations for entity representations,
since they are a decisive factor for model performance. Furthermore, we use a
shared train/validation/test split for a fair evaluation setting in which we
evaluate all methods on all datasets. In our evaluation, we make several
interesting findings. While we observe that SotA approaches usually perform
better than the baselines, they have difficulties when the dataset contains
noise, which is the case in most real-life applications. Moreover, our ablation
study reveals that the features of SotA methods which are crucial for good
performance often differ from those previously assumed. The code is available at
https://github.com/mberr/ea-sota-comparison.
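Since the shared split and validation-based model selection are central to the paper's argument, a minimal sketch of such an evaluation protocol may help. All helper names here (split_alignments, hits_at_k, fair_evaluation, train_model, candidate_configs) are illustrative assumptions, not the API of the linked repository.

```python
# Minimal sketch of the shared-split protocol described in the abstract:
# one train/validation/test split of the gold alignment pairs, hyperparameters
# chosen on the validation split only, and the test split used exactly once.
import random

def split_alignments(pairs, train_frac=0.7, valid_frac=0.1, seed=42):
    """Split gold (left_entity, right_entity) pairs into train/valid/test."""
    pairs = list(pairs)
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    n_valid = int(valid_frac * len(pairs))
    return (pairs[:n_train],
            pairs[n_train:n_train + n_valid],
            pairs[n_train + n_valid:])

def hits_at_k(rank_fn, pairs, k=1):
    """Fraction of pairs whose true counterpart is ranked within the top k."""
    return sum(rank_fn(left, right) <= k for left, right in pairs) / len(pairs)

def fair_evaluation(train_model, candidate_configs, pairs):
    train, valid, test = split_alignments(pairs)
    # Model selection happens on the validation split only ...
    best = max(candidate_configs,
               key=lambda cfg: hits_at_k(train_model(cfg, train), valid))
    # ... and the test split is touched exactly once, for the final report.
    return hits_at_k(train_model(best, train), test)
```

Here `train_model(cfg, train)` is assumed to return a ranking function mapping an entity pair to the rank of the true counterpart; the point is only that the test split never influences the choice of configuration.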
Related papers
- SureMap: Simultaneous Mean Estimation for Single-Task and Multi-Task Disaggregated Evaluation [75.56845750400116]
Disaggregated evaluation, i.e., estimating the performance of a machine learning model on different subpopulations, is a core task when assessing the performance and group fairness of AI systems.
We develop SureMap, which achieves high estimation accuracy for both multi-task and single-task disaggregated evaluation of black-box models.
Our method combines maximum a posteriori (MAP) estimation using a well-chosen prior with cross-validation-free tuning via Stein's unbiased risk estimate (SURE).
arXiv Detail & Related papers (2024-11-14T17:53:35Z)
- Bayesian Detector Combination for Object Detection with Crowdsourced Annotations [49.43709660948812]
Acquiring fine-grained object detection annotations in unconstrained images is time-consuming, expensive, and prone to noise.
We propose a novel Bayesian Detector Combination (BDC) framework to more effectively train object detectors with noisy crowdsourced annotations.
BDC is model-agnostic, requires no prior knowledge of the annotators' skill level, and seamlessly integrates with existing object detection models.
arXiv Detail & Related papers (2024-07-10T18:00:54Z)
- Recent Advances in Named Entity Recognition: A Comprehensive Survey and Comparative Study [8.91661466156389]
We present an overview of recent popular approaches to NER.
We discuss reinforcement learning and graph-based approaches, highlighting their role in enhancing NER performance.
We evaluate the performance of the main NER implementations on a variety of datasets with differing characteristics.
arXiv Detail & Related papers (2024-01-19T17:21:05Z)
- Benchmarking a Benchmark: How Reliable is MS-COCO? [0.0]
Sama-COCO, a re-annotation of MS-COCO, is used to discover potential biases by leveraging a shape analysis pipeline.
A model is trained and evaluated on both datasets to examine the impact of different annotation conditions.
arXiv Detail & Related papers (2023-11-05T16:55:40Z)
- A Call to Reflect on Evaluation Practices for Age Estimation: Comparative Analysis of the State-of-the-Art and a Unified Benchmark [2.156208381257605]
We offer an extensive comparative analysis of state-of-the-art facial age estimation methods.
We find that the performance differences between the methods are negligible compared to the effect of other factors.
We propose using FaRL as the backbone model and demonstrate its effectiveness on all public datasets.
arXiv Detail & Related papers (2023-07-10T14:02:31Z)
- Re-Evaluating LiDAR Scene Flow for Autonomous Driving [80.37947791534985]
Popular benchmarks for self-supervised LiDAR scene flow have unrealistic rates of dynamic motion, unrealistic correspondences, and unrealistic sampling patterns.
We evaluate a suite of top methods on several real-world datasets.
We show that despite the emphasis placed on learning, most performance gains are caused by pre- and post-processing steps.
arXiv Detail & Related papers (2023-04-04T22:45:50Z)
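To make concrete what such a re-evaluation measures, here is a hypothetical sketch of the standard scene-flow metrics it would rest on. The 5 cm / 5 % thresholds follow common practice in the literature and are an assumption, not necessarily this paper's exact setup.

```python
# Hypothetical sketch of standard scene-flow metrics: mean 3D end-point
# error (EPE3D) and the usual "Acc3D Strict" accuracy convention.
import numpy as np

def scene_flow_metrics(pred_flow, gt_flow, thresh=0.05):
    """pred_flow, gt_flow: (N, 3) arrays of per-point 3D flow in meters."""
    err = np.linalg.norm(pred_flow - gt_flow, axis=1)  # per-point end-point error
    rel = err / np.maximum(np.linalg.norm(gt_flow, axis=1), 1e-8)
    return {
        "EPE3D": float(err.mean()),
        # A point counts as accurate if its error is small in absolute
        # OR relative terms (the common strict-accuracy convention).
        "AccS": float(((err < thresh) | (rel < thresh)).mean()),
    }
```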
- Modeling Entities as Semantic Points for Visual Information Extraction in the Wild [55.91783742370978]
We propose an alternative approach to precisely and robustly extract key information from document images.
We explicitly model entities as semantic points, i.e., the center points of entities are enriched with semantic information describing their attributes and relationships.
The proposed method can achieve significantly enhanced performance on entity labeling and linking, compared with previous state-of-the-art models.
arXiv Detail & Related papers (2023-03-23T08:21:16Z)
- CVTT: Cross-Validation Through Time [0.0]
We argue that ignoring a method's performance over time loses valuable insight into joint data-method effects.
Using the proposed technique, we conduct a detailed analysis of popular RecSys algorithms' performance against various metrics and datasets.
Our results show that model performance can vary significantly over time, and both data and evaluation setup can have a marked effect on it.
arXiv Detail & Related papers (2022-05-11T10:30:38Z)
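The idea behind cross-validation through time can be sketched as a rolling, time-based evaluation loop; the helpers below (train_fn, metric_fn, the event layout) are assumptions for illustration, not the paper's actual protocol.

```python
# Hedged sketch of a cross-validation-through-time loop: train on all events
# before each cutoff, evaluate on the window right after it, and keep the
# per-window scores instead of a single aggregate number.
def cvtt(events, cutoffs, window, train_fn, metric_fn):
    """events: iterable of (user, item, timestamp); cutoffs: sorted times."""
    events = list(events)
    curve = []
    for t in cutoffs:
        train = [e for e in events if e[2] < t]
        test = [e for e in events if t <= e[2] < t + window]
        if not train or not test:
            continue  # skip cutoffs with no data on either side
        model = train_fn(train)
        curve.append((t, metric_fn(model, test)))
    return curve  # a performance curve over time, not one number
```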
- A Closer Look at Debiased Temporal Sentence Grounding in Videos: Dataset, Metric, and Approach [53.727460222955266]
Temporal Sentence Grounding in Videos (TSGV) aims to ground a natural language sentence in an untrimmed video.
Recent studies have found that current benchmark datasets may have obvious moment annotation biases.
We introduce a new evaluation metric "dR@n,IoU@m" that discounts the basic recall scores to alleviate the inflated evaluation caused by biased datasets.
arXiv Detail & Related papers (2022-03-10T08:58:18Z)
- On Modality Bias Recognition and Reduction [70.69194431713825]
We study the modality bias problem in the context of multi-modal classification.
We propose a plug-and-play loss function method, whereby the feature space for each label is adaptively learned.
Our method yields remarkable performance improvements compared with the baselines.
arXiv Detail & Related papers (2022-02-25T13:47:09Z)
- On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods [27.27230441498167]
We take a closer look at the evaluation of two families of methods for enriching information from knowledge graphs: Link Prediction and Entity Alignment.
In particular, we demonstrate that none of the existing scores can reliably be used to compare results across different datasets.
We show that this leads to various problems in the interpretation of results, which may support misleading conclusions.
arXiv Detail & Related papers (2020-02-17T12:26:14Z)
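One remedy proposed in this line of work is the adjusted mean rank, which divides the observed mean rank by its expectation under a uniformly random ordering so that scores become comparable across candidate-set sizes. The sketch below assumes that definition and a fixed candidate-set size for simplicity.

```python
# Sketch of the adjusted mean rank (AMR): the observed mean rank divided by
# its expectation (n + 1) / 2 under a uniformly random ordering of the n
# candidates. Values near 0 are good; values near 1 mean "no better than
# random".
def adjusted_mean_rank(ranks, num_candidates):
    mean_rank = sum(ranks) / len(ranks)
    expected_rank = (num_candidates + 1) / 2  # E[rank] for random scoring
    return mean_rank / expected_rank
```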
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.