A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based
Matching Algorithms
- URL: http://arxiv.org/abs/2307.01231v2
- Date: Mon, 13 Nov 2023 04:49:28 GMT
- Title: A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based
Matching Algorithms
- Authors: George Papadakis, Nishadi Kirielle, Peter Christen, Themis Palpanas
- Abstract summary: We propose four approaches to assessing the difficulty and appropriateness of 13 established datasets.
We show that most of the popular datasets pose rather easy classification tasks.
We propose a new methodology for yielding benchmark datasets.
- Score: 11.264467955516706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Entity resolution (ER) is the process of identifying records that refer to
the same entities within one or across multiple databases. Numerous techniques
have been developed to tackle ER challenges over the years, with recent
emphasis placed on machine and deep learning methods for the matching phase.
However, the quality of the benchmark datasets typically used in the
experimental evaluations of learning-based matching algorithms has not been
examined in the literature. To cover this gap, we propose four different
approaches to assessing the difficulty and appropriateness of 13 established
datasets: two theoretical approaches, which involve new measures of linearity
and existing measures of complexity, and two practical approaches, namely the
difference between the best non-linear and linear matchers, and the
difference between the best learning-based matcher and a perfect oracle. Our
analysis demonstrates that most of the popular datasets pose rather easy
classification tasks. As a result, they are not suitable for properly
evaluating learning-based matching algorithms. To address this issue, we
propose a new methodology for yielding benchmark datasets. We put it into
practice by creating four new matching tasks, and we verify that these new
benchmarks are more challenging and therefore more suitable for further
advancements in the field.
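To make the practical measures concrete, the following is a minimal sketch of the non-linear-vs-linear matcher gap, assuming candidate record pairs have already been encoded as feature vectors (e.g., per-attribute similarity scores); the two models shown are illustrative stand-ins, not the exact matchers benchmarked in the paper.
```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def matcher_gap(X, y, seed=42):
    """Return F1(non-linear) - F1(linear) for a matching task.

    A small gap suggests the dataset is (near-)linearly separable and
    therefore an easy benchmark for learning-based matchers.
    """
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=seed)
    linear = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    nonlinear = RandomForestClassifier(
        n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    return (f1_score(y_te, nonlinear.predict(X_te))
            - f1_score(y_te, linear.predict(X_te)))

# Toy usage: columns stand in for per-attribute similarities of a pair.
rng = np.random.default_rng(0)
X = rng.random((1000, 5))
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)  # hypothetical match labels
print(f"non-linear vs. linear F1 gap: {matcher_gap(X, y):+.3f}")
```
The oracle-based measure works analogously: compare the best learning-based matcher's F1 against the perfect oracle's score of 1.0.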
Related papers
- Evaluating LLMs on Entity Disambiguation in Tables [0.9786690381850356]
This work proposes an extensive evaluation of four state-of-the-art (SOTA) Semantic Table Interpretation (STI) approaches: Alligator (formerly s-elbat), Dagobah, TURL, and TableLlama.
We also include in the evaluation both GPT-4o and GPT-4o-mini, since they excel in various public benchmarks.
arXiv Detail & Related papers (2024-08-12T18:01:50Z)
- Multivariate Time Series Anomaly Detection: Fancy Algorithms and Flawed Evaluation Methodology [2.043517674271996]
We discuss how a normally good protocol may have weaknesses in the context of MVTS anomaly detection.
We propose a simple, yet challenging, baseline based on Principal Components Analysis (PCA) that surprisingly outperforms many recent Deep Learning (DL) based approaches on popular benchmark datasets.
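As a rough illustration of such a baseline (the exact preprocessing and thresholding in the paper may differ), a PCA reconstruction-error detector can be sketched as follows:
```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_anomaly_scores(train, test, n_components=0.9):
    """Score multivariate time series points by PCA reconstruction error.

    Fits on training data assumed to be mostly normal; higher scores
    indicate more anomalous test points.
    """
    scaler = StandardScaler().fit(train)
    pca = PCA(n_components=n_components).fit(scaler.transform(train))
    z = scaler.transform(test)
    recon = pca.inverse_transform(pca.transform(z))
    return np.linalg.norm(z - recon, axis=1)

# Usage: flag, e.g., the top 1% of scores as anomalies.
```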
arXiv Detail & Related papers (2023-08-24T20:24:12Z)
- A Gold Standard Dataset for the Reviewer Assignment Problem [117.59690218507565]
"Similarity score" is a numerical estimate of the expertise of a reviewer in reviewing a paper.
Our dataset consists of 477 self-reported expertise scores provided by 58 researchers.
For the task of ordering two papers in terms of their relevance for a reviewer, the error rates range from 12%-30% in easy cases to 36%-43% in hard cases.
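A small sketch of the pairwise ordering error behind those numbers (function and argument names are hypothetical; self-reported expertise provides the ground-truth order):
```python
def pairwise_error_rate(pred_scores, true_scores):
    """Fraction of paper pairs whose predicted order contradicts the
    ground-truth (self-reported) order for one reviewer; ties in the
    ground truth are skipped because they define no order."""
    errors, total = 0, 0
    n = len(true_scores)
    for i in range(n):
        for j in range(i + 1, n):
            if true_scores[i] == true_scores[j]:
                continue
            total += 1
            if ((pred_scores[i] - pred_scores[j])
                    * (true_scores[i] - true_scores[j]) < 0):
                errors += 1
    return errors / total if total else 0.0
```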
arXiv Detail & Related papers (2023-03-23T16:15:03Z)
- Towards Diverse Evaluation of Class Incremental Learning: A Representation Learning Perspective [67.45111837188685]
Class incremental learning (CIL) algorithms aim to continually learn new object classes from incrementally arriving data.
We experimentally analyze neural network models trained by CIL algorithms using various evaluation protocols in representation learning.
arXiv Detail & Related papers (2022-06-16T11:44:11Z)
- Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z)
- Multiple-criteria Based Active Learning with Fixed-size Determinantal Point Processes [43.71112693633952]
We introduce a multiple-criteria based active learning algorithm, which incorporates three complementary criteria, i.e., informativeness, representativeness and diversity.
We show that our method performs significantly better and is more stable than other multiple-criteria based AL algorithms.
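The paper selects batches via fixed-size determinantal point processes; the greedy sketch below only illustrates how the three criteria could be combined (in practice each term would be normalized to a common scale):
```python
import numpy as np

def select_batch(probs, X_pool, batch_size=10):
    """Greedily pick a batch scoring informativeness (predictive entropy),
    representativeness (closeness to the pool centroid), and diversity
    (distance to already-chosen points)."""
    info = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    rep = -np.linalg.norm(X_pool - X_pool.mean(axis=0), axis=1)
    chosen = []
    for _ in range(batch_size):
        if chosen:
            div = np.min(np.linalg.norm(
                X_pool[:, None, :] - X_pool[chosen][None, :, :],
                axis=2), axis=1)
        else:
            div = np.zeros(len(X_pool))
        score = info + rep + div
        score[chosen] = -np.inf  # never pick the same point twice
        chosen.append(int(np.argmax(score)))
    return chosen
```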
arXiv Detail & Related papers (2021-07-04T13:22:54Z)
- Can Active Learning Preemptively Mitigate Fairness Issues? [66.84854430781097]
Dataset bias is one of the prevailing causes of unfairness in machine learning.
We study whether models trained with uncertainty-based AL strategies are fairer in their decisions with respect to a protected class.
We also explore the interaction of algorithmic fairness methods such as gradient reversal (GRAD) and BALD.
arXiv Detail & Related papers (2021-04-14T14:20:22Z)
- Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve the Micro F1-score by 7% over current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z)
- ALdataset: a benchmark for pool-based active learning [1.9308522511657449]
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm can achieve good accuracy with fewer training samples by interactively querying a user/oracle to label new data points.
Pool-based AL is well-motivated in many ML tasks where unlabeled data is abundant but labels are hard to obtain.
We present experiment results for various active learning strategies, both recently proposed and classic highly-cited methods, and draw insights from the results.
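For reference, a minimal pool-based AL loop with uncertainty (margin) sampling, one of the classic strategies such a benchmark covers; the model and query strategy here are illustrative, and the initial random sample is assumed to contain both classes:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def active_learn(X_pool, y_oracle, n_init=10, n_queries=50, seed=0):
    """Binary-classification sketch; y_oracle stands in for the human."""
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(X_pool), size=n_init, replace=False))
    for _ in range(n_queries):
        clf = LogisticRegression(max_iter=1000).fit(
            X_pool[labeled], y_oracle[labeled])
        probs = clf.predict_proba(X_pool)
        margin = np.abs(probs[:, 0] - probs[:, 1])  # small = uncertain
        margin[labeled] = np.inf         # never re-query a labeled point
        labeled.append(int(np.argmin(margin)))  # "ask the oracle"
    clf = LogisticRegression(max_iter=1000).fit(
        X_pool[labeled], y_oracle[labeled])
    return clf, labeled
```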
arXiv Detail & Related papers (2020-10-16T04:37:29Z)
- Towards Model-Agnostic Post-Hoc Adjustment for Balancing Ranking Fairness and Algorithm Utility [54.179859639868646]
Bipartite ranking aims to learn a scoring function that ranks positive individuals higher than negative ones from labeled data.
There have been rising concerns on whether the learned scoring function can cause systematic disparity across different protected groups.
We propose a model post-processing framework for balancing them in the bipartite ranking scenario.
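One simple instance of the post-hoc idea, sketched below: equalize per-group selection rates via additive score shifts. The paper's framework balances ranking fairness and utility more carefully; this is only an illustration.
```python
import numpy as np

def group_shifts(scores, groups, top_frac=0.2):
    """Additive per-group shifts aligning each group's (1 - top_frac)
    score quantile with the global one, so a single threshold selects
    the same fraction of every protected group."""
    global_cut = np.quantile(scores, 1 - top_frac)
    return {g: global_cut - np.quantile(scores[groups == g], 1 - top_frac)
            for g in np.unique(groups)}

def adjust(scores, groups, shifts):
    """Apply the shifts to produce post-processed ranking scores."""
    return scores + np.array([shifts[g] for g in groups])
```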
arXiv Detail & Related papers (2020-06-15T10:08:39Z)
- Fase-AL -- Adaptation of Fast Adaptive Stacking of Ensembles for Supporting Active Learning [0.0]
This work presents the FASE-AL algorithm, which induces classification models from unlabeled instances using Active Learning.
The algorithm achieves promising results in terms of the percentage of correctly classified instances.
arXiv Detail & Related papers (2020-01-30T17:25:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.