Evaluating LLMs on Entity Disambiguation in Tables
- URL: http://arxiv.org/abs/2408.06423v3
- Date: Thu, 31 Oct 2024 18:43:01 GMT
- Title: Evaluating LLMs on Entity Disambiguation in Tables
- Authors: Federico Belotti, Fabio Dadda, Marco Cremaschi, Roberto Avogadro, Matteo Palmonari,
- Abstract summary: This work proposes an extensive evaluation of four STI SOTA approaches: Alligator (formerly s-elbat), Dagobah, TURL, and TableLlama.
We also include in the evaluation both GPT-4o and GPT-4o-mini, since they excel in various public benchmarks.
- Score: 0.9786690381850356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Tables are crucial containers of information, but understanding their meaning may be challenging. Over the years, there has been a surge in interest in data-driven approaches based on deep learning that have increasingly been combined with heuristic-based ones. In the last period, the advent of \acf{llms} has led to a new category of approaches for table annotation. However, these approaches have not been consistently evaluated on a common ground, making evaluation and comparison difficult. This work proposes an extensive evaluation of four STI SOTA approaches: Alligator (formerly s-elbat), Dagobah, TURL, and TableLlama; the first two belong to the family of heuristic-based algorithms, while the others are respectively encoder-only and decoder-only Large Language Models (LLMs). We also include in the evaluation both GPT-4o and GPT-4o-mini, since they excel in various public benchmarks. The primary objective is to measure the ability of these approaches to solve the entity disambiguation task with respect to both the performance achieved on a common-ground evaluation setting and the computational and cost requirements involved, with the ultimate aim of charting new research paths in the field.
Related papers
- CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution [74.41064280094064]
textbfJudger-1 is the first open-source textbfall-in-one judge LLM.
CompassJudger-1 is a general-purpose LLM that demonstrates remarkable versatility.
textbfJudgerBench is a new benchmark that encompasses various subjective evaluation tasks.
arXiv Detail & Related papers (2024-10-21T17:56:51Z) - Evaluating Generative Language Models in Information Extraction as Subjective Question Correction [49.729908337372436]
We propose a new evaluation method, SQC-Score.
Inspired by the principles in subjective question correction, we propose a new evaluation method, SQC-Score.
Results on three information extraction tasks show that SQC-Score is more preferred by human annotators than the baseline metrics.
arXiv Detail & Related papers (2024-04-04T15:36:53Z) - Low-shot Object Learning with Mutual Exclusivity Bias [27.67152913041082]
This paper introduces Low-shot Object Learning with Mutual Exclusivity Bias (LSME), the first computational framing of mutual exclusivity bias.
We provide a novel dataset, comprehensive baselines, and a state-of-the-art method to enable the ML community to tackle this challenging learning task.
arXiv Detail & Related papers (2023-12-06T14:54:10Z) - A Critical Re-evaluation of Benchmark Datasets for (Deep) Learning-Based
Matching Algorithms [11.264467955516706]
We propose four approaches to assessing the difficulty and appropriateness of 13 established datasets.
We show that most of the popular datasets pose rather easy classification tasks.
We propose a new methodology for yielding benchmark datasets.
arXiv Detail & Related papers (2023-07-03T07:54:54Z) - A RelEntLess Benchmark for Modelling Graded Relations between Named
Entities [29.528217625083546]
We introduce a new benchmark, in which entity pairs have to be ranked according to how much they satisfy a given graded relation.
We find a strong correlation between model size and performance, with smaller Language Models struggling to outperform a naive baseline.
The results of the largest Flan-T5 and OPT models are remarkably strong, although a clear gap with human performance remains.
arXiv Detail & Related papers (2023-05-24T10:41:24Z) - Deep Active Ensemble Sampling For Image Classification [8.31483061185317]
Active learning frameworks aim to reduce the cost of data annotation by actively requesting the labeling for the most informative data points.
Some proposed approaches include uncertainty-based techniques, geometric methods, implicit combination of uncertainty-based and geometric approaches.
We present an innovative integration of recent progress in both uncertainty-based and geometric frameworks to enable an efficient exploration/exploitation trade-off in sample selection strategy.
Our framework provides two advantages: (1) accurate posterior estimation, and (2) tune-able trade-off between computational overhead and higher accuracy.
arXiv Detail & Related papers (2022-10-11T20:20:20Z) - Bi-level Alignment for Cross-Domain Crowd Counting [113.78303285148041]
Current methods rely on external data for training an auxiliary task or apply an expensive coarse-to-fine estimation.
We develop a new adversarial learning based method, which is simple and efficient to apply.
We evaluate our approach on five real-world crowd counting benchmarks, where we outperform existing approaches by a large margin.
arXiv Detail & Related papers (2022-05-12T02:23:25Z) - TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for
Unsupervised Sentence Embedding Learning [53.32740707197856]
We present a new state-of-the-art unsupervised method based on pre-trained Transformers and Sequential Denoising Auto-Encoder (TSDAE)
It can achieve up to 93.1% of the performance of in-domain supervised approaches.
arXiv Detail & Related papers (2021-04-14T17:02:18Z) - Multitask Learning for Class-Imbalanced Discourse Classification [74.41900374452472]
We show that a multitask approach can improve 7% Micro F1-score upon current state-of-the-art benchmarks.
We also offer a comparative review of additional techniques proposed to address resource-poor problems in NLP.
arXiv Detail & Related papers (2021-01-02T07:13:41Z) - ALdataset: a benchmark for pool-based active learning [1.9308522511657449]
Active learning (AL) is a subfield of machine learning (ML) in which a learning algorithm could achieve good accuracy with less training samples by interactively querying a user/oracle to label new data points.
Pool-based AL is well-motivated in many ML tasks, where unlabeled data is abundant, but their labels are hard to obtain.
We present experiment results for various active learning strategies, both recently proposed and classic highly-cited methods, and draw insights from the results.
arXiv Detail & Related papers (2020-10-16T04:37:29Z) - Revisiting LSTM Networks for Semi-Supervised Text Classification via
Mixed Objective Function [106.69643619725652]
We develop a training strategy that allows even a simple BiLSTM model, when trained with cross-entropy loss, to achieve competitive results.
We report state-of-the-art results for text classification task on several benchmark datasets.
arXiv Detail & Related papers (2020-09-08T21:55:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.