AQuA: A Benchmarking Tool for Label Quality Assessment
- URL: http://arxiv.org/abs/2306.09467v2
- Date: Tue, 16 Jan 2024 11:10:28 GMT
- Title: AQuA: A Benchmarking Tool for Label Quality Assessment
- Authors: Mononito Goswami, Vedant Sanil, Arjun Choudhry, Arvind Srinivasan,
Chalisa Udompanyawit, Artur Dubrawski
- Abstract summary: Recent studies have found datasets widely used to train and evaluate machine learning models to have pervasive labeling errors.
We propose a benchmarking environment AQuA to rigorously evaluate methods that enable machine learning in the presence of label noise.
- Score: 16.83510474053401
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Machine learning (ML) models are only as good as the data they are trained
on. But recent studies have found datasets widely used to train and evaluate ML
models, e.g. ImageNet, to have pervasive labeling errors. Erroneous labels on
the train set hurt ML models' ability to generalize, and they impact evaluation
and model selection using the test set. Consequently, learning in the presence
of labeling errors is an active area of research, yet this field lacks a
comprehensive benchmark to evaluate these methods. Most of these methods are
evaluated on a few computer vision datasets with significant variance in the
experimental protocols. With such a large pool of methods and inconsistent
evaluation, it is also unclear how ML practitioners can choose the right models
to assess label quality in their data. To this end, we propose a benchmarking
environment AQuA to rigorously evaluate methods that enable machine learning in
the presence of label noise. We also introduce a design space to delineate
concrete design choices of label error detection models. We hope that our
proposed design space and benchmark enable practitioners to choose the right
tools to improve their label quality and that our benchmark enables objective
and rigorous evaluation of machine learning tools facing mislabeled data.
Related papers
- Are LLMs Better than Reported? Detecting Label Errors and Mitigating Their Effect on Model Performance [21.926934384262594]
Large language models (LLMs) offer new opportunities to enhance the annotation process.
We compare expert, crowd-sourced, and our LLM-based annotations in terms of agreement, label quality, and efficiency.
Our findings reveal a substantial number of label errors, which, when corrected, induce a significant upward shift in reported model performance.
arXiv Detail & Related papers (2024-10-24T16:27:03Z) - Estimating label quality and errors in semantic segmentation data via
any model [19.84626033109009]
We study methods to score label quality, such that the images with the lowest scores are least likely to be correctly labeled.
This helps prioritize what data to review in order to ensure a high-quality training/evaluation dataset.
arXiv Detail & Related papers (2023-07-11T07:29:09Z) - Systematic analysis of the impact of label noise correction on ML
Fairness [0.0]
We develop an empirical methodology to evaluate the effectiveness of label noise correction techniques in ensuring the fairness of models trained on biased datasets.
Our results suggest that the Hybrid Label Noise Correction method achieves the best trade-off between predictive performance and fairness.
arXiv Detail & Related papers (2023-06-28T08:08:14Z) - Bring Your Own Data! Self-Supervised Evaluation for Large Language
Models [52.15056231665816]
We propose a framework for self-supervised evaluation of Large Language Models (LLMs)
We demonstrate self-supervised evaluation strategies for measuring closed-book knowledge, toxicity, and long-range context dependence.
We find strong correlations between self-supervised and human-supervised evaluations.
arXiv Detail & Related papers (2023-06-23T17:59:09Z) - Automated Labeling of German Chest X-Ray Radiology Reports using Deep
Learning [50.591267188664666]
We propose a deep learning-based CheXpert label prediction model, pre-trained on reports labeled by a rule-based German CheXpert model.
Our results demonstrate the effectiveness of our approach, which significantly outperformed the rule-based model on all three tasks.
arXiv Detail & Related papers (2023-06-09T16:08:35Z) - Label-Retrieval-Augmented Diffusion Models for Learning from Noisy
Labels [61.97359362447732]
Learning from noisy labels is an important and long-standing problem in machine learning for real applications.
In this paper, we reformulate the label-noise problem from a generative-model perspective.
Our model achieves new state-of-the-art (SOTA) results on all the standard real-world benchmark datasets.
arXiv Detail & Related papers (2023-05-31T03:01:36Z) - Learning to Detect Noisy Labels Using Model-Based Features [16.681748918518075]
We propose Selection-Enhanced Noisy label Training (SENT)
SENT does not rely on meta learning while having the flexibility of being data-driven.
It improves performance over strong baselines under the settings of self-training and label corruption.
arXiv Detail & Related papers (2022-12-28T10:12:13Z) - ALT-MAS: A Data-Efficient Framework for Active Testing of Machine
Learning Algorithms [58.684954492439424]
We propose a novel framework to efficiently test a machine learning model using only a small amount of labeled test data.
The idea is to estimate the metrics of interest for a model-under-test using Bayesian neural network (BNN)
arXiv Detail & Related papers (2021-04-11T12:14:04Z) - Active Testing: Sample-Efficient Model Evaluation [39.200332879659456]
We introduce active testing: a new framework for sample-efficient model evaluation.
Active testing addresses this by carefully selecting the test points to label.
We show how to remove that bias while reducing the variance of the estimator.
arXiv Detail & Related papers (2021-03-09T10:20:49Z) - Deep k-NN for Noisy Labels [55.97221021252733]
We show that a simple $k$-nearest neighbor-based filtering approach on the logit layer of a preliminary model can remove mislabeled data and produce more accurate models than many recently proposed methods.
arXiv Detail & Related papers (2020-04-26T05:15:36Z) - Learning with Out-of-Distribution Data for Audio Classification [60.48251022280506]
We show that detecting and relabelling certain OOD instances, rather than discarding them, can have a positive effect on learning.
The proposed method is shown to improve the performance of convolutional neural networks by a significant margin.
arXiv Detail & Related papers (2020-02-11T21:08:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.