Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
- URL: http://arxiv.org/abs/2402.02249v2
- Date: Thu, 17 Oct 2024 08:30:07 GMT
- Title: Don't Label Twice: Quantity Beats Quality when Comparing Binary Classifiers on a Budget
- Authors: Florian E. Dorner, Moritz Hardt
- Abstract summary: It's common practice to aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote.
We prove a theorem that runs counter to conventional wisdom.
We discuss the implications of our work for the design of machine learning benchmarks.
- Score: 16.81162898745253
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We study how to best spend a budget of noisy labels to compare the accuracy of two binary classifiers. It's common practice to collect and aggregate multiple noisy labels for a given data point into a less noisy label via a majority vote. We prove a theorem that runs counter to conventional wisdom. If the goal is to identify the better of two classifiers, we show it's best to spend the budget on collecting a single label for more samples. Our result follows from a non-trivial application of Cramér's theorem, a staple in the theory of large deviations. We discuss the implications of our work for the design of machine learning benchmarks, where they overturn some time-honored recommendations. In addition, our results provide sample size bounds superior to what follows from Hoeffding's bound.
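The abstract's core claim is easy to probe with a small Monte Carlo experiment. The sketch below (Python) is a minimal illustration under assumed parameters, not the paper's analysis: it fixes a labeling budget and compares spending one noisy label on each of n samples against a 3-way majority vote on n/3 samples when deciding which of two classifiers is more accurate. The classifier accuracies, noise rate, and budget are made-up values.

```python
import numpy as np

rng = np.random.default_rng(0)

BUDGET = 3000              # total noisy labels we can afford (made-up number)
NOISE = 0.3                # chance a single annotator flips the true label (made-up)
ACC_A, ACC_B = 0.81, 0.78  # true accuracies of the two classifiers (made-up)
TRIALS = 2000

def picks_better_classifier(n_samples, votes):
    """Simulate one benchmark: n_samples test points, `votes` noisy labels each
    (aggregated by majority vote). Returns True if classifier A, the truly
    better one, wins the empirical comparison."""
    y = rng.integers(0, 2, n_samples)                            # true binary labels
    pred_a = np.where(rng.random(n_samples) < ACC_A, y, 1 - y)   # A's predictions
    pred_b = np.where(rng.random(n_samples) < ACC_B, y, 1 - y)   # B's predictions
    noisy = np.where(rng.random((n_samples, votes)) < NOISE,
                     1 - y[:, None], y[:, None])                 # annotator labels
    ref = (noisy.mean(axis=1) > 0.5).astype(int)                 # majority-vote reference
    return (pred_a == ref).mean() > (pred_b == ref).mean()

for votes in (1, 3):                       # same budget, different strategies
    n = BUDGET // votes
    wins = sum(picks_better_classifier(n, votes) for _ in range(TRIALS))
    print(f"{votes} label(s) x {n} samples: better classifier identified "
          f"in {wins / TRIALS:.1%} of trials")
```

Under the paper's theorem one would expect the single-label strategy to pick the better classifier at least as often; the simulation only illustrates the experimental setup, not the proof.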
Related papers
- Bipartite Ranking From Multiple Labels: On Loss Versus Label Aggregation [66.28528968249255]
Bipartite ranking is a fundamental supervised learning problem, with the goal of learning a ranking over instances with maximal area under the ROC curve (AUC) against a single binary target label.
When multiple binary labels are available for each instance, how can one synthesize them into a single coherent ranking?
We analyze two approaches to this problem -- loss aggregation and label aggregation -- by characterizing their Bayes-optimal solutions.
arXiv Detail & Related papers (2025-04-15T15:25:27Z) - Regularly Truncated M-estimators for Learning with Noisy Labels [79.36560434324586]
We propose regularly truncated M-estimators (RTME) for robust learning with noisy labels.
Specifically, RTME can alternately switch modes between truncated M-estimators and original M-estimators.
We demonstrate that our strategies are label-noise-tolerant.
arXiv Detail & Related papers (2023-09-02T10:22:20Z) - Robust Assignment of Labels for Active Learning with Sparse and Noisy Annotations [0.17188280334580192]
Supervised classification algorithms are used to solve a growing number of real-life problems around the globe.
Unfortunately, acquiring good-quality annotations for many tasks is infeasible or too expensive to be done in practice.
We propose two novel annotation unification algorithms that utilize unlabeled parts of the sample space.
arXiv Detail & Related papers (2023-07-25T19:40:41Z) - Crowdsourcing subjective annotations using pairwise comparisons reduces bias and error compared to the majority-vote method [0.0]
We introduce a theoretical framework for understanding how random error and measurement bias enter into crowdsourced annotations of subjective constructs.
We then propose a pipeline that combines pairwise comparison labelling with Elo scoring, and demonstrate that it outperforms the ubiquitous majority-voting method in reducing both types of measurement error.
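As a rough illustration of the pairwise-comparison idea in this entry (not the authors' actual pipeline), the sketch below turns pairwise judgments into per-item scores with a standard Elo update; the K-factor, base rating, and the example data are arbitrary choices.

```python
from collections import defaultdict

K = 32          # Elo step size (arbitrary choice)
BASE = 1000.0   # initial rating for every item (arbitrary choice)

def elo_scores(comparisons):
    """comparisons: iterable of (winner_id, loser_id) pairwise judgments."""
    rating = defaultdict(lambda: BASE)
    for winner, loser in comparisons:
        # Expected probability that `winner` beats `loser` under current ratings.
        expected = 1.0 / (1.0 + 10 ** ((rating[loser] - rating[winner]) / 400))
        rating[winner] += K * (1 - expected)
        rating[loser] -= K * (1 - expected)
    return dict(rating)

# Hypothetical judgments: annotators rated item "a" above "b", "a" above "c", etc.
print(elo_scores([("a", "b"), ("a", "c"), ("b", "c"), ("a", "b")]))
```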
arXiv Detail & Related papers (2023-05-31T17:14:12Z) - To Aggregate or Not? Learning with Separate Noisy Labels [28.14966756980763]
This paper addresses the question of whether one should aggregate separate noisy labels into single ones or use them separately as given.
We theoretically analyze the performance of both approaches under the empirical risk minimization framework.
Our theorems conclude that label separation is preferred over label aggregation when the noise rates are high, or the number of labelers/annotations is insufficient.
arXiv Detail & Related papers (2022-06-14T21:32:26Z) - Quantity vs Quality: Investigating the Trade-Off between Sample Size and Label Reliability [0.0]
We study learning in probabilistic domains where the learner may receive incorrect labels but can improve the reliability of labels by repeatedly sampling them.
We motivate this problem in an application to compare the strength of poker hands where the training signal depends on the hidden community cards.
We propose two validation strategies: switching from fewer to more label validations over the course of training, and using chi-square statistics to approximate the confidence in the obtained labels.
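One plausible reading of the chi-square strategy above, offered as an assumption rather than the authors' exact procedure: after repeatedly sampling a noisy binary label, test the observed vote counts against a 50/50 "pure noise" split and treat a small p-value as high confidence in the majority label.

```python
from scipy.stats import chisquare

def label_confidence(pos_votes, neg_votes):
    """Test the observed vote counts against a 50/50 'pure noise' split.
    Returns (majority_label, p_value); a small p-value means the votes look
    unlike a coin flip, so the majority label is more trustworthy."""
    stat, p_value = chisquare([pos_votes, neg_votes])  # expected counts default to uniform
    majority = 1 if pos_votes >= neg_votes else 0
    return majority, p_value

print(label_confidence(9, 3))   # clear majority -> small p-value
print(label_confidence(5, 4))   # near coin flip -> large p-value
```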
arXiv Detail & Related papers (2022-04-20T13:52:00Z) - Active Learning with Label Comparisons [41.82179028046654]
We show that finding the best of $k$ labels can be done with $k-1$ active queries.
A key element in our analysis is the "label neighborhood graph" of the true distribution.
arXiv Detail & Related papers (2022-04-10T12:13:46Z) - Learning with Noisy Labels by Targeted Relabeling [52.0329205268734]
Crowdsourcing platforms are often used to collect datasets for training deep neural networks.
We propose an approach which reserves a fraction of annotations to explicitly relabel highly probable labeling errors.
arXiv Detail & Related papers (2021-10-15T20:37:29Z) - Disentangling Sampling and Labeling Bias for Learning in Large-Output Spaces [64.23172847182109]
We show that different negative sampling schemes implicitly trade-off performance on dominant versus rare labels.
We provide a unified means to explicitly tackle both sampling bias, arising from working with a subset of all labels, and labeling bias, which is inherent to the data due to label imbalance.
arXiv Detail & Related papers (2021-05-12T15:40:13Z) - Are Fewer Labels Possible for Few-shot Learning? [81.89996465197392]
Few-shot learning is challenging because only very limited data and labels are available.
Recent studies in big transfer (BiT) show that few-shot learning can greatly benefit from pretraining on a large-scale labeled dataset in a different domain.
We propose eigen-finetuning to enable fewer-shot learning by leveraging the co-evolution of clustering and eigen-samples during finetuning.
arXiv Detail & Related papers (2020-12-10T18:59:29Z) - Pointwise Binary Classification with Pairwise Confidence Comparisons [97.79518780631457]
We propose pairwise comparison (Pcomp) classification, where we have only pairs of unlabeled data for which we know that one instance is more likely to be positive than the other.
We link Pcomp classification to noisy-label learning to develop a progressive unbiased risk estimator (URE), and improve it by imposing consistency regularization.
arXiv Detail & Related papers (2020-10-05T09:23:58Z) - Class2Simi: A Noise Reduction Perspective on Learning with Noisy Labels [98.13491369929798]
We propose a framework called Class2Simi, which transforms data points with noisy class labels to data pairs with noisy similarity labels.
Class2Simi is computationally efficient because the transformation is performed on-the-fly within mini-batches and only the loss on top of the model's predictions is changed to a pairwise form.
arXiv Detail & Related papers (2020-06-14T07:55:32Z)
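To make the transformation described in the last entry concrete, here is a minimal sketch of the basic move of converting (possibly noisy) class labels in a mini-batch into pairwise similarity labels; it illustrates the idea only, not the authors' full training procedure.

```python
import numpy as np

def class_to_similarity(labels):
    """Turn a mini-batch of (possibly noisy) class labels into pairwise
    similarity labels: entry (i, j) is 1 if labels[i] == labels[j], else 0.
    Noise in the class labels carries over to the similarity labels, but the
    transformation itself is exact and costs only a pairwise comparison."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(int)

print(class_to_similarity([2, 0, 2, 1]))
```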
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.