Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
- URL: http://arxiv.org/abs/2006.06963v2
- Date: Wed, 2 Jun 2021 07:33:19 GMT
- Title: Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
- Authors: Neil G. Marchant and Benjamin I. P. Rubinstein
- Abstract summary: This paper develops a framework for online evaluation based on adaptive importance sampling.
Experiments demonstrate an average MSE superior to state-of-the-art on fixed label budgets.
- Score: 20.491690754953943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Important tasks like record linkage and extreme classification demonstrate
extreme class imbalance, with 1 minority instance to every 1 million or more
majority instances. Obtaining a sufficient sample of all classes, even just to
achieve statistically-significant evaluation, is so challenging that most
current approaches yield poor estimates or incur impractical cost. Where
importance sampling has been levied against this challenge, restrictive
constraints are placed on performance metrics, estimates do not come with
appropriate guarantees, or evaluations cannot adapt to incoming labels. This
paper develops a framework for online evaluation based on adaptive importance
sampling. Given a target performance metric and model for $p(y|x)$, the
framework adapts a distribution over items to label in order to maximize
statistical precision. We establish strong consistency and a central limit
theorem for the resulting performance estimates, and instantiate our framework
with worked examples that leverage Dirichlet-tree models. Experiments
demonstrate an average MSE superior to state-of-the-art on fixed label budgets.
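The abstract above describes a framework that adapts the distribution over items to label as labels arrive. Below is a minimal sketch of that idea, assuming a synthetic pool, F1 as the target metric, and a simple binned Beta posterior standing in for the paper's Dirichlet-tree model; the utility heuristic, bin count, prior strength, and label budget are illustrative choices, not the authors' algorithm.

```python
# Minimal sketch of label-efficient evaluation via adaptive importance sampling.
# Everything below (pool construction, 20 score bins, a binned Beta posterior in
# place of the Dirichlet-tree model, the utility heuristic, the label budget) is
# an illustrative assumption, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool with heavy class imbalance; hidden labels play the oracle.
N = 200_000
y_true = (rng.uniform(size=N) < 1e-3).astype(int)         # ~0.1% positives
scores = np.where(y_true == 1,
                  rng.beta(4, 2, size=N),                  # positives tend to score high
                  rng.beta(1, 12, size=N))                 # negatives tend to score low
y_pred = (scores > 0.5).astype(int)                        # classifier being evaluated

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Binned Beta posterior over p(y=1 | x), seeded by the model's own scores.
n_bins, strength = 20, 10.0
bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
bin_means = np.array([scores[bins == b].mean() if np.any(bins == b) else 0.05
                      for b in range(n_bins)])
alpha, beta = strength * bin_means, strength * (1.0 - bin_means)

n_rounds, batch = 20, 100
w_tp = w_fp = w_fn = 0.0                                   # importance-weighted counts
for _ in range(n_rounds):
    p1 = (alpha / (alpha + beta))[bins]                    # current belief p(y=1 | x)
    util = np.where(y_pred == 1, 1.0, p1)                  # favour items that move TP/FP/FN
    q = util / util.sum()                                  # adaptive proposal over the pool

    idx = rng.choice(N, size=batch, p=q)                   # items sent for labelling
    labels = y_true[idx]                                   # simulated oracle query

    w = 1.0 / (q[idx] * batch * n_rounds)                  # importance weight, averaged over rounds
    w_tp += np.sum(w * ((y_pred[idx] == 1) & (labels == 1)))
    w_fp += np.sum(w * ((y_pred[idx] == 1) & (labels == 0)))
    w_fn += np.sum(w * ((y_pred[idx] == 0) & (labels == 1)))

    np.add.at(alpha, bins[idx], labels)                    # adapt the proposal for the next round
    np.add.at(beta, bins[idx], 1 - labels)

est = f1_from_counts(w_tp, w_fp, w_fn)
true = f1_from_counts(((y_pred == 1) & (y_true == 1)).sum(),
                      ((y_pred == 1) & (y_true == 0)).sum(),
                      ((y_pred == 0) & (y_true == 1)).sum())
print(f"F1 estimate from {n_rounds * batch} labels: {est:.3f}  (true F1: {true:.3f})")
```

Each round contributes one unbiased importance-sampling estimate of the TP/FP/FN counts under that round's proposal, so the proposal can adapt between rounds; the reported F1 is then a consistent ratio estimate built from those counts.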
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Weak Supervision Performance Evaluation via Partial Identification [46.73061437177238]
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels.
We present a novel method to address this challenge by framing model evaluation as a partial identification problem.
Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques.
arXiv Detail & Related papers (2023-12-07T07:15:11Z)
- Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the "fair few-shot learning" problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)
- Statistical Inference for Fairness Auditing [4.318555434063274]
We frame this task of "fairness auditing" in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
arXiv Detail & Related papers (2023-05-05T17:54:22Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories [47.050853657721596]
For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs.
We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories.
In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
arXiv Detail & Related papers (2021-09-13T06:01:16Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification [11.125446871030734]
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes.
We propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances.
arXiv Detail & Related papers (2020-10-12T19:47:09Z)
- Active Bayesian Assessment for Black-Box Classifiers [20.668691047355072]
We introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency.
We first develop inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error.
We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling.
arXiv Detail & Related papers (2020-02-16T08:08:42Z)
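The entry above pairs posterior inference over an assessment metric with uncertainty-guided labelling. The sketch below illustrates that combination for groupwise accuracy only, using a Beta posterior per group and a greedy highest-posterior-variance selection rule; the group structure, hidden accuracies, and selection heuristic are assumptions for illustration, not the paper's exact inference strategy (which also covers misclassification cost and calibration error).

```python
# Minimal sketch of uncertainty-guided labelling for assessing groupwise accuracy
# of a black-box classifier. The group structure, hidden accuracies and the greedy
# highest-variance selection rule are illustrative assumptions, not the paper's
# exact inference strategy.
import numpy as np

rng = np.random.default_rng(1)

true_acc = np.array([0.99, 0.90, 0.70, 0.55])    # hidden per-group accuracies (oracle)
pool_sizes = np.array([5000, 3000, 1500, 500])   # items available in each group

alpha = np.ones(4)                               # Beta(1, 1) posterior on each group's accuracy
beta = np.ones(4)

budget = 300
for _ in range(budget):
    # Label next from the group whose accuracy is currently most uncertain
    # (largest posterior variance of its Beta distribution).
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    g = int(np.argmax(var))

    correct = int(rng.uniform() < true_acc[g])   # simulated label: was the prediction correct?
    alpha[g] += correct
    beta[g] += 1 - correct

post_mean = alpha / (alpha + beta)
overall = np.average(post_mean, weights=pool_sizes)   # pool-weighted overall accuracy
print("per-group accuracy estimates:", np.round(post_mean, 3))
print("overall accuracy estimate:", round(float(overall), 3))
```

Spending labels where the posterior is widest concentrates the budget on poorly characterised groups, which is what makes the assessment label-efficient relative to uniform sampling.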
This list is automatically generated from the titles and abstracts of the papers in this site.