Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
- URL: http://arxiv.org/abs/2006.06963v2
- Date: Wed, 2 Jun 2021 07:33:19 GMT
- Title: Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance
- Authors: Neil G. Marchant and Benjamin I. P. Rubinstein
- Abstract summary: This paper develops a framework for online evaluation based on adaptive importance sampling.
Experiments demonstrate an average MSE superior to state-of-the-art on fixed label budgets.
- Score: 20.491690754953943
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Important tasks like record linkage and extreme classification demonstrate
extreme class imbalance, with 1 minority instance to every 1 million or more
majority instances. Obtaining a sufficient sample of all classes, even just to
achieve statistically-significant evaluation, is so challenging that most
current approaches yield poor estimates or incur impractical cost. Where
importance sampling has been levied against this challenge, restrictive
constraints are placed on performance metrics, estimates do not come with
appropriate guarantees, or evaluations cannot adapt to incoming labels. This
paper develops a framework for online evaluation based on adaptive importance
sampling. Given a target performance metric and model for $p(y|x)$, the
framework adapts a distribution over items to label in order to maximize
statistical precision. We establish strong consistency and a central limit
theorem for the resulting performance estimates, and instantiate our framework
with worked examples that leverage Dirichlet-tree models. Experiments
demonstrate an average MSE superior to state-of-the-art on fixed label budgets.
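The abstract above describes a framework that adapts the distribution over items to label as labels arrive. Below is a minimal sketch of that idea, assuming a synthetic pool, F1 as the target metric, and a simple binned Beta posterior standing in for the paper's Dirichlet-tree model; the utility heuristic, bin count, prior strength, and label budget are illustrative choices, not the authors' algorithm.

```python
# Minimal sketch of label-efficient evaluation via adaptive importance sampling.
# Everything below (pool construction, 20 score bins, a binned Beta posterior in
# place of the Dirichlet-tree model, the utility heuristic, the label budget) is
# an illustrative assumption, not the authors' algorithm.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic pool with heavy class imbalance; hidden labels play the oracle.
N = 200_000
y_true = (rng.uniform(size=N) < 1e-3).astype(int)         # ~0.1% positives
scores = np.where(y_true == 1,
                  rng.beta(4, 2, size=N),                  # positives tend to score high
                  rng.beta(1, 12, size=N))                 # negatives tend to score low
y_pred = (scores > 0.5).astype(int)                        # classifier being evaluated

def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

# Binned Beta posterior over p(y=1 | x), seeded by the model's own scores.
n_bins, strength = 20, 10.0
bins = np.minimum((scores * n_bins).astype(int), n_bins - 1)
bin_means = np.array([scores[bins == b].mean() if np.any(bins == b) else 0.05
                      for b in range(n_bins)])
alpha, beta = strength * bin_means, strength * (1.0 - bin_means)

n_rounds, batch = 20, 100
w_tp = w_fp = w_fn = 0.0                                   # importance-weighted counts
for _ in range(n_rounds):
    p1 = (alpha / (alpha + beta))[bins]                    # current belief p(y=1 | x)
    util = np.where(y_pred == 1, 1.0, p1)                  # favour items that move TP/FP/FN
    q = util / util.sum()                                  # adaptive proposal over the pool

    idx = rng.choice(N, size=batch, p=q)                   # items sent for labelling
    labels = y_true[idx]                                   # simulated oracle query

    w = 1.0 / (q[idx] * batch * n_rounds)                  # importance weight, averaged over rounds
    w_tp += np.sum(w * ((y_pred[idx] == 1) & (labels == 1)))
    w_fp += np.sum(w * ((y_pred[idx] == 1) & (labels == 0)))
    w_fn += np.sum(w * ((y_pred[idx] == 0) & (labels == 1)))

    np.add.at(alpha, bins[idx], labels)                    # adapt the proposal for the next round
    np.add.at(beta, bins[idx], 1 - labels)

est = f1_from_counts(w_tp, w_fp, w_fn)
true = f1_from_counts(((y_pred == 1) & (y_true == 1)).sum(),
                      ((y_pred == 1) & (y_true == 0)).sum(),
                      ((y_pred == 0) & (y_true == 1)).sum())
print(f"F1 estimate from {n_rounds * batch} labels: {est:.3f}  (true F1: {true:.3f})")
```

Each round contributes one unbiased importance-sampling estimate of the TP/FP/FN counts under that round's proposal, so the proposal can adapt between rounds; the reported F1 is then a consistent ratio estimate built from those counts.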
Related papers
- Learning with Imbalanced Noisy Data by Preventing Bias in Sample Selection [82.43311784594384]
Real-world datasets contain not only noisy labels but also class imbalance.
We propose a simple yet effective method to address noisy labels in imbalanced datasets.
arXiv Detail & Related papers (2024-02-17T10:34:53Z)
- Weak Supervision Performance Evaluation via Partial Identification [46.73061437177238]
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels.
We present a novel method to address this challenge by framing model evaluation as a partial identification problem.
Our approach derives reliable bounds on key metrics without requiring labeled data, overcoming core limitations in current weak supervision evaluation techniques.
arXiv Detail & Related papers (2023-12-07T07:15:11Z)
- Fair Few-shot Learning with Auxiliary Sets [53.30014767684218]
In many machine learning (ML) tasks, only very few labeled data samples can be collected, which can lead to inferior fairness performance.
In this paper, we define the fairness-aware learning task with limited training samples as the "fair few-shot learning" problem.
We devise a novel framework that accumulates fairness-aware knowledge across different meta-training tasks and then generalizes the learned knowledge to meta-test tasks.
arXiv Detail & Related papers (2023-08-28T06:31:37Z)
- Statistical Inference for Fairness Auditing [4.318555434063274]
We frame this task of "fairness auditing" in terms of multiple hypothesis testing.
We show how the bootstrap can be used to simultaneously bound performance disparities over a collection of groups.
Our methods can be used to flag subpopulations affected by model underperformance, and certify subpopulations for which the model performs adequately.
arXiv Detail & Related papers (2023-05-05T17:54:22Z)
- Class-Distribution-Aware Pseudo Labeling for Semi-Supervised Multi-Label Learning [97.88458953075205]
Pseudo-labeling has emerged as a popular and effective approach for utilizing unlabeled data.
This paper proposes a novel solution called Class-Aware Pseudo-Labeling (CAP) that performs pseudo-labeling in a class-aware manner.
arXiv Detail & Related papers (2023-05-04T12:52:18Z)
- SoftMatch: Addressing the Quantity-Quality Trade-off in Semi-supervised Learning [101.86916775218403]
This paper revisits the popular pseudo-labeling methods via a unified sample weighting formulation.
We propose SoftMatch to overcome the trade-off by maintaining both high quantity and high quality of pseudo-labels during training.
In experiments, SoftMatch shows substantial improvements across a wide variety of benchmarks, including image, text, and imbalanced classification.
arXiv Detail & Related papers (2023-01-26T03:53:25Z)
- Debiased Pseudo Labeling in Self-Training [77.83549261035277]
Deep neural networks achieve remarkable performances on a wide range of tasks with the aid of large-scale labeled datasets.
To mitigate the requirement for labeled data, self-training is widely used in both academia and industry by pseudo labeling on readily-available unlabeled data.
We propose Debiased, in which the generation and utilization of pseudo labels are decoupled by two independent heads.
arXiv Detail & Related papers (2022-02-15T02:14:33Z)
- Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories [47.050853657721596]
For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs.
We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories.
In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
arXiv Detail & Related papers (2021-09-13T06:01:16Z)
- Exploiting Sample Uncertainty for Domain Adaptive Person Re-Identification [137.9939571408506]
We estimate and exploit the credibility of the assigned pseudo-label of each sample to alleviate the influence of noisy labels.
Our uncertainty-guided optimization brings significant improvement and achieves the state-of-the-art performance on benchmark datasets.
arXiv Detail & Related papers (2020-12-16T04:09:04Z)
- A Skew-Sensitive Evaluation Framework for Imbalanced Data Classification [11.125446871030734]
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes.
We propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances.
arXiv Detail & Related papers (2020-10-12T19:47:09Z)
- Active Bayesian Assessment for Black-Box Classifiers [20.668691047355072]
We introduce an active Bayesian approach for assessment of classifier performance to satisfy the desiderata of both reliability and label-efficiency.
We first develop inference strategies to quantify uncertainty for common assessment metrics such as accuracy, misclassification cost, and calibration error.
We then propose a general framework for active Bayesian assessment using inferred uncertainty to guide efficient selection of instances for labeling.
arXiv Detail & Related papers (2020-02-16T08:08:42Z)
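The entry above pairs posterior inference over an assessment metric with uncertainty-guided labelling. The sketch below illustrates that combination for groupwise accuracy only, using a Beta posterior per group and a greedy highest-posterior-variance selection rule; the group structure, hidden accuracies, and selection heuristic are assumptions for illustration, not the paper's exact inference strategy (which also covers misclassification cost and calibration error).

```python
# Minimal sketch of uncertainty-guided labelling for assessing groupwise accuracy
# of a black-box classifier. The group structure, hidden accuracies and the greedy
# highest-variance selection rule are illustrative assumptions, not the paper's
# exact inference strategy.
import numpy as np

rng = np.random.default_rng(1)

true_acc = np.array([0.99, 0.90, 0.70, 0.55])    # hidden per-group accuracies (oracle)
pool_sizes = np.array([5000, 3000, 1500, 500])   # items available in each group

alpha = np.ones(4)                               # Beta(1, 1) posterior on each group's accuracy
beta = np.ones(4)

budget = 300
for _ in range(budget):
    # Label next from the group whose accuracy is currently most uncertain
    # (largest posterior variance of its Beta distribution).
    var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
    g = int(np.argmax(var))

    correct = int(rng.uniform() < true_acc[g])   # simulated label: was the prediction correct?
    alpha[g] += correct
    beta[g] += 1 - correct

post_mean = alpha / (alpha + beta)
overall = np.average(post_mean, weights=pool_sizes)   # pool-weighted overall accuracy
print("per-group accuracy estimates:", np.round(post_mean, 3))
print("overall accuracy estimate:", round(float(overall), 3))
```

Spending labels where the posterior is widest concentrates the budget on poorly characterised groups, which is what makes the assessment label-efficient relative to uniform sampling.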
This list is automatically generated from the titles and abstracts of the papers in this site.