Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
- URL: http://arxiv.org/abs/2109.05720v1
- Date: Mon, 13 Sep 2021 06:01:16 GMT
- Title: Low-Shot Validation: Active Importance Sampling for Estimating Classifier Performance on Rare Categories
- Authors: Fait Poms, Vishnu Sarukkai, Ravi Teja Mullapudi, Nimit S. Sohoni, William R. Mark, Deva Ramanan, Kayvon Fatahalian
- Abstract summary: For machine learning models trained with limited labeled training data, validation stands to become the main bottleneck to reducing overall annotation costs.
We propose a statistical validation algorithm that accurately estimates the F-score of binary classifiers for rare categories.
In particular, we can estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
- Score: 47.050853657721596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For machine learning models trained with limited labeled training data,
validation stands to become the main bottleneck to reducing overall annotation
costs. We propose a statistical validation algorithm that accurately estimates
the F-score of binary classifiers for rare categories, where finding relevant
examples to evaluate on is particularly challenging. Our key insight is that
simultaneous calibration and importance sampling enables accurate estimates
even in the low-sample regime (< 300 samples). Critically, we also derive an
accurate single-trial estimator of the variance of our method and demonstrate
that this estimator is empirically accurate at low sample counts, enabling a
practitioner to know how well they can trust a given low-sample estimate. When
validating state-of-the-art semi-supervised models on ImageNet and
iNaturalist2017, our method achieves the same estimates of model performance
with up to 10x fewer labels than competing approaches. In particular, we can
estimate model F1 scores with a variance of 0.005 using as few as 100 labels.
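The estimation problem described in the abstract can be illustrated with a plain importance-sampling sketch. It is not the authors' algorithm: it omits the calibration step, draws items from a hypothetical proposal proportional to the classifier score mixed with a uniform floor (the mix parameter), and computes the variance with the generic delta-method formula for a ratio of means rather than the paper's derived single-trial estimator. The helper is_f1_estimate, the label_fn "annotator", and the synthetic pool are illustrative assumptions. F1 is written as the ratio 2*TP / (2*TP + FP + FN), and numerator and denominator are estimated from the same weighted labeled samples.

```python
import numpy as np

# Sketch only: is_f1_estimate and label_fn are hypothetical names, and the
# calibration step described in the abstract is deliberately omitted.


def is_f1_estimate(scores, threshold, label_fn, n_samples, mix=0.1, seed=0):
    """Importance-sampling estimate of F1 with a delta-method variance.

    scores:   classifier scores in [0, 1] for every item in the unlabeled pool.
    label_fn: hypothetical annotator returning the true 0/1 label of an index;
              it is only called for sampled items.
    """
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    n_pool = scores.size
    preds = (scores >= threshold).astype(float)

    # Hypothetical proposal: proportional to score, mixed with a uniform floor
    # so that every item has a nonzero sampling probability.
    q = (1.0 - mix) * scores / scores.sum() + mix / n_pool
    q /= q.sum()

    # Draw indices with replacement and request labels only for those items.
    idx = rng.choice(n_pool, size=n_samples, replace=True, p=q)
    y = np.array([label_fn(i) for i in idx], dtype=float)
    yhat = preds[idx]
    w = 1.0 / q[idx]  # importance weights

    # Per-sample contributions to 2*TP (numerator) and 2*TP + FP + FN (denominator).
    a = 2.0 * yhat * y
    b = 2.0 * yhat * y + yhat * (1.0 - y) + (1.0 - yhat) * y
    a_bar, b_bar = np.mean(w * a), np.mean(w * b)
    if b_bar == 0.0:
        return float("nan"), float("nan")  # no TP, FP, or FN in the sample
    f1 = a_bar / b_bar

    # Delta method for a ratio of means:
    # Var(a_bar / b_bar) ~ Var(w * (a - f1 * b)) / (n * b_bar^2)
    d = w * (a - f1 * b)
    var = np.var(d, ddof=1) / (n_samples * b_bar**2)
    return float(f1), float(var)


# Synthetic rare-category pool (about 2% positives); labels stay hidden from the estimator.
data_rng = np.random.default_rng(42)
N = 10_000
labels = data_rng.random(N) < 0.02
scores = np.where(labels, data_rng.uniform(0.4, 1.0, N), data_rng.uniform(0.0, 0.55, N))

est, var = is_f1_estimate(scores, 0.5, lambda i: labels[i], n_samples=2000)

# Ground truth, computable here only because the pool is synthetic.
preds = scores >= 0.5
tp = np.sum(preds & labels)
fp = np.sum(preds & ~labels)
fn = np.sum(~preds & labels)
true_f1 = 2 * tp / (2 * tp + fp + fn)
print(f"estimate={est:.3f}  std={var ** 0.5:.3f}  true={true_f1:.3f}")
```

Because the same labeled samples feed both the point estimate and its variance, a single labeling run yields an error bar, which is the practical property the abstract emphasizes. How well this generic delta-method approximation holds at the very low sample counts the paper targets is not established by this sketch.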
Related papers
- Auto-Evaluation with Few Labels through Post-hoc Regression [4.813376208491175]
The Prediction-Powered Inference (PPI) framework provides a way to combine the statistical power of automatic evaluation with a small pool of labelled data.
We present two new PPI-based techniques that leverage robust regressors to produce even lower-variance estimators in the few-label regime.
Date: 2024-11-19T17:17:46Z
- Semi-supervised Learning For Robust Speech Evaluation [30.593420641501968]
Speech evaluation measures a learner's oral proficiency using automatic models.
This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization.
An anchor model is trained using pseudo labels to predict the correctness of pronunciation.
Date: 2024-09-23T02:11:24Z
- A Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation [17.351089059392674]
We propose a framework for model evaluation that includes stratification, sampling, and estimation components.
We show that stratification via k-means clustering based on accurate predictions of model performance yields efficient estimators.
We also find that model-assisted estimators, which leverage predictions of model accuracy on the unlabeled portion of the dataset, are generally more efficient than the traditional estimates.
Date: 2024-06-11T14:49:04Z
- On Efficient and Statistical Quality Estimation for Data Annotation [11.216738303463751]
Annotated datasets are an essential ingredient to train, evaluate, compare and productionalize supervised machine learning models.
Quality estimation is often performed by having experts manually label instances as correct or incorrect.
Basing estimates on small sample sizes, however, can lead to imprecise values for the error rate.
We show that acceptance sampling can reduce the required sample sizes by up to 50% while providing the same statistical guarantees.
Date: 2024-05-20T09:57:29Z
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits retargeting approach (LORT) that does not require prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
Date: 2024-03-01T03:27:08Z
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold.
Date: 2022-01-11T23:01:12Z
- Detecting Errors and Estimating Accuracy on Unlabeled Data with Self-training Ensembles [38.23896575179384]
We propose a principled and practically effective framework that simultaneously addresses the two tasks: detecting errors and estimating accuracy on unlabeled data.
On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
Date: 2021-06-29T21:32:51Z
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood-based model selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
Date: 2021-04-11T09:50:24Z
- Multi-label Contrastive Predictive Coding [125.03510235962095]
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC).
We introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples.
We show that, using the same number of negative samples, multi-label CPC is able to exceed the $\log m$ bound while still being a valid lower bound of mutual information.
Date: 2020-07-20T02:46:21Z
- Uncertainty-aware Self-training for Text Classification with Few Labels [54.13279574908808]
We study self-training as one of the earliest semi-supervised learning approaches to reduce the annotation bottleneck.
We propose an approach to improve self-training by incorporating uncertainty estimates of the underlying neural network.
We show that our method, using only 20-30 labeled samples per class per task for training and validation, performs within 3% of fully supervised pre-trained language models.
Date: 2020-06-27T08:13:58Z
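For the "Auto-Evaluation with Few Labels through Post-hoc Regression" entry, a minimal sketch of the basic Prediction-Powered Inference mean estimator may help: the automatic evaluator's average over the unlabeled pool is corrected by its average error (the "rectifier") on the small labeled set. This is the plain estimator from the original PPI framework, not the regression-based variants that entry proposes; the function name ppi_mean and the normal-approximation interval are illustrative assumptions.

```python
import numpy as np

# Sketch only: ppi_mean is a hypothetical helper implementing the basic PPI mean estimator.


def ppi_mean(pred_unlabeled, pred_labeled, y_labeled, z=1.96):
    """Prediction-powered estimate of a mean (e.g. accuracy) with a normal-approximation CI.

    pred_unlabeled: automatic-evaluator outputs on the large unlabeled pool.
    pred_labeled:   automatic-evaluator outputs on the small labeled set.
    y_labeled:      human labels for that labeled set.
    """
    f_u = np.asarray(pred_unlabeled, dtype=float)
    f_l = np.asarray(pred_labeled, dtype=float)
    y = np.asarray(y_labeled, dtype=float)

    rectifier = f_l - y  # the evaluator's error, measured where labels exist
    est = f_u.mean() - rectifier.mean()

    # The two terms come from independent samples, so their variances add.
    se = np.sqrt(f_u.var(ddof=1) / f_u.size + rectifier.var(ddof=1) / rectifier.size)
    return float(est), (float(est - z * se), float(est + z * se))


est, (lo, hi) = ppi_mean([1, 1, 0, 1], [1, 1, 0, 0], [1, 0, 0, 0])
print(est)  # 0.5 (0.75 from the pool, minus a 0.25 rectifier)
```

The rectifier term's variance shrinks when the evaluator's errors on the labeled set vary little, which is where this kind of estimator gains over using the few human labels alone.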
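The "Framework for Efficient Model Evaluation through Stratification, Sampling, and Estimation" entry can likewise be sketched. Its summary describes k-means clustering on predicted performance; the sketch below swaps in equal-size quantile bins of a predicted-correctness score (a simpler stand-in, not that paper's method), samples the same number of items per stratum, and combines stratum means with weights N_h / N. The names quantile_strata and stratified_mean and the synthetic data are hypothetical, and the variance omits the finite-population correction.

```python
import numpy as np

# Sketch only: quantile bins replace k-means; all names are hypothetical.


def quantile_strata(pred_acc, n_strata):
    """Assign each item to one of n_strata equal-size bins of predicted accuracy.

    Ties in pred_acc could leave a bin empty.
    """
    inner = np.linspace(0.0, 1.0, n_strata + 1)[1:-1]
    edges = np.quantile(pred_acc, inner)
    return np.searchsorted(edges, pred_acc, side="right")


def stratified_mean(strata, sampled, n_strata):
    """Stratified estimate of a population mean and its approximate variance.

    strata:  stratum id for every item in the pool.
    sampled: dict mapping stratum id -> observed 0/1 correctness labels (>= 2 per stratum).
    """
    est, var = 0.0, 0.0
    for h in range(n_strata):
        weight = np.mean(strata == h)  # N_h / N
        y = np.asarray(sampled[h], dtype=float)
        est += weight * y.mean()
        var += weight**2 * y.var(ddof=1) / y.size  # no finite-population correction
    return float(est), float(var)


# Synthetic pool where the predicted accuracy is calibrated by construction.
rng = np.random.default_rng(1)
pred_acc = rng.random(5000)
correct = rng.random(5000) < pred_acc
strata = quantile_strata(pred_acc, 4)

sampled = {}
for h in range(4):
    members = np.flatnonzero(strata == h)
    picked = rng.choice(members, size=50, replace=False)  # 50 labels per stratum
    sampled[h] = correct[picked]

est, var = stratified_mean(strata, sampled, 4)
print(f"estimate={est:.3f}  std={var ** 0.5:.3f}  true={correct.mean():.3f}")
```

Stratifying on a well-calibrated predictor groups items with similar correctness probabilities, so within-stratum variance, and hence the estimator's variance, is lower than for simple random sampling with the same label budget.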
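For the "Leveraging Unlabeled Data to Predict Out-of-Distribution Performance" entry, the two steps of Average Thresholded Confidence fit in a few lines: fit a threshold on labeled held-out source data so that the fraction of examples above it equals the source accuracy, then report the fraction of unlabeled target examples above it. The confidence score is assumed here to be the maximum softmax probability (other confidence scores are possible); atc_threshold, atc_predict, and the toy numbers are illustrative.

```python
import numpy as np

# Sketch only: atc_threshold and atc_predict are hypothetical helper names.


def atc_threshold(source_conf, source_correct):
    """Choose t so the fraction of held-out source examples with confidence > t
    matches the source accuracy."""
    conf = np.asarray(source_conf, dtype=float)
    acc = float(np.mean(source_correct))
    # Roughly a fraction `acc` of confidences lie above the (1 - acc) quantile.
    return float(np.quantile(conf, 1.0 - acc))


def atc_predict(target_conf, t):
    """Predicted target accuracy: fraction of unlabeled target examples above t."""
    return float(np.mean(np.asarray(target_conf, dtype=float) > t))


# Held-out source data: confidences (assumed max softmax probability) and correctness.
source_conf = np.arange(1, 11) / 10               # 0.1, 0.2, ..., 1.0
source_correct = [0, 0, 0, 1, 1, 1, 1, 1, 1, 1]   # 70% accuracy
t = atc_threshold(source_conf, source_correct)    # about 0.37

target_conf = [0.2, 0.5, 0.9, 0.95]               # unlabeled target data
print(atc_predict(target_conf, t))  # 0.75
```

No target labels are used: only the threshold learned on source data transfers, which is why the method is practical but relies on the confidence score shifting consistently with accuracy.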
This list is automatically generated from the titles and abstracts of the papers in this site.