Detecting Errors and Estimating Accuracy on Unlabeled Data with
Self-training Ensembles
- URL: http://arxiv.org/abs/2106.15728v4
- Date: Sun, 14 May 2023 01:50:38 GMT
- Title: Detecting Errors and Estimating Accuracy on Unlabeled Data with
Self-training Ensembles
- Authors: Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh
Jha
- Abstract summary: We propose a principled and practically effective framework that simultaneously addresses the two tasks.
On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
- Score: 38.23896575179384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a deep learning model is deployed in the wild, it can encounter test
data drawn from distributions different from the training data distribution and
suffer a drop in performance. For safe deployment, it is essential to estimate
the accuracy of the pre-trained model on the test data. However, the labels for
the test inputs are usually not immediately available in practice, and
obtaining them can be expensive. This observation leads to two challenging
tasks: (1) unsupervised accuracy estimation, which aims to estimate the
accuracy of a pre-trained classifier on a set of unlabeled test inputs; (2)
error detection, which aims to identify mis-classified test inputs. In this
paper, we propose a principled and practically effective framework that
simultaneously addresses the two tasks. The proposed framework iteratively
learns an ensemble of models to identify mis-classified data points and
performs self-training to improve the ensemble with the identified points.
Theoretical analysis demonstrates that our framework enjoys provable guarantees
for both accuracy estimation and error detection under mild conditions readily
satisfied by practical deep learning models. Along with the framework, we
propose and experiment with two instantiations and achieve state-of-the-art
results on 59 tasks. For example, on iWildCam, one instantiation reduces the
estimation error for unsupervised accuracy estimation by at least 70% and
improves the F1 score for error detection by at least 4.7% compared to existing
methods.
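The abstract describes the framework only at a high level. As a rough illustration (not the authors' exact algorithm), the Python sketch below shows one way the iterate-and-self-train idea could look with scikit-learn-style estimators; the function name self_training_ensemble_estimate, the bootstrap-plus-majority-vote ensemble, and the fixed number of rounds are all assumptions made for the example.
```python
import numpy as np
from scipy.stats import mode
from sklearn.base import clone

def self_training_ensemble_estimate(f, base_estimator, x_train, y_train,
                                     x_test, n_members=5, n_rounds=3, seed=0):
    """Illustrative sketch only: iteratively train an auxiliary ensemble on
    labeled training data plus pseudo-labeled test data, flag test inputs
    where the ensemble majority disagrees with the fixed pre-trained
    classifier f, and report the agreement rate as the accuracy estimate."""
    rng = np.random.default_rng(seed)
    preds_f = f.predict(x_test)   # predictions of the deployed model (never updated)
    pseudo_y = preds_f.copy()     # initially trust f on the unlabeled test set

    for _ in range(n_rounds):
        x_all = np.concatenate([x_train, x_test])
        y_all = np.concatenate([y_train, pseudo_y])
        member_preds = []
        for _ in range(n_members):
            # each ensemble member sees a different bootstrap resample
            idx = rng.integers(0, len(x_all), size=len(x_all))
            g = clone(base_estimator).fit(x_all[idx], y_all[idx])
            member_preds.append(g.predict(x_test))

        ensemble_pred = mode(np.stack(member_preds), axis=0).mode.squeeze()
        suspected_errors = ensemble_pred != preds_f   # candidate mis-classifications

        # self-training step: adopt the ensemble's labels on suspected errors
        pseudo_y = np.where(suspected_errors, ensemble_pred, preds_f)

    estimated_accuracy = 1.0 - suspected_errors.mean()
    return estimated_accuracy, suspected_errors       # accuracy estimate + error flags
```
In the paper, the instantiations are built on deep models trained with different random initializations and self-training objectives; the bootstrap ensemble above is only a stand-in to keep the sketch self-contained.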
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift [25.951051758560702]
Estimating test accuracy without access to the ground-truth test labels under varying test environments is a challenging, yet extremely important problem.
We use the norm of the classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over the test data.
Our key idea is that the model requires larger-magnitude gradient adjustments when it does not generalize to a test set affected by distribution shift.
arXiv Detail & Related papers (2024-01-17T01:33:23Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores [25.162667593654206]
We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
arXiv Detail & Related papers (2022-07-20T15:04:32Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid-updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
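For concreteness, here is a minimal sketch of the ATC-style baseline referenced above ("Leveraging Unlabeled Data to Predict Out-of-Distribution Performance"), assuming softmax probabilities are available for a labeled source validation set and for the unlabeled target set; the function name and the use of maximum softmax confidence as the score are illustrative choices, not the authors' code.
```python
import numpy as np

def atc_accuracy_estimate(probs_source_val, labels_source_val, probs_target):
    """Illustrative ATC-style sketch (names and score choice are assumptions).

    Pick a confidence threshold on labeled source validation data so that the
    fraction of source examples above the threshold matches the source
    accuracy, then predict target accuracy as the fraction of unlabeled
    target examples whose confidence exceeds that threshold."""
    conf_source = probs_source_val.max(axis=1)   # max softmax confidence per example
    source_acc = (probs_source_val.argmax(axis=1) == labels_source_val).mean()

    # threshold t chosen so that P(conf_source > t) equals the source accuracy
    t = np.quantile(conf_source, 1.0 - source_acc)

    conf_target = probs_target.max(axis=1)
    return float((conf_target > t).mean())       # estimated target accuracy
```
The ATC paper also studies a negative-entropy score in place of the maximum confidence; swapping the score function is the only change needed in this sketch.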