Detecting Errors and Estimating Accuracy on Unlabeled Data with
Self-training Ensembles
- URL: http://arxiv.org/abs/2106.15728v4
- Date: Sun, 14 May 2023 01:50:38 GMT
- Title: Detecting Errors and Estimating Accuracy on Unlabeled Data with
Self-training Ensembles
- Authors: Jiefeng Chen, Frederick Liu, Besim Avci, Xi Wu, Yingyu Liang, Somesh
Jha
- Abstract summary: We propose a principled and practically effective framework that simultaneously addresses the two tasks.
On iWildCam, one instantiation reduces the estimation error for unsupervised accuracy estimation by at least 70% and improves the F1 score for error detection by at least 4.7%.
- Score: 38.23896575179384
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: When a deep learning model is deployed in the wild, it can encounter test
data drawn from distributions different from the training data distribution and
suffer a drop in performance. For safe deployment, it is essential to estimate
the accuracy of the pre-trained model on the test data. However, the labels for
the test inputs are usually not immediately available in practice, and
obtaining them can be expensive. This observation leads to two challenging
tasks: (1) unsupervised accuracy estimation, which aims to estimate the
accuracy of a pre-trained classifier on a set of unlabeled test inputs; (2)
error detection, which aims to identify mis-classified test inputs. In this
paper, we propose a principled and practically effective framework that
simultaneously addresses the two tasks. The proposed framework iteratively
learns an ensemble of models to identify mis-classified data points and
performs self-training to improve the ensemble with the identified points.
Theoretical analysis demonstrates that our framework enjoys provable guarantees
for both accuracy estimation and error detection under mild conditions readily
satisfied by practical deep learning models. Along with the framework, we
propose and experiment with two instantiations and achieve state-of-the-art
results on 59 tasks. For example, on iWildCam, one instantiation reduces the
estimation error for unsupervised accuracy estimation by at least 70% and
improves the F1 score for error detection by at least 4.7% compared to existing
methods.
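The abstract describes the framework only at a high level. As a rough illustration (not the authors' exact algorithm), the Python sketch below shows one way the iterate-and-self-train idea could look with scikit-learn-style estimators; the function name self_training_ensemble_estimate, the bootstrap-plus-majority-vote ensemble, and the fixed number of rounds are all assumptions made for the example.
```python
import numpy as np
from scipy.stats import mode
from sklearn.base import clone

def self_training_ensemble_estimate(f, base_estimator, x_train, y_train,
                                     x_test, n_members=5, n_rounds=3, seed=0):
    """Illustrative sketch only: iteratively train an auxiliary ensemble on
    labeled training data plus pseudo-labeled test data, flag test inputs
    where the ensemble majority disagrees with the fixed pre-trained
    classifier f, and report the agreement rate as the accuracy estimate."""
    rng = np.random.default_rng(seed)
    preds_f = f.predict(x_test)   # predictions of the deployed model (never updated)
    pseudo_y = preds_f.copy()     # initially trust f on the unlabeled test set

    for _ in range(n_rounds):
        x_all = np.concatenate([x_train, x_test])
        y_all = np.concatenate([y_train, pseudo_y])
        member_preds = []
        for _ in range(n_members):
            # each ensemble member sees a different bootstrap resample
            idx = rng.integers(0, len(x_all), size=len(x_all))
            g = clone(base_estimator).fit(x_all[idx], y_all[idx])
            member_preds.append(g.predict(x_test))

        ensemble_pred = mode(np.stack(member_preds), axis=0).mode.squeeze()
        suspected_errors = ensemble_pred != preds_f   # candidate mis-classifications

        # self-training step: adopt the ensemble's labels on suspected errors
        pseudo_y = np.where(suspected_errors, ensemble_pred, preds_f)

    estimated_accuracy = 1.0 - suspected_errors.mean()
    return estimated_accuracy, suspected_errors       # accuracy estimate + error flags
```
In the paper, the instantiations are built on deep models trained with different random initializations and self-training objectives; the bootstrap ensemble above is only a stand-in to keep the sketch self-contained.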
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Leveraging Gradients for Unsupervised Accuracy Estimation under Distribution Shift [25.951051758560702]
Estimating test accuracy without access to the ground-truth test labels under varying test environments is a challenging, yet extremely important problem.
We use the norm of the classification-layer gradients, backpropagated from the cross-entropy loss after only one gradient step over the test data.
Our key idea is that the model requires larger-magnitude gradient adjustments when it does not generalize to a test set affected by distribution shift.
arXiv Detail & Related papers (2024-01-17T01:33:23Z)
- Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z)
- Estimating Model Performance under Domain Shifts with Class-Specific Confidence Scores [25.162667593654206]
We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
arXiv Detail & Related papers (2022-07-20T15:04:32Z)
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (see the sketch after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Robust Fairness-aware Learning Under Sample Selection Bias [17.09665420515772]
We propose a framework for robust and fair learning under sample selection bias.
We develop two algorithms to handle sample selection bias when test data is both available and unavailable.
arXiv Detail & Related papers (2021-05-24T23:23:36Z)
- Uncertainty Estimation Using a Single Deep Deterministic Neural Network [66.26231423824089]
We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass.
We scale training with a novel loss function and centroid-updating scheme and match the accuracy of softmax models.
arXiv Detail & Related papers (2020-03-04T12:27:36Z)
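For concreteness, here is a minimal sketch of the ATC-style baseline referenced above ("Leveraging Unlabeled Data to Predict Out-of-Distribution Performance"), assuming softmax probabilities are available for a labeled source validation set and for the unlabeled target set; the function name and the use of maximum softmax confidence as the score are illustrative choices, not the authors' code.
```python
import numpy as np

def atc_accuracy_estimate(probs_source_val, labels_source_val, probs_target):
    """Illustrative ATC-style sketch (names and score choice are assumptions).

    Pick a confidence threshold on labeled source validation data so that the
    fraction of source examples above the threshold matches the source
    accuracy, then predict target accuracy as the fraction of unlabeled
    target examples whose confidence exceeds that threshold."""
    conf_source = probs_source_val.max(axis=1)   # max softmax confidence per example
    source_acc = (probs_source_val.argmax(axis=1) == labels_source_val).mean()

    # threshold t chosen so that P(conf_source > t) equals the source accuracy
    t = np.quantile(conf_source, 1.0 - source_acc)

    conf_target = probs_target.max(axis=1)
    return float((conf_target > t).mean())       # estimated target accuracy
```
The ATC paper also studies a negative-entropy score in place of the maximum confidence; swapping the score function is the only change needed in this sketch.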