A Learning Based Hypothesis Test for Harmful Covariate Shift
- URL: http://arxiv.org/abs/2212.02742v2
- Date: Wed, 7 Dec 2022 03:19:24 GMT
- Title: A Learning Based Hypothesis Test for Harmful Covariate Shift
- Authors: Tom Ginsberg, Zhongyuan Liang, and Rahul G. Krishnan
- Abstract summary: Machine learning systems in high-risk domains need to identify when predictions should not be made on out-of-distribution test examples.
In this work, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data to determine when a model should be removed from the deployment setting.
- Score: 3.1406146587437904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to quickly and accurately identify covariate shift at test time
is a critical and often overlooked component of safe machine learning systems
deployed in high-risk domains. While methods exist for detecting when
predictions should not be made on out-of-distribution test examples,
identifying distributional level differences between training and test time can
help determine when a model should be removed from the deployment setting and
retrained. In this work, we define harmful covariate shift (HCS) as a change in
distribution that may weaken the generalization of a predictive model. To
detect HCS, we use the discordance between an ensemble of classifiers trained
to agree on training data and disagree on test data. We derive a loss function
for training this ensemble and show that the disagreement rate and entropy
represent powerful discriminative statistics for HCS. Empirically, we
demonstrate the ability of our method to detect harmful covariate shift with
statistical certainty on a variety of high-dimensional datasets. Across
numerous domains and modalities, we show state-of-the-art performance compared
to existing methods, particularly when the number of observed test samples is
small.
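To make the testing idea concrete, here is a minimal sketch of how a disagreement statistic can drive a hypothesis test for harmful covariate shift: the disagreement rate observed on a suspect test batch is compared against an empirical null distribution built from held-out in-distribution batches. The function names, inputs, and the empirical-null construction are illustrative assumptions, not the paper's exact loss or test procedure.
```python
import numpy as np

def disagreement_rate(base_preds, member_preds):
    # Fraction of samples where an ensemble member's predicted label
    # differs from the base classifier's predicted label.
    return float(np.mean(np.asarray(base_preds) != np.asarray(member_preds)))

def harmful_shift_p_value(observed_rate, null_rates):
    # One-sided empirical p-value: how often held-out in-distribution batches
    # produce a disagreement rate at least as large as the observed one.
    null_rates = np.asarray(null_rates)
    return (1 + np.sum(null_rates >= observed_rate)) / (1 + len(null_rates))

# Hypothetical usage:
#   base_preds   - base classifier labels on the suspect test batch
#   member_preds - labels from an ensemble member trained to disagree on that batch
#   null_rates   - disagreement rates collected on held-out in-distribution batches
# p = harmful_shift_p_value(disagreement_rate(base_preds, member_preds), null_rates)
# Flag harmful covariate shift when p falls below a chosen significance level (e.g. 0.05).
```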
Related papers
- Generalization vs. Specialization under Concept Shift [12.196508752999797]
Machine learning models are often brittle under distribution shift.
We show that test performance can exhibit a nonmonotonic data dependence, even when double descent is absent.
Experiments on MNIST and FashionMNIST suggest that this intriguing behavior also occurs in classification problems.
arXiv Detail & Related papers (2024-09-23T22:30:28Z) - Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective [6.845698872290768]
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples.
Under distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down.
We attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts.
arXiv Detail & Related papers (2023-12-21T23:20:47Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework within which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z) - Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
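As a rough illustration of treating feature statistics as random rather than deterministic, the sketch below perturbs per-channel feature means and standard deviations with noise scaled by their batch-level variability. The function name, tensor shapes, and scaling choice are assumptions for illustration; the exact formulation in the paper above may differ.
```python
import torch

def perturb_feature_statistics(feats, eps=1e-6):
    # feats: (batch, channels, height, width) feature map.
    # Per-sample channel statistics.
    mu = feats.mean(dim=(2, 3), keepdim=True)
    sigma = feats.std(dim=(2, 3), keepdim=True) + eps

    # Batch-level variability of those statistics acts as the uncertainty scale.
    mu_scale = mu.var(dim=0, keepdim=True).sqrt()
    sigma_scale = sigma.var(dim=0, keepdim=True).sqrt()

    # Sample perturbed statistics and re-standardize the features with them.
    new_mu = mu + torch.randn_like(mu) * mu_scale
    new_sigma = sigma + torch.randn_like(sigma) * sigma_scale
    return (feats - mu) / sigma * new_sigma + new_mu
```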
arXiv Detail & Related papers (2022-02-08T16:09:12Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z) - Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
We found in the permutation test that CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power.
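For readers unfamiliar with label permutation tests, the sketch below shows the generic recipe: compare the observed validation score against scores obtained after shuffling the labels, yielding an empirical p-value under the null hypothesis that features and labels are independent. The callable and its signature are hypothetical placeholders; the cited work's CV and RUB corrections are not reproduced here.
```python
import numpy as np

def label_permutation_test(fit_and_score, X, y, n_permutations=1000, seed=0):
    # fit_and_score(X, y) -> float must train a model and return its validation
    # score (e.g. cross-validated accuracy); it is a hypothetical placeholder.
    rng = np.random.default_rng(seed)
    observed = fit_and_score(X, y)
    null_scores = [fit_and_score(X, rng.permutation(y)) for _ in range(n_permutations)]
    # Empirical one-sided p-value with the standard +1 correction.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + n_permutations)
    return observed, p_value
```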
arXiv Detail & Related papers (2021-03-30T21:15:39Z) - Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.