A Learning Based Hypothesis Test for Harmful Covariate Shift
- URL: http://arxiv.org/abs/2212.02742v2
- Date: Wed, 7 Dec 2022 03:19:24 GMT
- Title: A Learning Based Hypothesis Test for Harmful Covariate Shift
- Authors: Tom Ginsberg, Zhongyuan Liang, and Rahul G. Krishnan
- Abstract summary: Machine learning systems in high-risk domains need to identify when predictions should not be made on out-of-distribution test examples.
In this work, we use the discordance between an ensemble of classifiers trained to agree on training data and disagree on test data to determine when a model should be removed from the deployment setting.
- Score: 3.1406146587437904
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The ability to quickly and accurately identify covariate shift at test time
is a critical and often overlooked component of safe machine learning systems
deployed in high-risk domains. While methods exist for detecting when
predictions should not be made on out-of-distribution test examples,
identifying distributional level differences between training and test time can
help determine when a model should be removed from the deployment setting and
retrained. In this work, we define harmful covariate shift (HCS) as a change in
distribution that may weaken the generalization of a predictive model. To
detect HCS, we use the discordance between an ensemble of classifiers trained
to agree on training data and disagree on test data. We derive a loss function
for training this ensemble and show that the disagreement rate and entropy
represent powerful discriminative statistics for HCS. Empirically, we
demonstrate the ability of our method to detect harmful covariate shift with
statistical certainty on a variety of high-dimensional datasets. Across
numerous domains and modalities, we show state-of-the-art performance compared
to existing methods, particularly when the number of observed test samples is
small.
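To make the testing idea concrete, here is a minimal sketch of how a disagreement statistic can drive a hypothesis test for harmful covariate shift: the disagreement rate observed on a suspect test batch is compared against an empirical null distribution built from held-out in-distribution batches. The function names, inputs, and the empirical-null construction are illustrative assumptions, not the paper's exact loss or test procedure.
```python
import numpy as np

def disagreement_rate(base_preds, member_preds):
    # Fraction of samples where an ensemble member's predicted label
    # differs from the base classifier's predicted label.
    return float(np.mean(np.asarray(base_preds) != np.asarray(member_preds)))

def harmful_shift_p_value(observed_rate, null_rates):
    # One-sided empirical p-value: how often held-out in-distribution batches
    # produce a disagreement rate at least as large as the observed one.
    null_rates = np.asarray(null_rates)
    return (1 + np.sum(null_rates >= observed_rate)) / (1 + len(null_rates))

# Hypothetical usage:
#   base_preds   - base classifier labels on the suspect test batch
#   member_preds - labels from an ensemble member trained to disagree on that batch
#   null_rates   - disagreement rates collected on held-out in-distribution batches
# p = harmful_shift_p_value(disagreement_rate(base_preds, member_preds), null_rates)
# Flag harmful covariate shift when p falls below a chosen significance level (e.g. 0.05).
```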
Related papers
- Generalization vs. Specialization under Concept Shift [12.196508752999797]
Machine learning models are often brittle under distribution shift.
We show that test performance can exhibit a nonmonotonic data dependence, even when double descent is absent.
Experiments on MNIST and FashionMNIST suggest that this intriguing behavior also occurs in classification problems.
arXiv Detail & Related papers (2024-09-23T22:30:28Z) - Invariant Anomaly Detection under Distribution Shifts: A Causal Perspective [6.845698872290768]
Anomaly detection (AD) is the machine learning task of identifying highly discrepant abnormal samples.
Under distribution shift, the assumption that training samples and test samples are drawn from the same distribution breaks down.
We attempt to increase the resilience of anomaly detection models to different kinds of distribution shifts.
arXiv Detail & Related papers (2023-12-21T23:20:47Z) - How adversarial attacks can disrupt seemingly stable accurate classifiers [76.95145661711514]
Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data.
Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data.
We introduce a simple, generic, and generalisable framework within which key behaviours observed in practical systems arise with high probability.
arXiv Detail & Related papers (2023-09-07T12:02:00Z) - Testing for Overfitting [0.0]
We discuss the overfitting problem and explain why standard asymptotic and concentration results do not hold for evaluation with training data.
We introduce and argue for a hypothesis test by means of which model performance may be evaluated using training data.
arXiv Detail & Related papers (2023-05-09T22:49:55Z) - Uncertainty Modeling for Out-of-Distribution Generalization [56.957731893992495]
We argue that the feature statistics can be properly manipulated to improve the generalization ability of deep learning models.
Common methods often consider the feature statistics as deterministic values measured from the learned features.
We improve the network generalization ability by modeling the uncertainty of domain shifts with synthesized feature statistics during training.
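As a rough illustration of treating feature statistics as random rather than deterministic, the sketch below perturbs per-channel feature means and standard deviations with noise scaled by their batch-level variability. The function name, tensor shapes, and scaling choice are assumptions for illustration; the exact formulation in the paper above may differ.
```python
import torch

def perturb_feature_statistics(feats, eps=1e-6):
    # feats: (batch, channels, height, width) feature map.
    # Per-sample channel statistics.
    mu = feats.mean(dim=(2, 3), keepdim=True)
    sigma = feats.std(dim=(2, 3), keepdim=True) + eps

    # Batch-level variability of those statistics acts as the uncertainty scale.
    mu_scale = mu.var(dim=0, keepdim=True).sqrt()
    sigma_scale = sigma.var(dim=0, keepdim=True).sqrt()

    # Sample perturbed statistics and re-standardize the features with them.
    new_mu = mu + torch.randn_like(mu) * mu_scale
    new_sigma = sigma + torch.randn_like(sigma) * sigma_scale
    return (feats - mu) / sigma * new_sigma + new_mu
```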
arXiv Detail & Related papers (2022-02-08T16:09:12Z) - Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z) - Tracking the risk of a deployed model and detecting harmful distribution shifts [105.27463615756733]
In practice, it may make sense to ignore benign shifts, under which the performance of a deployed model does not degrade substantially.
We argue that a sensible method for firing off a warning has to both (a) detect harmful shifts while ignoring benign ones, and (b) allow continuous monitoring of model performance without increasing the false alarm rate.
arXiv Detail & Related papers (2021-10-12T17:21:41Z) - Deep Learning in current Neuroimaging: a multivariate approach with power and type I error control but arguable generalization ability [0.158310730488265]
A non-parametric framework is proposed that estimates the statistical significance of classifications using deep learning architectures.
A label permutation test is proposed in both studies using cross-validation (CV) and resubstitution with upper bound correction (RUB) as validation methods.
We found in the permutation test that CV and RUB methods offer a false positive rate close to the significance level and an acceptable statistical power.
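For readers unfamiliar with label permutation tests, the sketch below shows the generic recipe: compare the observed validation score against scores obtained after shuffling the labels, yielding an empirical p-value under the null hypothesis that features and labels are independent. The callable and its signature are hypothetical placeholders; the cited work's CV and RUB corrections are not reproduced here.
```python
import numpy as np

def label_permutation_test(fit_and_score, X, y, n_permutations=1000, seed=0):
    # fit_and_score(X, y) -> float must train a model and return its validation
    # score (e.g. cross-validated accuracy); it is a hypothetical placeholder.
    rng = np.random.default_rng(seed)
    observed = fit_and_score(X, y)
    null_scores = [fit_and_score(X, rng.permutation(y)) for _ in range(n_permutations)]
    # Empirical one-sided p-value with the standard +1 correction.
    p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + n_permutations)
    return observed, p_value
```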
arXiv Detail & Related papers (2021-03-30T21:15:39Z) - Unsupervised neural adaptation model based on optimal transport for spoken language identification [54.96267179988487]
Due to the mismatch of statistical distributions of acoustic speech between training and testing sets, the performance of spoken language identification (SLID) could be drastically degraded.
We propose an unsupervised neural adaptation model to deal with the distribution mismatch problem for SLID.
arXiv Detail & Related papers (2020-12-24T07:37:19Z) - Understanding Classifier Mistakes with Generative Models [88.20470690631372]
Deep neural networks are effective on supervised learning tasks, but have been shown to be brittle.
In this paper, we leverage generative models to identify and characterize instances where classifiers fail to generalize.
Our approach is agnostic to class labels from the training set which makes it applicable to models trained in a semi-supervised way.
arXiv Detail & Related papers (2020-10-05T22:13:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.