Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift
- URL: http://arxiv.org/abs/2006.10963v3
- Date: Thu, 14 Jan 2021 21:11:06 GMT
- Title: Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift
- Authors: Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji
Lakshminarayanan, Jasper Snoek
- Abstract summary: We introduce a method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
- Score: 81.74795324629712
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Covariate shift has been shown to sharply degrade both predictive accuracy
and the calibration of uncertainty estimates for deep learning models. This is
worrying, because covariate shift is prevalent in a wide range of real world
deployment settings. However, in this paper, we note that frequently there
exists the potential to access small unlabeled batches of the shifted data just
before prediction time. This interesting observation enables a simple but
surprisingly effective method which we call prediction-time batch
normalization, which significantly improves model accuracy and calibration
under covariate shift. Using this one-line code change, we achieve
state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28\% on
the challenging ImageNet-C dataset; to our knowledge, this is the best result
for any model that does not incorporate additional data augmentation or
modification of the training pipeline. We show that prediction-time batch
normalization provides complementary benefits to existing state-of-the-art
approaches for improving robustness (e.g. deep ensembles) and combining the two
further improves performance. Our findings are supported by detailed
measurements of the effect of this strategy on model behavior across rigorous
ablations on various dataset modalities. However, the method has mixed results
when used alongside pre-training, and does not seem to perform as well under
more natural types of dataset shift, and is therefore worthy of additional
study. We include links to the data in our figures to improve reproducibility,
including a Python notebook that can be run to easily modify our analysis at
https://colab.research.google.com/drive/11N0wDZnMQQuLrRwRoumDCrhSaIhkqjof.
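The core idea, recomputing normalization statistics from the unlabeled test batch itself instead of the running averages stored at training time, can be sketched in a few lines. The function below is a minimal NumPy illustration of that idea, not the authors' implementation; `gamma` and `beta` stand in for the learned affine parameters of a batch-norm layer:

```python
import numpy as np

def prediction_time_batch_norm(x, gamma, beta, eps=1e-5):
    """Normalize a test batch of shape (batch, features) with its OWN
    per-feature mean and variance, rather than the running statistics
    accumulated during training. Under covariate shift, the batch's own
    statistics track the shifted input distribution."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

In a deep-learning framework this amounts to keeping batch-norm layers in "training" mode at prediction time (the "one line code change" mentioned above), so that shifted inputs are re-standardized before the affine transform is applied.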
Related papers
- Bridging Precision and Confidence: A Train-Time Loss for Calibrating
Object Detection [58.789823426981044]
We propose a novel auxiliary loss formulation that aims to align the class confidence of bounding boxes with the accurateness of predictions.
Our results reveal that our train-time loss surpasses strong calibration baselines in reducing calibration error for both in and out-domain scenarios.
arXiv Detail & Related papers (2023-03-25T08:56:21Z)
- Sample-dependent Adaptive Temperature Scaling for Improved Calibration [95.7477042886242]
A standard post-hoc approach to correcting miscalibrated neural networks is temperature scaling.
We propose to predict a different temperature value for each input, allowing us to adjust the mismatch between confidence and accuracy.
We test our method on the ResNet50 and WideResNet28-10 architectures using the CIFAR10/100 and Tiny-ImageNet datasets.
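For context, classic single-temperature scaling divides all logits by one scalar T fitted on held-out data; the paper above generalizes this by predicting a per-input temperature. Below is a minimal sketch of the standard version only (the per-sample variant is not reproduced here), with a simple grid search standing in for the usual NLL optimization:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def temperature_scale(logits, T):
    """Divide logits by a scalar temperature T before the softmax.
    T > 1 softens overconfident predictions; the predicted class is
    unchanged, only the confidence shrinks toward uniform."""
    return softmax(logits / T)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 46)):
    """Pick the T from a grid that minimizes negative log-likelihood
    on held-out (validation) logits and integer labels."""
    def nll(T):
        p = temperature_scale(logits, T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)
```

Because T rescales every logit identically, temperature scaling changes calibration without changing accuracy.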
arXiv Detail & Related papers (2022-07-13T14:13:49Z)
- Using calibrator to improve robustness in Machine Reading Comprehension [18.844528744164876]
We propose a method to improve robustness by using a calibrator as a post-hoc reranker.
Experimental results on adversarial datasets show that our model can achieve performance improvement by more than 10%.
arXiv Detail & Related papers (2022-02-24T02:16:42Z)
- Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts target accuracy as the fraction of unlabeled target examples whose confidence exceeds that threshold.
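The thresholding idea can be sketched briefly. This is an illustrative simplification, assuming max-softmax confidence as the score (the paper also considers other scores, such as negative entropy); the function names are hypothetical, not the authors' code:

```python
import numpy as np

def fit_atc_threshold(source_confidences, source_correct):
    """Choose a threshold t on labeled source data such that the fraction
    of source examples with confidence >= t matches source accuracy."""
    acc = source_correct.mean()
    # the (1 - acc) quantile puts a fraction `acc` of confidences above t
    return np.quantile(source_confidences, 1.0 - acc)

def predict_target_accuracy(target_confidences, threshold):
    """Estimate target-domain accuracy as the fraction of unlabeled
    target examples whose confidence clears the learned threshold."""
    return (target_confidences >= threshold).mean()
```

The threshold is fit once on source data and then applied to unlabeled target batches, so no target labels are needed at estimation time.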
arXiv Detail & Related papers (2022-01-11T23:01:12Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- Efficient remedies for outlier detection with variational autoencoders [8.80692072928023]
Likelihoods computed by deep generative models are a candidate metric for outlier detection with unlabeled data.
We show that a theoretically-grounded correction readily ameliorates a key bias with VAE likelihood estimates.
We also show that the variance of the likelihoods computed over an ensemble of VAEs also enables robust outlier detection.
arXiv Detail & Related papers (2021-08-19T16:00:58Z)
- Backward-Compatible Prediction Updates: A Probabilistic Approach [12.049279991559091]
We formalize the Prediction Update Problem and present an efficient probabilistic approach to solving it.
In extensive experiments on standard classification benchmark data sets, we show that our method outperforms alternative strategies for backward-compatible prediction updates.
arXiv Detail & Related papers (2021-07-02T13:05:31Z)
- Closer Look at the Uncertainty Estimation in Semantic Segmentation under Distributional Shift [2.05617385614792]
Uncertainty estimation for the task of semantic segmentation is evaluated under a varying level of domain shift.
It was shown that simple color transformations already provide a strong baseline.
An ensemble of models was used in the self-training setting to improve pseudo-label generation.
arXiv Detail & Related papers (2021-05-31T19:50:43Z)
- Scalable Marginal Likelihood Estimation for Model Selection in Deep Learning [78.83598532168256]
Marginal-likelihood based model-selection is rarely used in deep learning due to estimation difficulties.
Our work shows that marginal likelihoods can improve generalization and be useful when validation data is unavailable.
arXiv Detail & Related papers (2021-04-11T09:50:24Z)
- Model adaptation and unsupervised learning with non-stationary batch data under smooth concept drift [8.068725688880772]
Most predictive models assume that training and test data are generated from a stationary process.
We consider the scenario of a gradual concept drift due to the underlying non-stationarity of the data source.
We propose a novel, iterative algorithm for unsupervised adaptation of predictive models.
arXiv Detail & Related papers (2020-02-10T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.