Better Aggregation in Test-Time Augmentation
- URL: http://arxiv.org/abs/2011.11156v2
- Date: Mon, 11 Oct 2021 19:58:48 GMT
- Title: Better Aggregation in Test-Time Augmentation
- Authors: Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag
- Abstract summary: Test-time augmentation is the aggregation of predictions across transformed versions of a test input.
A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions.
We present a learning-based method for aggregating test-time augmentations.
- Score: 4.259219671110274
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Test-time augmentation -- the aggregation of predictions across transformed
versions of a test input -- is a common practice in image classification.
Traditionally, predictions are combined using a simple average. In this paper,
we present 1) experimental analyses that shed light on cases in which the
simple average is suboptimal and 2) a method to address these shortcomings. A
key finding is that even when test-time augmentation produces a net improvement
in accuracy, it can change many correct predictions into incorrect predictions.
We delve into when and why test-time augmentation changes a prediction from
being correct to incorrect and vice versa. Building on these insights, we
present a learning-based method for aggregating test-time augmentations.
Experiments across a diverse set of models, datasets, and augmentations show
that our method delivers consistent improvements over existing approaches.
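As a concrete illustration of the aggregation step described in the abstract, the following minimal Python sketch contrasts the traditional simple average with a learned convex weighting of per-augmentation predictions. The function names and the fixed-weight scheme are illustrative assumptions, not the authors' exact method.

```python
import numpy as np

def tta_average(probs: np.ndarray) -> np.ndarray:
    """Standard test-time augmentation: average class probabilities
    over the augmented copies of one test input.
    probs has shape (n_augmentations, n_classes)."""
    return probs.mean(axis=0)

def tta_weighted(probs: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Illustrative learned aggregation: a fixed weight per augmentation,
    e.g. fit on validation data so informative transforms count more
    and harmful ones count less."""
    weights = weights / weights.sum()   # normalise to a convex combination
    return weights @ probs              # shape (n_classes,)

# Toy example: 3 augmentations of one image, 4 classes.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(4), size=3)   # each row sums to 1
print(tta_average(probs))
print(tta_weighted(probs, weights=np.array([0.6, 0.3, 0.1])))
```

In practice the weights would be fit on held-out data, which is where a learning-based aggregation method comes in.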
Related papers
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in optimization costs that are prohibitive for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
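A hypothetical sketch of the kind of active sample selection described above: keep only confidently predicted (low-entropy) and non-redundant test samples before adapting. The thresholds, the cosine-similarity filter, and the function name are assumptions for illustration, not EATA's exact criterion.

```python
import numpy as np

def select_reliable_nonredundant(probs, feats, entropy_max=0.4, cos_max=0.9):
    """Keep test samples whose predictions are confident (low entropy) and
    whose features are not too similar to samples already kept.
    probs: (n, n_classes) softmax outputs; feats: (n, d) feature vectors.
    Returns indices of the selected samples."""
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
    kept, kept_feats = [], []
    for i in np.argsort(entropy):                      # most confident first
        if entropy[i] > entropy_max:
            break                                      # remaining samples are unreliable
        f = feats[i] / (np.linalg.norm(feats[i]) + 1e-12)
        if all(f @ g < cos_max for g in kept_feats):   # skip near-duplicates
            kept.append(i)
            kept_feats.append(f)
    return np.array(kept, dtype=int)
```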
- Contextual Predictive Mutation Testing [17.832774161583036]
We introduce MutationBERT, an approach for predictive mutation testing that simultaneously encodes the source method mutation and test method.
Thanks to its higher precision, MutationBERT saves 33% of the time spent by a prior approach on checking/verifying live mutants.
We validate our input representation and our aggregation approaches for lifting predictions from the test matrix level to the test suite level, finding similar improvements in performance.
arXiv Detail & Related papers (2023-09-05T17:00:15Z)
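The lifting from the test matrix level to the test suite level mentioned above can be illustrated with a simple aggregation sketch; the any-test rule and threshold below are assumptions, not necessarily the aggregation used by MutationBERT.

```python
import numpy as np

def suite_level_from_matrix(kill_probs: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Illustrative aggregation: a mutant is predicted to be detected by the
    suite if at least one test is predicted to kill it.
    kill_probs has shape (n_mutants, n_tests), entries in [0, 1].
    Returns a boolean vector of length n_mutants."""
    return (kill_probs >= threshold).any(axis=1)

# Example: 2 mutants, 3 tests.
matrix = np.array([[0.1, 0.8, 0.2],
                   [0.3, 0.4, 0.2]])
print(suite_level_from_matrix(matrix))  # [ True False ]
```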
- Regularising for invariance to data augmentation improves supervised learning [82.85692486314949]
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions.
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
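A minimal PyTorch sketch of an explicit invariance regulariser at the level of individual predictions, assuming a symmetric KL consistency term between two augmented views; the paper's exact regulariser may differ, and `lam` is an assumed trade-off weight.

```python
import torch
import torch.nn.functional as F

def augmentation_invariance_loss(model, x_aug1, x_aug2, labels, lam=1.0):
    """Cross-entropy on two augmented views plus a penalty on disagreement
    between the model's predictive distributions for those views."""
    logits1 = model(x_aug1)
    logits2 = model(x_aug2)
    ce = 0.5 * (F.cross_entropy(logits1, labels) + F.cross_entropy(logits2, labels))
    # Symmetric KL between the two predictive distributions.
    p1 = F.log_softmax(logits1, dim=1)
    p2 = F.log_softmax(logits2, dim=1)
    invariance = 0.5 * (F.kl_div(p1, p2, log_target=True, reduction="batchmean")
                        + F.kl_div(p2, p1, log_target=True, reduction="batchmean"))
    return ce + lam * invariance
```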
- Conformal prediction for the design problem [72.14982816083297]
In many real-world deployments of machine learning, we use a prediction algorithm to choose what data to test next.
In such settings, there is a distinct type of distribution shift between the training and test data.
We introduce a method to quantify predictive uncertainty in such settings.
arXiv Detail & Related papers (2022-02-08T02:59:12Z)
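For background, a sketch of standard split conformal prediction; note that the paper above addresses the feedback-induced distribution shift of the design setting, which this plain version does not handle.

```python
import numpy as np

def split_conformal_interval(cal_residuals: np.ndarray, y_pred: float, alpha: float = 0.1):
    """Use held-out calibration residuals to pick a finite-sample-corrected
    quantile, then form a prediction interval around a new point prediction."""
    n = len(cal_residuals)
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.abs(cal_residuals), q_level, method="higher")
    return y_pred - q, y_pred + q

# Example with synthetic calibration residuals.
rng = np.random.default_rng(1)
residuals = rng.normal(scale=0.5, size=200)
print(split_conformal_interval(residuals, y_pred=3.2, alpha=0.1))
```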
- MEMO: Test Time Robustness via Adaptation and Augmentation [131.28104376280197]
We study the problem of test time robustification, i.e., using the test input to improve model robustness.
Recent prior works have proposed methods for test-time adaptation; however, each introduces additional assumptions.
We propose a simple approach that can be used in any test setting where the model is probabilistic and adaptable.
arXiv Detail & Related papers (2021-10-18T17:55:11Z)
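A sketch in the spirit of the adaptation-plus-augmentation idea above: average the predictive distribution over augmented copies of a single test input and take one gradient step to reduce its entropy. The optimizer, learning rate, and the choice to update all parameters are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def adapt_on_single_input(model, augmented_batch, lr=1e-3):
    """One adaptation step on a single test input.
    augmented_batch: tensor of shape (n_augmentations, C, H, W)."""
    model.train()  # enable adaptation; in practice you may restrict which params update
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    probs = F.softmax(model(augmented_batch), dim=1)    # (n_aug, n_classes)
    marginal = probs.mean(dim=0)                         # average over augmentations
    entropy = -(marginal * torch.log(marginal + 1e-12)).sum()
    opt.zero_grad()
    entropy.backward()
    opt.step()
    model.eval()
    with torch.no_grad():
        return model(augmented_batch).softmax(dim=1).mean(dim=0).argmax().item()
```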
- How to Evaluate Uncertainty Estimates in Machine Learning for Regression? [1.4610038284393165]
We show that both approaches to evaluating the quality of uncertainty estimates have serious flaws.
First, neither approach can disentangle the separate components that jointly create the predictive uncertainty.
Furthermore, the current approach of testing prediction intervals directly has additional flaws.
arXiv Detail & Related papers (2021-06-07T07:47:46Z)
- Learning Loss for Test-Time Augmentation [25.739449801033846]
This paper proposes a novel instance-level test-time augmentation that efficiently selects suitable transformations for a test input.
Experimental results on several image classification benchmarks show that the proposed instance-aware test-time augmentation improves the model's robustness against various corruptions.
arXiv Detail & Related papers (2020-10-22T03:56:34Z)
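A hypothetical sketch of instance-level transform selection as described above: an auxiliary loss predictor scores candidate transformations for a given input and only the most promising ones are kept for augmentation. The `loss_predictor` interface and top-k rule are assumptions; the paper's module and training procedure differ in detail.

```python
import torch

def select_transforms(loss_predictor, x, candidate_transforms, k=4):
    """Score each transformed copy of the input with a small auxiliary
    network that predicts the classifier's loss, then keep the k
    transforms with the lowest predicted loss."""
    with torch.no_grad():
        scores = []
        for t in candidate_transforms:
            x_t = t(x)                            # transformed copy of the input
            scores.append(loss_predictor(x_t).item())
    order = sorted(range(len(candidate_transforms)), key=lambda i: scores[i])
    return [candidate_transforms[i] for i in order[:k]]
```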
- Monotonicity in practice of adaptive testing [0.0]
This article evaluates Bayesian network models used for computerized adaptive testing and learned with a recently proposed monotonicity gradient algorithm.
The quality of methods is empirically evaluated on a large data set of the Czech National Mathematics exam.
arXiv Detail & Related papers (2020-09-15T10:55:41Z)
- Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift [81.74795324629712]
We evaluate a method we call prediction-time batch normalization, which significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
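A minimal PyTorch sketch of prediction-time batch normalization as described above: normalise with statistics computed from the current test batch by switching only the BatchNorm modules into training mode for the forward pass. Implementations vary in details such as mixing test statistics with the stored training statistics.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def predict_with_test_batch_stats(model, test_batch):
    """Forward pass in which BatchNorm layers normalise with the current
    test batch's statistics instead of the stored running statistics.
    (As a side effect, the stored running statistics are also updated.)"""
    model.eval()
    bn_layers = [m for m in model.modules()
                 if isinstance(m, (nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d))]
    for bn in bn_layers:
        bn.train()                  # use batch statistics for normalisation
    try:
        return model(test_batch).softmax(dim=1)
    finally:
        for bn in bn_layers:
            bn.eval()               # restore standard inference behaviour
```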
- Balance-Subsampled Stable Prediction [55.13512328954456]
We propose a novel balance-subsampled stable prediction (BSSP) algorithm based on the theory of fractional factorial design.
A design-theoretic analysis shows that the proposed method can reduce the confounding effects among predictors induced by the distribution shift.
Numerical experiments on both synthetic and real-world data sets demonstrate that our BSSP algorithm significantly outperforms the baseline methods for stable prediction across unknown test data.
arXiv Detail & Related papers (2020-06-08T07:01:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.