Reliable Test-Time Adaptation via Agreement-on-the-Line
- URL: http://arxiv.org/abs/2310.04941v1
- Date: Sat, 7 Oct 2023 23:21:25 GMT
- Title: Reliable Test-Time Adaptation via Agreement-on-the-Line
- Authors: Eungyeup Kim, Mingjie Sun, Aditi Raghunathan, Zico Kolter
- Abstract summary: Test-time adaptation (TTA) methods aim to improve robustness to distribution shifts by adapting models using unlabeled data.
We make a notable and surprising observation that TTAed models strongly show the agreement-on-the-line phenomenon.
We leverage these observations to make TTA methods more reliable from three perspectives.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) methods aim to improve robustness to distribution
shifts by adapting models using unlabeled data from the shifted test
distribution. However, there remain unresolved challenges that undermine the
reliability of TTA, which include difficulties in evaluating TTA performance,
miscalibration after TTA, and unreliable hyperparameter tuning for adaptation.
In this work, we make a notable and surprising observation that TTAed models
strongly show the agreement-on-the-line phenomenon (Baek et al., 2022) across a
wide range of distribution shifts. We find such linear trends occur
consistently in a wide range of models adapted with various hyperparameters,
and persist in distributions where the phenomenon fails to hold in vanilla
models (i.e., before adaptation). We leverage these observations to make TTA
methods more reliable from three perspectives: (i) estimating OOD accuracy
(without labeled data) to determine when TTA helps and when it hurts, (ii)
calibrating TTAed models without label information, and (iii) reliably
determining hyperparameters for TTA without any labeled validation data.
Through extensive experiments, we demonstrate that various TTA methods can be
precisely evaluated, both in terms of their improvements and degradations.
Moreover, our proposed methods for unsupervised calibration and hyperparameter
tuning in TTA achieve results close to those obtained with access to
ground-truth labels, in terms of both OOD accuracy and calibration error.
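The OOD-accuracy estimation in point (i) can be illustrated with a minimal sketch of the agreement-on-the-line idea, assuming the probit scaling of Baek et al. (2022); the function and variable names here are illustrative, not the paper's implementation:

```python
from statistics import NormalDist

def probit(p):
    # clamp away from 0/1 so the inverse CDF stays finite
    p = min(max(p, 1e-6), 1.0 - 1e-6)
    return NormalDist().inv_cdf(p)

def estimate_ood_accuracy(agree_id, agree_ood, acc_id):
    """Fit a line to probit-transformed ID-vs-OOD agreement across model
    pairs, then reuse its slope and bias to map each model's ID accuracy
    to a predicted OOD accuracy -- no OOD labels are needed, because
    agreement is computable from unlabeled data alone."""
    xs = [probit(a) for a in agree_id]
    ys = [probit(a) for a in agree_ood]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    bias = my - slope * mx
    # agreement-on-the-line: accuracies follow approximately the same line
    return [NormalDist().cdf(slope * probit(a) + bias) for a in acc_id]
```

Given pairwise agreement rates on ID and OOD data plus labeled ID accuracies, the returned values estimate each model's OOD accuracy, which is what lets one judge whether TTA helped or hurt without OOD labels.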
Related papers
- AETTA: Label-Free Accuracy Estimation for Test-Time Adaptation [7.079932622432037]
Test-time adaptation (TTA) has emerged as a viable solution to adapt pre-trained models to domain shifts using unlabeled test data.
We propose AETTA, a label-free accuracy estimation algorithm for TTA.
We show that AETTA yields estimates that are, on average, 19.8 percentage points more accurate than the baselines.
arXiv Detail & Related papers (2024-04-01T04:21:49Z)
- Test-time Adaptation Meets Image Enhancement: Improving Accuracy via Uncertainty-aware Logit Switching [7.837009376353597]
Test-time adaptation (TTA) has been well studied because of its practicality.
We incorporate a new perspective into TTA methods: enhancing the input image to reduce prediction uncertainty.
We show that Test-time Enhancer and Adaptation (TECA) reduces prediction uncertainty and increases the accuracy of TTA methods.
arXiv Detail & Related papers (2024-03-26T06:40:03Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Persistent Test-time Adaptation in Recurring Testing Scenarios [12.024233973321756]
Current test-time adaptation (TTA) approaches aim to adapt a machine learning model to environments that change continuously.
Yet, it is unclear whether TTA methods can maintain their adaptability over prolonged periods.
We propose persistent TTA (PeTTA) which senses when the model is diverging towards collapse and adjusts the adaptation strategy.
arXiv Detail & Related papers (2023-11-30T02:24:44Z)
- Diverse Data Augmentation with Diffusions for Effective Test-time Prompt Tuning [73.75282761503581]
We propose DiffTPT, which leverages pre-trained diffusion models to generate diverse and informative new data.
Our experiments on test datasets with distribution shifts and unseen categories demonstrate that DiffTPT improves the zero-shot accuracy by an average of 5.13%.
arXiv Detail & Related papers (2023-08-11T09:36:31Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Towards Stable Test-Time Adaptation in Dynamic Wild World [60.98073673220025]
Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples.
However, online model updating in TTA can be unstable, and this is often a key obstacle preventing existing TTA methods from being deployed in the real world.
arXiv Detail & Related papers (2023-02-24T02:03:41Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN) that corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS) that simulates i.i.d. data stream from non-i.i.d. stream in a class-balanced manner.
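The prediction-balanced sampling idea in (b) can be sketched as follows; this is a simplified, hypothetical version (class-balanced eviction across predicted labels, reservoir replacement within a label), and the paper's exact PBRS differs in its details:

```python
import random

class PredictionBalancedReservoir:
    """Fixed-size memory that stays class-balanced across predicted
    labels and time-uniform (via reservoir sampling) within each label,
    so a non-i.i.d. stream yields a roughly i.i.d., balanced buffer."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.memory = {}   # predicted label -> stored samples
        self.seen = {}     # predicted label -> count seen so far
        self.rng = random.Random(seed)

    def total(self):
        return sum(len(v) for v in self.memory.values())

    def add(self, sample, pred_label):
        self.seen[pred_label] = self.seen.get(pred_label, 0) + 1
        bucket = self.memory.setdefault(pred_label, [])
        if self.total() < self.capacity:
            bucket.append(sample)
            return
        majority = max(self.memory, key=lambda k: len(self.memory[k]))
        if pred_label != majority:
            # make room by evicting a random item from the majority class
            victim = self.rng.randrange(len(self.memory[majority]))
            self.memory[majority].pop(victim)
            bucket.append(sample)
        else:
            # within the majority class, fall back to reservoir replacement
            j = self.rng.randrange(self.seen[pred_label])
            if j < len(bucket):
                bucket[j] = sample
```

Feeding a heavily skewed stream (e.g. a long run of one class followed by another) leaves the memory close to class-balanced rather than dominated by the majority class.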
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to constrain important model parameters from drastic changes.
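The Fisher regularizer described above can be sketched as an EWC-style drift penalty; this is a minimal per-parameter version with illustrative names, not the paper's exact formulation:

```python
def fisher_penalty(params, anchor, fisher, lam=1.0):
    """Anti-forgetting penalty (sketch): each parameter's squared drift
    from its pre-adaptation value `anchor`, weighted by a diagonal
    Fisher importance estimate (assumed precomputed), scaled by `lam`.
    Important parameters are thus constrained from drastic changes."""
    return lam * sum(
        fisher[k] * (params[k] - anchor[k]) ** 2 for k in params
    )
```

During adaptation this penalty would be added to the TTA loss, so parameters with high Fisher importance stay near their source-model values while unimportant ones remain free to adapt.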
arXiv Detail & Related papers (2022-04-06T06:39:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.