Towards Stable Test-Time Adaptation in Dynamic Wild World
- URL: http://arxiv.org/abs/2302.12400v1
- Date: Fri, 24 Feb 2023 02:03:41 GMT
- Title: Towards Stable Test-Time Adaptation in Dynamic Wild World
- Authors: Shuaicheng Niu, Jiaxiang Wu, Yifan Zhang, Zhiquan Wen, Yaofo Chen,
Peilin Zhao, Mingkui Tan
- Abstract summary: Test-time adaptation (TTA) has been shown to be effective at tackling distribution shifts between training and testing data by adapting a given model on test samples.
However, the online model updating of TTA can be unstable, and this is often the key obstacle preventing existing TTA methods from being deployed in the real world.
- Score: 60.98073673220025
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Test-time adaptation (TTA) has been shown to be effective at
tackling distribution shifts between training and testing data by adapting a
given model on test samples. However, the online model updating of TTA can be
unstable, and this is often the key obstacle preventing existing TTA methods
from being deployed in the real world. Specifically, TTA may fail to improve
or even harm model performance when test data have: 1) mixed distribution
shifts, 2) small batch sizes, and 3) online imbalanced label distribution
shifts, all of which are quite common in practice. In this paper, we
investigate the reasons for this instability and find that the batch norm
layer is a crucial factor hindering TTA stability. Conversely, TTA can
perform more stably with batch-agnostic norm layers, i.e., group or layer
norm. However, we observe that TTA with group and layer norms does not always
succeed and still suffers from many failure cases. By digging into these
failure cases, we find that certain noisy test samples with large gradients
may disturb the model adaptation and result in collapsed trivial solutions,
i.e., assigning the same class label to all samples. To address this collapse
issue, we propose a sharpness-aware and reliable entropy minimization method,
called SAR, which further stabilizes TTA in two ways: 1) removing some of the
noisy samples with large gradients, and 2) encouraging the model weights to
move towards a flat minimum so that the model is robust to the remaining
noisy samples. Promising results demonstrate that SAR performs more stably
than prior methods and is computationally efficient under the above wild test
scenarios.
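Read literally, the abstract's recipe combines a reliability filter with a sharpness-aware (SAM-style) update on the entropy objective. The sketch below is a minimal PyTorch reading of that recipe, not the authors' released code: per-sample prediction entropy serves as the proxy for large gradients when filtering, and a two-step SAM update is applied to the entropy loss of the surviving samples. The toy model, the 0.4*ln(C) threshold scale, and `rho` are illustrative assumptions, and details from the paper (e.g., updating only normalization-layer affine parameters, and a model recovery scheme) are omitted.

```python
# Hedged sketch of a SAR-style step: entropy filtering + SAM-style update.
import math
import torch
import torch.nn as nn

def entropy(logits):
    """Per-sample entropy of the softmax prediction."""
    return -(logits.softmax(dim=1) * logits.log_softmax(dim=1)).sum(dim=1)

def sar_step(model, optimizer, x, num_classes, rho=0.05, margin_scale=0.4):
    """One SAR-style update on an unlabeled test batch x (illustrative)."""
    e_margin = margin_scale * math.log(num_classes)  # assumed entropy threshold

    # (1) Reliable sample selection: drop high-entropy (large-gradient) samples.
    with torch.no_grad():
        keep = entropy(model(x)) < e_margin
    if not keep.any():
        return  # nothing reliable in this batch; skip the update
    x = x[keep]

    params = [p for p in model.parameters() if p.requires_grad]

    # (2) SAM first step: gradient at the current weights ...
    loss = entropy(model(x)).mean()
    grads = torch.autograd.grad(loss, params)
    with torch.no_grad():  # ... then climb to the nearby worst-case weights
        norm = torch.sqrt(sum((g ** 2).sum() for g in grads))
        eps = [rho * g / (norm + 1e-12) for g in grads]
        for p, e in zip(params, eps):
            p.add_(e)

    # SAM second step: gradient at the perturbed weights, applied at the
    # original ones (the perturbation is undone before the optimizer step).
    optimizer.zero_grad()
    entropy(model(x)).mean().backward()
    with torch.no_grad():
        for p, e in zip(params, eps):
            p.sub_(e)
    optimizer.step()

# Toy usage on random data as a stand-in for a real test stream.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
sar_step(model, opt, torch.randn(8, 16), num_classes=10)
```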
Related papers
- ETAGE: Enhanced Test Time Adaptation with Integrated Entropy and Gradient Norms for Robust Model Performance [18.055032898349438]
Test time adaptation (TTA) equips deep learning models to handle unseen test data that deviates from the training distribution.
We introduce ETAGE, a refined TTA method that integrates entropy minimization with gradient norms and PLPD.
Our method prioritizes samples that are less likely to cause instability, filtering out of adaptation those that combine high entropy with high gradient norms.
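As a rough illustration of that selection signal (covering only the entropy and gradient-norm part; PLPD is omitted), the sketch below scores each test sample by its prediction entropy and by the norm of the entropy's gradient with respect to the logits, a cheap stand-in for parameter gradients, and excludes samples that are high on both. The quantile thresholds and toy model are assumptions, not ETAGE's settings.

```python
import torch
import torch.nn as nn

def select_for_adaptation(model, x, ent_q=0.8, grad_q=0.8):
    """Mask of samples considered safe to adapt on (illustrative rule)."""
    logits = model(x)
    logits.retain_grad()  # we want per-row gradients w.r.t. the logits
    ent = -(logits.softmax(dim=1) * logits.log_softmax(dim=1)).sum(dim=1)
    ent.sum().backward()  # each logits row only affects its own entropy
    grad_norm = logits.grad.norm(dim=1)
    model.zero_grad()  # discard parameter gradients from this probe

    # Exclude samples that are simultaneously high-entropy and high-gradient.
    noisy = (ent > ent.quantile(ent_q)) & (grad_norm > grad_norm.quantile(grad_q))
    return ~noisy

model = nn.Sequential(nn.Linear(16, 10))
keep = select_for_adaptation(model, torch.randn(32, 16))
print(f"adapting on {keep.sum().item()} / {keep.numel()} samples")
```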
arXiv Detail & Related papers (2024-09-14T01:25:52Z)
- Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
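A hedged sketch of what such an active selection criterion could look like, reconstructed from this summary rather than from the paper's code: a sample is adapted on only if its prediction entropy is low (reliable) and its softmax output is dissimilar from a running average of previously accepted outputs (non-redundant). The threshold values and moving-average update are illustrative assumptions.

```python
import math
import torch
import torch.nn.functional as F

class SampleSelector:
    """Accepts only reliable, non-redundant test samples (illustrative)."""
    def __init__(self, num_classes, e_margin_scale=0.4, sim_margin=0.95):
        self.e_margin = e_margin_scale * math.log(num_classes)
        self.sim_margin = sim_margin
        self.avg_probs = None  # running mean of accepted softmax outputs

    def accept(self, logits):
        probs = logits.softmax(dim=0)
        ent = -(probs * probs.clamp_min(1e-12).log()).sum()
        if ent >= self.e_margin:
            return False  # unreliable: prediction too uncertain
        if self.avg_probs is None:
            self.avg_probs = probs.detach()
            return True
        if F.cosine_similarity(probs, self.avg_probs, dim=0) >= self.sim_margin:
            return False  # redundant: too close to what we already adapted on
        self.avg_probs = 0.9 * self.avg_probs + 0.1 * probs.detach()
        return True

selector = SampleSelector(num_classes=10)
for logits in torch.randn(5, 10):  # stand-in for per-sample model outputs
    print(selector.accept(logits))
```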
arXiv Detail & Related papers (2024-03-18T05:49:45Z)
- Persistent Test-time Adaptation in Recurring Testing Scenarios [12.024233973321756]
Current test-time adaptation (TTA) approaches aim to adapt a machine learning model to environments that change continuously.
Yet, it is unclear whether TTA methods can maintain their adaptability over prolonged periods.
We propose persistent TTA (PeTTA) which senses when the model is diverging towards collapse and adjusts the adaptation strategy.
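The summary does not say how divergence is sensed; one plausible, purely illustrative mechanism (not necessarily PeTTA's) is to track a running histogram of predicted labels and flag collapse when its entropy drops far below the uniform level, i.e., when the model starts predicting only a few classes.

```python
import math
import torch

class CollapseMonitor:
    """Flags drift towards trivial predictions (illustrative mechanism)."""
    def __init__(self, num_classes, momentum=0.99, alert_ratio=0.5):
        self.hist = torch.full((num_classes,), 1.0 / num_classes)
        self.momentum = momentum
        # alert when label entropy < alert_ratio * maximum possible entropy
        self.alert = alert_ratio * math.log(num_classes)

    def update(self, logits):
        counts = torch.bincount(logits.argmax(dim=1),
                                minlength=self.hist.numel()).float()
        batch_dist = counts / counts.sum()
        self.hist = self.momentum * self.hist + (1 - self.momentum) * batch_dist
        ent = -(self.hist * self.hist.clamp_min(1e-12).log()).sum()
        return ent.item() < self.alert  # True => adjust/reset the adaptation

monitor = CollapseMonitor(num_classes=10)
collapsing = monitor.update(torch.randn(64, 10))  # stand-in for model logits
print("collapse suspected:", collapsing)
```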
arXiv Detail & Related papers (2023-11-30T02:24:44Z)
- SoTTA: Robust Test-Time Adaptation on Noisy Data Streams [9.490557638324084]
Test-time adaptation (TTA) aims to address distributional shifts between training and testing data.
Most TTA methods assume benign test streams, while test samples could be unexpectedly diverse in the wild.
We present Screening-out Test-Time Adaptation (SoTTA), a novel TTA algorithm that is robust to noisy samples.
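As a minimal illustration of screening (with an assumed confidence threshold; SoTTA's actual criterion and memory management differ in detail), one can simply withhold low-confidence test samples from the update:

```python
import torch
import torch.nn as nn

@torch.no_grad()
def screen_batch(model, x, conf_threshold=0.9):
    """Return the subset of x considered benign enough to adapt on."""
    conf = model(x).softmax(dim=1).max(dim=1).values
    return x[conf > conf_threshold]

model = nn.Sequential(nn.Linear(16, 10))
benign = screen_batch(model, torch.randn(32, 16))
print(f"kept {benign.shape[0]} of 32 samples for adaptation")
```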
arXiv Detail & Related papers (2023-10-16T05:15:35Z)
- Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with Balanced Normalization [52.03927261909813]
Existing works mainly consider real-world test-time adaptation under non-i.i.d. data stream and continual domain shift.
We argue that the failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data.
The final TTA model, termed TRIBE, is built upon a tri-net architecture with balanced batchnorm layers.
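The balanced-normalization idea can be caricatured as follows; this is a guessed-at sketch from the summary, not TRIBE's layer: keep per-class running feature statistics keyed by predicted label, then average them with equal class weight, so the majority classes of an imbalanced stream cannot dominate the normalizer.

```python
import torch

class BalancedStats:
    """Class-balanced running normalization statistics (illustrative)."""
    def __init__(self, num_classes, num_features, momentum=0.1):
        self.mean = torch.zeros(num_classes, num_features)
        self.var = torch.ones(num_classes, num_features)
        self.momentum = momentum

    def update(self, feats, pred_labels):
        for c in pred_labels.unique():
            fc = feats[pred_labels == c]
            m = self.momentum
            self.mean[c] = (1 - m) * self.mean[c] + m * fc.mean(dim=0)
            self.var[c] = (1 - m) * self.var[c] + m * fc.var(dim=0, unbiased=False)

    def normalize(self, feats):
        # Equal-weight average over classes, not over observed samples.
        mu, var = self.mean.mean(dim=0), self.var.mean(dim=0)
        return (feats - mu) / (var + 1e-5).sqrt()

stats = BalancedStats(num_classes=10, num_features=16)
feats = torch.randn(32, 16)
stats.update(feats, torch.randint(0, 10, (32,)))
print(stats.normalize(feats).shape)
```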
arXiv Detail & Related papers (2023-09-26T14:06:26Z)
- On Pitfalls of Test-Time Adaptation [82.8392232222119]
Test-Time Adaptation (TTA) has emerged as a promising approach for tackling the robustness challenge under distribution shifts.
We present TTAB, a test-time adaptation benchmark that encompasses ten state-of-the-art algorithms, a diverse array of distribution shifts, and two evaluation protocols.
arXiv Detail & Related papers (2023-06-06T09:35:29Z)
- Robust Continual Test-time Adaptation: Instance-aware BN and Prediction-balanced Memory [58.72445309519892]
We present a new test-time adaptation scheme that is robust against non-i.i.d. test data streams.
Our novelty is mainly two-fold: (a) Instance-Aware Batch Normalization (IABN), which corrects normalization for out-of-distribution samples, and (b) Prediction-balanced Reservoir Sampling (PBRS), which simulates an i.i.d. data stream from a non-i.i.d. stream in a class-balanced manner.
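A compact sketch of the PBRS idea under assumed details (buffer size, eviction): maintain one small reservoir per predicted class, each filled by classic reservoir sampling, so that batches drawn from the union look class-balanced and roughly i.i.d. even when the incoming stream is temporally correlated and imbalanced.

```python
import random
import torch

class BalancedReservoir:
    """One reservoir per predicted class (illustrative PBRS-style memory)."""
    def __init__(self, num_classes, per_class=8):
        self.buf = [[] for _ in range(num_classes)]
        self.seen = [0] * num_classes
        self.per_class = per_class

    def add(self, x, pred_label):
        c = int(pred_label)
        self.seen[c] += 1
        if len(self.buf[c]) < self.per_class:
            self.buf[c].append(x)
        else:  # classic reservoir replacement, uniform over the class stream
            j = random.randrange(self.seen[c])
            if j < self.per_class:
                self.buf[c][j] = x

    def batch(self):
        items = [x for slot in self.buf for x in slot]
        return torch.stack(items) if items else None

res = BalancedReservoir(num_classes=10)
for x in torch.randn(100, 16):  # stand-in, imbalanced test stream
    res.add(x, pred_label=torch.randint(0, 10, (1,)).item())
print(res.batch().shape)
```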
arXiv Detail & Related papers (2022-08-10T03:05:46Z)
- Efficient Test-Time Model Adaptation without Forgetting [60.36499845014649]
Test-time adaptation seeks to tackle potential distribution shifts between training and testing data.
We propose an active sample selection criterion to identify reliable and non-redundant samples.
We also introduce a Fisher regularizer to prevent important model parameters from changing drastically.
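A hedged, EWC-flavored sketch of such a regularizer (EATA's exact Fisher estimator and weighting may differ): estimate a diagonal parameter importance from squared gradients once, then penalize movement of important weights away from their anchor values during adaptation.

```python
import torch
import torch.nn as nn

def estimate_fisher(model, batches):
    """Diagonal Fisher estimate from squared gradients (illustrative)."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x in batches:
        model.zero_grad()
        logits = model(x)
        # entropy as an unsupervised stand-in objective on unlabeled data
        probs = logits.softmax(dim=1)
        loss = -(probs * logits.log_softmax(dim=1)).sum(dim=1).mean()
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / len(batches) for n, f in fisher.items()}

def fisher_penalty(model, fisher, anchors, lam=2000.0):
    """lam * sum_i F_i * (theta_i - theta_anchor_i)^2"""
    loss = 0.0
    for n, p in model.named_parameters():
        loss = loss + (fisher[n] * (p - anchors[n]) ** 2).sum()
    return lam * loss

model = nn.Sequential(nn.Linear(16, 10))
fisher = estimate_fisher(model, [torch.randn(8, 16) for _ in range(4)])
anchors = {n: p.detach().clone() for n, p in model.named_parameters()}
# during adaptation: total_loss = adapt_loss + fisher_penalty(model, fisher, anchors)
print(fisher_penalty(model, fisher, anchors).item())
```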
arXiv Detail & Related papers (2022-04-06T06:39:40Z)