Test-Time Adaptation via Conjugate Pseudo-labels
- URL: http://arxiv.org/abs/2207.09640v1
- Date: Wed, 20 Jul 2022 04:02:19 GMT
- Title: Test-Time Adaptation via Conjugate Pseudo-labels
- Authors: Sachin Goyal, Mingjie Sun, Aditi Raghunathan, Zico Kolter
- Abstract summary: Test-time adaptation (TTA) refers to adapting neural networks to distribution shifts.
Prior TTA methods optimize over unsupervised objectives such as the entropy of model predictions in TENT.
We present a surprising phenomenon: if we attempt to meta-learn the best possible TTA loss over a wide class of functions, then we recover a function that is remarkably similar to (a temperature-scaled version of) the softmax-entropy employed by TENT.
- Score: 21.005027151753477
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Test-time adaptation (TTA) refers to adapting neural networks to distribution
shifts, with access to only the unlabeled test samples from the new domain at
test-time. Prior TTA methods optimize over unsupervised objectives such as the
entropy of model predictions in TENT [Wang et al., 2021], but it is unclear
what exactly makes a good TTA loss. In this paper, we start by presenting a
surprising phenomenon: if we attempt to meta-learn the best possible TTA loss
over a wide class of functions, then we recover a function that is remarkably
similar to (a temperature-scaled version of) the softmax-entropy employed by
TENT. This only holds, however, if the classifier we are adapting is trained
via cross-entropy; if trained via squared loss, a different best TTA loss
emerges. To explain this phenomenon, we analyze TTA through the lens of the
training loss's convex conjugate. We show that under natural conditions, this
(unsupervised) conjugate function can be viewed as a good local approximation
to the original supervised loss and indeed, it recovers the best losses found
by meta-learning. This leads to a generic recipe that can be used to find a
good TTA loss for any given supervised training loss function of a general
class. Empirically, our approach consistently dominates other baselines over a
wide range of benchmarks. Our approach is particularly of interest when applied
to classifiers trained with novel loss functions, e.g., the recently-proposed
PolyLoss, where it differs substantially from (and outperforms) an
entropy-based loss. Further, we show that our approach can also be interpreted
as a kind of self-training using a very specific soft label, which we refer to
as the conjugate pseudolabel. Overall, our method provides a broad framework
for better understanding and improving test-time adaptation. Code is available
at https://github.com/locuslab/tta_conjugate.
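To make the connection concrete, below is a minimal PyTorch-style sketch of a TENT-style adaptation step that minimizes a temperature-scaled softmax entropy, the objective the abstract says the conjugate recipe recovers for cross-entropy-trained classifiers. The function names, the temperature handling, and the choice to adapt only normalization-layer affine parameters are illustrative assumptions, not the implementation from the linked repository.

```python
import torch
import torch.nn.functional as F

def temperature_scaled_entropy(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    """Softmax entropy of temperature-scaled logits, averaged over the batch.

    For a classifier trained with cross-entropy, the conjugate-based TTA loss
    reduces to (a temperature-scaled version of) this objective; at
    temperature 1.0 it is exactly the entropy loss used by TENT. The abstract's
    self-training view reads the same quantity as a cross-entropy against the
    soft "conjugate pseudo-label" softmax(logits / temperature).
    """
    log_probs = F.log_softmax(logits / temperature, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1).mean()

def adapt_step(model: torch.nn.Module,
               x: torch.Tensor,
               optimizer: torch.optim.Optimizer,
               temperature: float = 1.0) -> float:
    """One unsupervised adaptation step on a batch of unlabeled test inputs."""
    loss = temperature_scaled_entropy(model(x), temperature)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Illustrative usage: as in TENT, adapt only the affine parameters of the
# normalization layers while keeping the rest of the network frozen.
# bn_params = [p for m in model.modules()
#              if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d))
#              for p in (m.weight, m.bias) if p is not None]
# optimizer = torch.optim.SGD(bn_params, lr=1e-3, momentum=0.9)
# for x_test, _ in test_loader:
#     adapt_step(model, x_test, optimizer)
```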
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logit retargeting approach (LORT) that requires no prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- Decoupled Prototype Learning for Reliable Test-Time Adaptation [50.779896759106784]
Test-time adaptation (TTA) is a task that continually adapts a pre-trained source model to the target domain during inference.
One popular approach involves fine-tuning the model with a cross-entropy loss computed against estimated pseudo-labels.
This study reveals that minimizing the classification error of each individual sample makes the cross-entropy loss vulnerable to label noise.
We propose a novel Decoupled Prototype Learning (DPL) method that features prototype-centric loss computation.
arXiv Detail & Related papers (2024-01-15T03:33:39Z)
- Towards Real-World Test-Time Adaptation: Tri-Net Self-Training with
Balanced Normalization [52.03927261909813]
Existing works mainly consider real-world test-time adaptation under non-i.i.d. data streams and continual domain shift.
We argue that the failure of state-of-the-art methods is first caused by indiscriminately adapting normalization layers to imbalanced testing data.
The final TTA model, termed TRIBE, is built upon a tri-net architecture with balanced batch normalization layers.
arXiv Detail & Related papers (2023-09-26T14:06:26Z)
- Cut your Losses with Squentropy [19.924900110707284]
We propose the "squentropy" loss, which is the sum of two terms: the cross-entropy loss and the average square loss over the incorrect classes.
We show that the squentropy loss outperforms both the pure cross-entropy and rescaled square losses in terms of classification accuracy.
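Read literally, that description yields the following minimal PyTorch-style sketch of the squentropy loss; applying the square term to the raw logits of the incorrect classes (with implicit zero targets) and averaging over those classes are assumptions of this sketch, not details stated in the summary above.

```python
import torch
import torch.nn.functional as F

def squentropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Cross-entropy plus the average squared logit over the incorrect classes.

    `logits`: (batch, num_classes) raw scores; `targets`: (batch,) class indices.
    Follows the summary above ("cross-entropy loss + average square loss over
    the incorrect classes"); squaring raw logits is an assumption of this sketch.
    """
    ce = F.cross_entropy(logits, targets)
    num_classes = logits.size(1)
    # Mask out the true class, square the remaining logits, and average them.
    one_hot = F.one_hot(targets, num_classes).bool()
    sq = logits.masked_fill(one_hot, 0.0).pow(2).sum(dim=1) / (num_classes - 1)
    return ce + sq.mean()
```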
arXiv Detail & Related papers (2023-02-08T09:21:13Z)
- DELTA: degradation-free fully test-time adaptation [59.74287982885375]
We find that two unfavorable defects are concealed in prevalent adaptation methodologies such as test-time batch normalization (BN) and self-learning.
First, we reveal that the normalization statistics in test-time BN are completely affected by the currently received test samples, resulting in inaccurate estimates.
Second, we show that during test-time adaptation, the parameter update is biased towards some dominant classes.
arXiv Detail & Related papers (2023-01-30T15:54:00Z)
- A Probabilistic Framework for Lifelong Test-Time Adaptation [34.07074915005366]
Test-time adaptation (TTA) is the problem of updating a pre-trained source model at inference time given test input(s) from a different target domain.
We present PETAL (Probabilistic lifElong Test-time Adaptation with seLf-training prior), which solves lifelong TTA using a probabilistic approach.
Our method achieves better results than the current state-of-the-art for online lifelong test-time adaptation across various benchmarks.
arXiv Detail & Related papers (2022-12-19T18:42:19Z)
- Robust Mean Teacher for Continual and Gradual Test-Time Adaptation [5.744133015573047]
Gradual test-time adaptation (TTA) considers not only a single domain shift, but a sequence of shifts.
We show that, in the setting of TTA, the symmetric cross-entropy is better suited as a consistency loss for mean teachers.
We demonstrate the effectiveness of our proposed method 'robust mean teacher' (RMT) on the continual and gradual corruption benchmarks.
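For reference, here is a minimal PyTorch-style sketch of a symmetric cross-entropy consistency term between student and teacher predictions; the symmetric form (cross-entropy taken in both directions) is a common definition, and any per-term weighting or gradient-stopping convention used by RMT itself is an assumption here.

```python
import torch

def symmetric_cross_entropy(student_probs: torch.Tensor,
                            teacher_probs: torch.Tensor,
                            eps: float = 1e-12) -> torch.Tensor:
    """Consistency loss CE(teacher, student) + CE(student, teacher).

    Both inputs are probability vectors of shape (batch, num_classes); the
    teacher's predictions are treated as fixed targets (no gradient).
    """
    teacher_probs = teacher_probs.detach()
    ce_ts = -(teacher_probs * torch.log(student_probs + eps)).sum(dim=-1)
    ce_st = -(student_probs * torch.log(teacher_probs + eps)).sum(dim=-1)
    return (ce_ts + ce_st).mean()
```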
arXiv Detail & Related papers (2022-11-23T16:14:45Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner.
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
- Mixing between the Cross Entropy and the Expectation Loss Terms [89.30385901335323]
Cross-entropy loss tends to focus on hard-to-classify samples during training.
We show that adding the expectation loss to the optimization objective helps the network achieve better accuracy.
Our experiments show that the new training protocol improves performance across a diverse set of classification domains.
arXiv Detail & Related papers (2021-09-12T23:14:06Z)