Learning Less Generalizable Patterns with an Asymmetrically Trained
Double Classifier for Better Test-Time Adaptation
- URL: http://arxiv.org/abs/2210.09834v1
- Date: Mon, 17 Oct 2022 08:05:38 GMT
- Title: Learning Less Generalizable Patterns with an Asymmetrically Trained
Double Classifier for Better Test-Time Adaptation
- Authors: Thomas Duboudin (imagine), Emmanuel Dellandréa, Corentin Abgrall,
Gilles Hénaff, Liming Chen
- Abstract summary: We propose a novel approach using a pair of classifiers and a shortcut patterns avoidance loss.
Our method improves upon the state-of-the-art results on both benchmarks and brings the largest gains when combined with test-time batch normalization.
- Score: 4.893694715581673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks often fail to generalize outside of their training
distribution, in particular when only a single data domain is available during
training. While test-time adaptation has yielded encouraging results in this
setting, we argue that, to reach further improvements, these approaches should
be combined with training procedure modifications aiming to learn a more
diverse set of patterns. Indeed, test-time adaptation methods usually have to
rely on a limited representation because of the shortcut learning phenomenon:
only a subset of the available predictive patterns is learned with standard
training. In this paper, we first show that the combined use of existing
training-time strategies, and test-time batch normalization, a simple
adaptation method, does not always improve upon the test-time adaptation alone
on the PACS benchmark. Furthermore, experiments on Office-Home show that very
few training-time methods improve upon standard training, with or without
test-time batch normalization. We therefore propose a novel approach using a
pair of classifiers and a shortcut patterns avoidance loss: this additional
loss mitigates the shortcut learning behavior by reducing the generalization
ability of the secondary classifier and encouraging it to learn
sample-specific patterns. The primary classifier is trained normally, so it
learns both the natural patterns and the more complex, less generalizable
features. Our experiments show that our method improves upon the
state-of-the-art results on both benchmarks and brings the largest gains when
combined with test-time batch normalization.
Related papers
- Stability and Generalization in Free Adversarial Training [9.831489366502302]
We study the generalization performance of adversarial training methods using the algorithmic stability framework.
Our proven generalization bounds indicate that the free adversarial training method could enjoy a lower generalization gap between training and test samples.
arXiv Detail & Related papers (2024-04-13T12:07:20Z)
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
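LORT's exact retargeting rule is given in that paper; as general background on classifier re-training for long-tailed data, a widely used baseline is logit adjustment, which subtracts scaled log class priors from the logits. The sketch below shows that generic baseline only, not the LORT method, and all names are illustrative:

```python
import numpy as np

def logit_adjust(logits, class_counts, tau=1.0):
    """Generic logit adjustment for long-tailed recognition: subtract
    tau * log(prior) so rare classes are not systematically suppressed.
    This is background to classifier re-training, not LORT itself."""
    priors = np.asarray(class_counts, dtype=float)
    priors /= priors.sum()
    return logits - tau * np.log(priors)

# Toy example: a head-class-biased logit vector over 3 classes.
counts = [900, 90, 10]              # long-tailed class frequencies
logits = np.array([2.0, 1.5, 1.4])  # raw logits favor the head class
adjusted = logit_adjust(logits, counts)
print(adjusted.argmax())  # → 2: the tail class wins after adjustment
```

Note that, unlike this baseline, LORT is described as not requiring prior knowledge of the number of samples per class.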
arXiv Detail & Related papers (2024-03-01T03:27:08Z)
- United We Stand: Using Epoch-wise Agreement of Ensembles to Combat Overfit [7.627299398469962]
We introduce a novel ensemble classifier for deep networks that effectively overcomes overfitting.
Our method allows for the incorporation of useful knowledge obtained during the overfitting phase without deterioration of the general performance.
Our method is easy to implement and can be integrated with any training scheme and architecture.
arXiv Detail & Related papers (2023-10-17T08:51:44Z)
- Understanding prompt engineering may not require rethinking generalization [56.38207873589642]
We show that the discrete nature of prompts, combined with a PAC-Bayes prior given by a language model, results in generalization bounds that are remarkably tight by the standards of the literature.
This work provides a possible justification for the widespread practice of prompt engineering.
arXiv Detail & Related papers (2023-10-06T00:52:48Z)
- Test-Time Prompt Tuning for Zero-Shot Generalization in Vision-Language Models [107.05966685291067]
We propose test-time prompt tuning (TPT) to learn adaptive prompts on the fly with a single test sample.
TPT improves the zero-shot top-1 accuracy of CLIP by 3.6% on average.
In evaluating cross-dataset generalization with unseen categories, TPT performs on par with the state-of-the-art approaches that use additional training data.
arXiv Detail & Related papers (2022-09-15T17:55:11Z)
- Semantic Self-adaptation: Enhancing Generalization with a Single Sample [45.111358665370524]
We propose a self-adaptive approach for semantic segmentation.
It fine-tunes the parameters of convolutional layers to the input image using consistency regularization.
Our empirical study suggests that self-adaptation may complement the established practice of model regularization at training time.
arXiv Detail & Related papers (2022-08-10T12:29:01Z)
- Distributionally Robust Models with Parametric Likelihood Ratios [123.05074253513935]
Three simple ideas allow us to train models with DRO using a broader class of parametric likelihood ratios.
We find that models trained with the resulting parametric adversaries are consistently more robust to subpopulation shifts when compared to other DRO approaches.
arXiv Detail & Related papers (2022-04-13T12:43:12Z)
- Distribution Mismatch Correction for Improved Robustness in Deep Neural Networks [86.42889611784855]
Normalization methods can increase a network's vulnerability to noise and input corruptions.
We propose an unsupervised non-parametric distribution correction method that adapts the activation distribution of each layer.
In our experiments, we empirically show that the proposed method effectively reduces the impact of intense image corruptions.
arXiv Detail & Related papers (2021-10-05T11:36:25Z)
- Unified Regularity Measures for Sample-wise Learning and Generalization [18.10522585996242]
We propose a pair of sample regularity measures for both processes with a formulation-consistent representation.
Experiments validated the effectiveness and robustness of the proposed approaches for mini-batch SGD optimization.
arXiv Detail & Related papers (2021-08-09T10:11:14Z)
- Squared $\ell_2$ Norm as Consistency Loss for Leveraging Augmented Data to Learn Robust and Invariant Representations [76.85274970052762]
Regularizing distance between embeddings/representations of original samples and augmented counterparts is a popular technique for improving robustness of neural networks.
In this paper, we explore these various regularization choices, seeking to provide a general understanding of how we should regularize the embeddings.
We show that the generic approach we identified (squared $\ell_2$ regularized augmentation) outperforms several recent methods, which are each specially designed for one task.
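The squared $\ell_2$ consistency regularizer discussed in this entry penalizes the distance between the embeddings of a sample and its augmented counterpart. A minimal sketch; the fixed linear map standing in for a feature extractor and all toy data are purely illustrative:

```python
import numpy as np

def squared_l2_consistency(emb_orig, emb_aug):
    """Squared l2 distance between embeddings of original samples and their
    augmented counterparts, averaged over the batch."""
    diff = emb_orig - emb_aug
    return np.mean(np.sum(diff ** 2, axis=1))

# Toy stand-in for a feature extractor: a fixed linear map W.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 4))
x = rng.normal(size=(16, 8))
x_aug = x + 0.1 * rng.normal(size=x.shape)  # small augmentation/perturbation

loss = squared_l2_consistency(x @ W, x_aug @ W)
# In training, this term would be added to the task loss with a weight:
# total_loss = task_loss + lam * loss
```

Minimizing this term pushes the representation to be invariant to the augmentation, which is the common core the paper identifies behind several task-specific regularizers.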
arXiv Detail & Related papers (2020-11-25T22:40:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.