Agree to Disagree: Diversity through Disagreement for Better
Transferability
- URL: http://arxiv.org/abs/2202.04414v1
- Date: Wed, 9 Feb 2022 12:03:02 GMT
- Title: Agree to Disagree: Diversity through Disagreement for Better
Transferability
- Authors: Matteo Pagliardini, Martin Jaggi, François Fleuret, Sai Praneeth
Karimireddy
- Abstract summary: We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data, but disagreement on the OOD data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
- Score: 54.308327969778155
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gradient-based learning algorithms have an implicit simplicity bias which in
effect can limit the diversity of predictors being sampled by the learning
procedure. This behavior can hinder the transferability of trained models by
(i) favoring the learning of simpler but spurious features -- present in the
training data but absent from the test data -- and (ii) only leveraging a
small subset of predictive features. Such an effect is especially magnified
when the test distribution does not exactly match the train distribution --
referred to as the Out of Distribution (OOD) generalization problem. However,
given only the training data, it is not always possible to assess a priori if a
given feature is spurious or transferable. Instead, we advocate for learning an
ensemble of models which capture a diverse set of predictive features. Towards
this, we propose a new algorithm D-BAT (Diversity-By-disAgreement Training),
which enforces agreement among the models on the training data, but
disagreement on the OOD data. We show how D-BAT naturally emerges from the
notion of generalized discrepancy, and demonstrate in multiple experiments
that the proposed method can mitigate shortcut learning, enhance uncertainty
estimation and OOD detection, and improve transferability.
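To make the objective concrete, below is a minimal PyTorch sketch of the "agree on the training data, disagree on the OOD data" idea. It is an illustration under stated assumptions, not the authors' implementation: the binary product-form disagreement term, the weight `alpha`, and keeping the first model fixed while the second trains are all choices made here.

```python
# Minimal sketch of "agree on train, disagree on OOD" training.
# Assumptions: binary classification, a product-form disagreement term,
# weight `alpha`, and a fixed first model h1.
import torch
import torch.nn.functional as F

def disagreement_loss(p1, p2):
    """Negative log-probability that two classifiers predict different labels.

    p1, p2: (batch, 2) softmax outputs. Minimizing this pushes the two
    models to disagree on the given inputs.
    """
    p_disagree = p1[:, 0] * p2[:, 1] + p1[:, 1] * p2[:, 0]
    return -torch.log(p_disagree + 1e-8).mean()

def dbat_step(h1, h2, opt, x_train, y_train, x_ood, alpha=0.1):
    """One update of the second model h2, keeping h1 fixed."""
    h1.eval()
    with torch.no_grad():
        p1_ood = F.softmax(h1(x_ood), dim=-1)
    p2_ood = F.softmax(h2(x_ood), dim=-1)
    # Agreement with the labels on the training distribution ...
    task_loss = F.cross_entropy(h2(x_train), y_train)
    # ... but disagreement with h1 on the (unlabeled) OOD inputs.
    loss = task_loss + alpha * disagreement_loss(p1_ood, p2_ood)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```

With more than two models, the same recipe can be applied sequentially, training each new model to disagree with the already-trained ones on the OOD inputs.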
Related papers
- Unsupervised Transfer Learning via Adversarial Contrastive Training [3.227277661633986]
We propose a novel unsupervised transfer learning approach using adversarial contrastive training (ACT).
Our experimental results demonstrate outstanding classification accuracy with both fine-tuned linear probe and K-NN protocol across various datasets.
arXiv Detail & Related papers (2024-08-16T05:11:52Z)
- Probabilistic Contrastive Learning for Long-Tailed Visual Recognition [78.70453964041718]
Long-tailed distributions frequently emerge in real-world data, where a large number of minority categories contain a limited number of samples.
Recent investigations have revealed that supervised contrastive learning exhibits promising potential in alleviating the data imbalance.
We propose a novel probabilistic contrastive (ProCo) learning algorithm that estimates the data distribution of the samples from each class in the feature space (see the sketch after this entry).
arXiv Detail & Related papers (2024-03-11T13:44:49Z)
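The ProCo entry above hinges on estimating a per-class feature distribution. As a hedged illustration only (the paper's actual estimator may differ), the sketch below fits a diagonal Gaussian per class and scores embeddings by class log-likelihood; the Gaussian assumption and all names are mine.

```python
# Hypothetical per-class feature-distribution estimate: one diagonal
# Gaussian per class (an illustration, not necessarily ProCo's estimator).
import torch

def fit_class_gaussians(features, labels, num_classes, eps=1e-6):
    """features: (N, D) embeddings; labels: (N,) class indices.
    Assumes every class has at least one sample."""
    means, variances = [], []
    for c in range(num_classes):
        f_c = features[labels == c]
        means.append(f_c.mean(dim=0))
        variances.append(f_c.var(dim=0, unbiased=False) + eps)
    return torch.stack(means), torch.stack(variances)

def class_log_likelihood(feats, means, variances):
    """Log-density of each embedding (B, D) under each class Gaussian (C, D)."""
    diff = feats[:, None, :] - means[None, :, :]                     # (B, C, D)
    log_norm = torch.log(2 * torch.pi * variances)[None]             # (1, C, D)
    return -0.5 * (diff ** 2 / variances[None] + log_norm).sum(-1)   # (B, C)
```

Such class-conditional scores could then weight positives and negatives inside a contrastive objective.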
- Ask Your Distribution Shift if Pre-Training is Right for You [74.18516460467019]
In practice, fine-tuning a pre-trained model improves robustness significantly in some cases but not at all in others.
We focus on two possible failure modes of models under distribution shift: poor extrapolation and biases in the training data.
Our study suggests that, as a rule of thumb, pre-training can help mitigate poor extrapolation but not dataset biases.
arXiv Detail & Related papers (2024-02-29T23:46:28Z)
- CAFA: Class-Aware Feature Alignment for Test-Time Adaptation [50.26963784271912]
Test-time adaptation (TTA) aims to address distribution shift between training and test data by adapting a model to unlabeled data at test time.
We propose a simple yet effective feature alignment loss, termed Class-Aware Feature Alignment (CAFA), which encourages a model to learn target representations in a class-discriminative manner (see the sketch after this entry).
arXiv Detail & Related papers (2022-06-01T03:02:07Z)
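As a hedged sketch of class-aware feature alignment at test time (assumptions: per-class source means and a shared inverse covariance are precomputed offline, and pseudo-labels come from the model's own predictions; this need not match CAFA's exact loss):

```python
# Hedged sketch: pull test-time features towards the source statistics of
# their pseudo-labeled class (an illustration of class-aware alignment).
import torch

def class_aware_alignment_loss(feats, logits, src_means, src_inv_cov):
    """feats: (B, D); logits: (B, C); src_means: (C, D); src_inv_cov: (D, D)."""
    pseudo = logits.argmax(dim=-1)            # the model's own predictions
    diff = feats - src_means[pseudo]          # offset from the class centroid
    # Squared Mahalanobis distance to the pseudo-labeled class centroid.
    return torch.einsum('bd,de,be->b', diff, src_inv_cov, diff).mean()
```

In a TTA loop one would minimize this on each unlabeled test batch, typically updating only a small subset of parameters such as normalization layers.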
- Uncertainty Estimation for Language Reward Models [5.33024001730262]
Language models can learn a range of capabilities from unsupervised training on text corpora.
It is often easier for humans to choose between options than to provide labeled data, and prior work has achieved state-of-the-art performance by training a reward model from such preference comparisons.
We seek to address these problems via uncertainty estimation, which can improve sample efficiency and robustness using active learning and risk-averse reinforcement learning (see the sketch after this entry).
arXiv Detail & Related papers (2022-03-14T20:13:21Z)
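The entry names uncertainty estimation without specifying the estimator; ensemble disagreement is one standard choice, sketched below under that assumption (the paper may use a different estimator, and the pessimism weight `k` is illustrative).

```python
# Hedged sketch: ensemble spread as reward-model uncertainty (the estimator
# and the pessimism weight `k` are assumptions, not the paper's recipe).
import torch

def reward_with_uncertainty(reward_models, x):
    """reward_models: list of networks mapping a batch to (B,) rewards."""
    rewards = torch.stack([rm(x) for rm in reward_models])   # (M, B)
    return rewards.mean(dim=0), rewards.std(dim=0)

def risk_averse_reward(mean_r, std_r, k=1.0):
    # Lower-confidence-bound reward: penalize uncertain rewards for
    # risk-averse reinforcement learning; a high std can also flag
    # comparisons worth sending to human labelers (active learning).
    return mean_r - k * std_r
```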
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- Mind the Trade-off: Debiasing NLU Models without Degrading the In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets (see the sketch after this entry).
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
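Since the entry only names the idea, here is a hedged sketch of one way confidence regularization could be instantiated: a frozen teacher's probabilities are flattened in proportion to a bias-only model's confidence on each example, and the student is distilled towards the result. The scaling rule and all names are assumptions, not necessarily the paper's exact recipe.

```python
# Hedged sketch: distill from a teacher whose confidence is scaled down on
# examples a bias-only model finds easy (names and scaling rule assumed).
import torch
import torch.nn.functional as F

def confidence_regularized_loss(student_logits, teacher_probs, bias_probs, labels):
    """Probabilities are (B, C); labels are (B,) gold class indices."""
    # Confidence of the bias-only model in the gold label.
    bias_conf = bias_probs.gather(1, labels[:, None]).squeeze(1)     # (B,)
    # Raising probabilities to the power (1 - bias_conf) flattens the
    # teacher's distribution on strongly biased examples.
    scaled = teacher_probs ** (1.0 - bias_conf)[:, None]
    scaled = scaled / scaled.sum(dim=1, keepdim=True)
    # Distill the student towards the debiased teacher distribution.
    return F.kl_div(F.log_softmax(student_logits, dim=-1), scaled,
                    reduction='batchmean')
```

On examples the bias-only model answers confidently, the target distribution is nearly uniform, so the student gets little reward for exploiting the bias.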
This list is automatically generated from the titles and abstracts of the papers on this site.