Unraveling the Key Components of OOD Generalization via Diversification
- URL: http://arxiv.org/abs/2312.16313v3
- Date: Sat, 20 Apr 2024 15:09:57 GMT
- Title: Unraveling the Key Components of OOD Generalization via Diversification
- Authors: Harold Benoit, Liangze Jiang, Andrei Atanov, Oğuzhan Fatih Kar, Mattia Rigotti, Amir Zamir
- Abstract summary: Supervised learning datasets may contain multiple cues that explain the training set equally well.
Many of them can be spurious, i.e., lose their predictive power under a distribution shift.
Recently developed "diversification" methods approach this problem by finding multiple diverse hypotheses.
- Score: 20.064947636966078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Supervised learning datasets may contain multiple cues that explain the training set equally well, i.e., learning any of them would lead to the correct predictions on the training data. However, many of them can be spurious, i.e., lose their predictive power under a distribution shift and consequently fail to generalize to out-of-distribution (OOD) data. Recently developed "diversification" methods (Lee et al., 2023; Pagliardini et al., 2023) approach this problem by finding multiple diverse hypotheses that rely on different features. This paper aims to study this class of methods and identify the key components contributing to their OOD generalization abilities. We show that (1) diversification methods are highly sensitive to the distribution of the unlabeled data used for diversification and can underperform significantly when that distribution is far from a method-specific sweet spot. (2) Diversification alone is insufficient for OOD generalization. The choice of learning algorithm, e.g., the model's architecture and pretraining, is crucial. In standard experiments (classification on the Waterbirds and Office-Home datasets), using the second-best choice leads to an absolute drop in accuracy of up to 20%. (3) The optimal choice of learning algorithm depends on the unlabeled data and vice versa, i.e., they are co-dependent. (4) Finally, we show that, in practice, the above pitfalls cannot be alleviated by increasing the number of diverse hypotheses, the major feature of diversification methods. These findings provide a clearer understanding of the critical design factors influencing the OOD generalization abilities of diversification methods. They can guide practitioners in how best to use existing methods and guide researchers in developing new, better ones.
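Diversification methods aim for hypotheses that all fit the labeled data yet disagree on unlabeled data. As a rough illustration only, the sketch below measures that disagreement as the average pairwise rate at which K hypotheses predict different labels on the same unlabeled inputs. The function name `pairwise_disagreement` and the hard-label input format are assumptions for illustration; this is a diagnostic, not the training objective of any method cited in the abstract.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_disagreement(predictions: List[Sequence[int]]) -> float:
    """Average fraction of unlabeled inputs on which two hypotheses disagree.

    Hypothetical diagnostic: predictions[k][i] is the hard label that
    hypothesis k assigns to unlabeled input i.
    """
    if len(predictions) < 2:
        return 0.0  # a single hypothesis cannot disagree with itself
    n = len(predictions[0])
    if n == 0 or any(len(p) != n for p in predictions):
        raise ValueError("all hypotheses must label the same non-empty set of inputs")
    rates = []
    for i, j in combinations(range(len(predictions)), 2):
        # Fraction of inputs where hypotheses i and j predict different labels
        mismatches = sum(a != b for a, b in zip(predictions[i], predictions[j]))
        rates.append(mismatches / n)
    return sum(rates) / len(rates)


# Three hypotheses on four unlabeled inputs; the first and third agree everywhere.
preds = [[0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 1, 1]]
print(pairwise_disagreement(preds))  # mean of the pair rates 0.5, 0.0, 0.5
```

A high score only says the hypotheses differ on the unlabeled set; it does not say which of them generalizes OOD, which is consistent with the abstract's point that diversification alone is insufficient.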
Related papers
- Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data [102.16105233826917]
Learning from preference labels plays a crucial role in fine-tuning large language models.
There are several distinct approaches for preference fine-tuning, including supervised learning, on-policy reinforcement learning (RL), and contrastive learning.
arXiv Detail & Related papers (2024-04-22T17:20:18Z)
- Crowd-Certain: Label Aggregation in Crowdsourced and Ensemble Learning Classification [0.0]
We introduce Crowd-Certain, a novel approach for label aggregation in crowdsourced and ensemble learning classification tasks.
The proposed method uses the consistency of the annotators versus a trained classifier to determine a reliability score for each annotator.
We extensively evaluated our approach against ten existing techniques across ten different datasets, each labeled by varying numbers of annotators.
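The reliability idea in the Crowd-Certain summary can be illustrated with a deliberately simplified sketch: score each annotator by how often their labels match a trained classifier's predictions, then aggregate labels with a reliability-weighted vote. This is an illustration under stated assumptions, not the Crowd-Certain algorithm; the function names, the agreement-rate score, and the dictionary-based data layout are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Hashable

Labels = Dict[Hashable, int]  # item id -> class label


def reliability_scores(annotations: Dict[str, Labels], classifier_preds: Labels) -> Dict[str, float]:
    """Hypothetical score: fraction of an annotator's items that match the classifier."""
    scores = {}
    for annotator, labels in annotations.items():
        shared = [item for item in labels if item in classifier_preds]
        if not shared:
            scores[annotator] = 0.0  # no overlap with the classifier, so no evidence of reliability
            continue
        agree = sum(labels[item] == classifier_preds[item] for item in shared)
        scores[annotator] = agree / len(shared)
    return scores


def weighted_vote(annotations: Dict[str, Labels], scores: Dict[str, float]) -> Labels:
    """Aggregate each item's label by summing the scores of annotators who chose it."""
    votes: Dict[Hashable, Dict[int, float]] = defaultdict(lambda: defaultdict(float))
    for annotator, labels in annotations.items():
        for item, label in labels.items():
            votes[item][label] += scores[annotator]
    # Pick the label with the largest total weight for each item
    return {item: max(tally, key=tally.get) for item, tally in votes.items()}


annotations = {
    "a": {1: 0, 2: 1, 3: 1},
    "b": {1: 0, 2: 1, 3: 0},
    "c": {1: 1, 2: 0, 3: 0},
}
classifier_preds = {1: 0, 2: 1, 3: 1}
scores = reliability_scores(annotations, classifier_preds)
print(scores)
print(weighted_vote(annotations, scores))  # {1: 0, 2: 1, 3: 1}
```

Item 3 shows the effect of weighting: a plain majority vote would pick label 0 (annotators b and c), whereas the weighted vote follows annotator a, who agrees most often with the classifier.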
arXiv Detail & Related papers (2023-10-25T01:58:37Z)
- Generalizable Low-Resource Activity Recognition with Diverse and Discriminative Representation Learning [24.36351102003414]
Human activity recognition (HAR) is a time series classification task that focuses on identifying the motion patterns from human sensor readings.
We propose a novel approach called Diverse and Discriminative representation Learning (DDLearn) for generalizable low-resource HAR.
Our method significantly outperforms state-of-the-art methods, with an average accuracy improvement of 9.5%.
arXiv Detail & Related papers (2023-05-25T08:24:22Z)
- ASPEST: Bridging the Gap Between Active Learning and Selective Prediction [56.001808843574395]
Selective prediction aims to learn a reliable model that abstains from making predictions when uncertain.
Active learning aims to lower the overall labeling effort, and hence human dependence, by querying the most informative examples.
In this work, we introduce a new learning paradigm, active selective prediction, which aims to query more informative samples from the shifted target domain.
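The ASPEST summary combines two ingredients: abstaining when uncertain (selective prediction) and querying informative samples from the shifted target domain (active learning). The sketch below shows only those two generic ingredients, using the maximum predicted class probability as the confidence measure. That confidence measure, the threshold value, and all function names are assumptions for illustration; this is not the ASPEST algorithm itself.

```python
from typing import List, Optional, Sequence, Tuple


def selective_predict(probs: Sequence[Sequence[float]], threshold: float) -> List[Optional[int]]:
    """Predict the argmax class, or abstain (None) when the max probability < threshold."""
    preds: List[Optional[int]] = []
    for p in probs:
        confidence = max(p)
        preds.append(list(p).index(confidence) if confidence >= threshold else None)
    return preds


def coverage_and_accuracy(preds: Sequence[Optional[int]], labels: Sequence[int]) -> Tuple[float, float]:
    """Coverage = fraction not abstained; selective accuracy = accuracy on accepted inputs."""
    accepted = [(p, y) for p, y in zip(preds, labels) if p is not None]
    coverage = len(accepted) / len(preds)
    accuracy = sum(p == y for p, y in accepted) / len(accepted) if accepted else 0.0
    return coverage, accuracy


def query_least_confident(probs: Sequence[Sequence[float]], budget: int) -> List[int]:
    """Active-learning step: indices of the `budget` lowest-confidence inputs to label."""
    order = sorted(range(len(probs)), key=lambda i: max(probs[i]))
    return order[:budget]


target_probs = [[0.9, 0.1], [0.55, 0.45], [0.2, 0.8], [0.6, 0.4]]
target_labels = [0, 1, 0, 0]
preds = selective_predict(target_probs, threshold=0.7)
print(preds)                                          # [0, None, 1, None]
print(coverage_and_accuracy(preds, target_labels))    # (0.5, 0.5)
print(query_least_confident(target_probs, budget=2))  # [1, 3]
```

Raising the threshold trades coverage for selective accuracy, and the queried indices are exactly the inputs the model would otherwise abstain on or be least sure about.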
arXiv Detail & Related papers (2023-04-07T23:51:07Z)
- HyperInvariances: Amortizing Invariance Learning [10.189246340672245]
Invariance learning is expensive and data-intensive for popular neural architectures.
We introduce the notion of amortizing invariance learning.
This framework can identify appropriate invariances in different downstream tasks and leads to comparable or better test performance.
arXiv Detail & Related papers (2022-07-17T21:40:37Z)
- Do Deep Neural Networks Always Perform Better When Eating More Data? [82.6459747000664]
We design experiments under independent and identically distributed (IID) and out-of-distribution (OOD) conditions.
Under IID conditions, the amount of information determines the effectiveness of each sample, and the contribution of samples and the difference between classes determine the amount of class information.
Under OOD conditions, the cross-domain degree of samples determines their contributions, and bias-fitting caused by irrelevant elements is a significant cross-domain factor.
arXiv Detail & Related papers (2022-05-30T15:40:33Z)
- Agree to Disagree: Diversity through Disagreement for Better Transferability [54.308327969778155]
We propose D-BAT (Diversity-By-disAgreement Training), which enforces agreement among the models on the training data but disagreement on out-of-distribution data.
We show how D-BAT naturally emerges from the notion of generalized discrepancy.
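For binary classification, D-BAT's disagreement term can, as we understand the paper, be written as the negative log-probability that two models predict different labels on an unlabeled input. The sketch below trains nothing: it only evaluates that term and a combined objective for a second model, given a first model's fixed predictions on out-of-distribution inputs. The weight `alpha`, the function names, and the use of plain floats instead of framework tensors are assumptions for illustration.

```python
import math
from typing import Sequence

EPS = 1e-12  # guards against log(0)


def disagreement_loss(p1: float, p2: float) -> float:
    """-log P(models disagree) for binary models with P(y=1|x) = p1 and p2."""
    p_disagree = p1 * (1.0 - p2) + (1.0 - p1) * p2
    return -math.log(p_disagree + EPS)


def binary_cross_entropy(p: float, y: int) -> float:
    """Standard log loss for a predicted P(y=1|x) = p and a label y in {0, 1}."""
    return -math.log((p if y == 1 else 1.0 - p) + EPS)


def second_model_objective(
    train_probs: Sequence[float],       # second model's P(y=1) on labeled training inputs
    train_labels: Sequence[int],
    ood_probs_first: Sequence[float],   # frozen first model's P(y=1) on unlabeled OOD inputs
    ood_probs_second: Sequence[float],  # second model's P(y=1) on the same OOD inputs
    alpha: float = 1.0,                 # hypothetical weight on the disagreement term
) -> float:
    """Fit the training labels (agreement) while disagreeing with the first model on OOD inputs."""
    fit = sum(binary_cross_entropy(p, y) for p, y in zip(train_probs, train_labels)) / len(train_labels)
    disagree = sum(
        disagreement_loss(p1, p2) for p1, p2 in zip(ood_probs_first, ood_probs_second)
    ) / len(ood_probs_first)
    return fit + alpha * disagree


# Disagreeing on an OOD input is cheap; agreeing with the first model is penalized.
print(disagreement_loss(0.9, 0.1))  # small: P(disagree) = 0.82
print(disagreement_loss(0.9, 0.9))  # large: P(disagree) = 0.18
```

The first model's OOD predictions enter only as constants, so in a real implementation only the second model's probabilities would receive gradients from the disagreement term.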
arXiv Detail & Related papers (2022-02-09T12:03:02Z)
- Efficient Diversity-Driven Ensemble for Deep Neural Networks [28.070540722925152]
We propose Efficient Diversity-Driven Ensemble (EDDE) to address both the diversity and the efficiency of an ensemble.
Compared with other well-known ensemble methods, EDDE achieves the highest ensemble accuracy with the lowest training cost.
We evaluate EDDE on Computer Vision (CV) and Natural Language Processing (NLP) tasks.
arXiv Detail & Related papers (2021-12-26T04:28:47Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly predictive features.
This simplicity bias can explain their lack of robustness out of distribution (OOD).
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- Towards Improved and Interpretable Deep Metric Learning via Attentive Grouping [103.71992720794421]
Grouping has been commonly used in deep metric learning for computing diverse features.
We propose an improved and interpretable grouping method to be integrated flexibly with any metric learning framework.
arXiv Detail & Related papers (2020-11-17T19:08:24Z)