An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation
- URL: http://arxiv.org/abs/2205.12753v1
- Date: Wed, 25 May 2022 13:04:53 GMT
- Title: An Empirical Study on Distribution Shift Robustness From the Perspective
of Pre-Training and Data Augmentation
- Authors: Ziquan Liu, Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Xiangyang
Ji, Antoni B. Chan
- Abstract summary: This paper studies the distribution shift problem from the perspective of pre-training and data augmentation.
We provide the first comprehensive empirical study focusing on pre-training and data augmentation.
- Score: 91.62129090006745
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The performance of machine learning models under distribution shift has been
the focus of the community in recent years. Most current methods have been
proposed to improve the robustness to distribution shift from the algorithmic
perspective, i.e., designing better training algorithms to help the
generalization in shifted test distributions. This paper studies the
distribution shift problem from the perspective of pre-training and data
augmentation, two important factors in the practice of deep learning that have
not been systematically investigated by existing work. By evaluating seven
pre-trained models, including ResNets and ViTs trained with self-supervision and
supervision, on five important distribution-shift datasets from the WILDS and
DomainBed benchmarks, with five different learning algorithms, we provide the
first comprehensive empirical study focusing on pre-training and data
augmentation. Based on empirical results obtained from 1,330 models, we provide
the following main observations: 1) ERM combined with data augmentation can
achieve state-of-the-art performance if we choose a proper pre-trained model
respecting the data property; 2) specialized algorithms further improve the
robustness on top of ERM when handling a specific type of distribution shift,
e.g., GroupDRO for spurious correlation and CORAL for large-scale
out-of-distribution data; 3) comparing different pre-training modes,
architectures and data sizes, we provide novel observations about pre-training
on distribution shift, which shed light on designing or selecting pre-training
strategies for different kinds of distribution shifts. In summary, our empirical
study provides a comprehensive baseline for a wide range of pre-training models
fine-tuned with data augmentation, which may inspire future distribution shift
research that exploits the power of pre-training and data augmentation.
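The three algorithm families named in the abstract (ERM, GroupDRO, CORAL) differ mainly in the objective they minimize. The following is a minimal NumPy sketch of those objectives, not the paper's implementation: the function names, the `step_size` parameter, and the toy inputs are all hypothetical, and the CORAL term follows the common mean-plus-covariance matching form rather than anything confirmed by the abstract.

```python
import numpy as np


def erm_loss(losses):
    """ERM: the plain average of per-example losses."""
    return float(np.mean(losses))


def group_dro_loss(losses, groups, weights, step_size=0.01):
    """One GroupDRO-style step (illustrative).

    Groups with a higher average loss receive a larger weight via an
    exponentiated-gradient update; the objective is the weighted sum of
    per-group average losses. Returns (loss, updated_weights).
    """
    n_groups = len(weights)
    group_losses = np.zeros(n_groups)
    for g in range(n_groups):
        mask = groups == g
        if mask.any():
            group_losses[g] = losses[mask].mean()
    new_weights = weights * np.exp(step_size * group_losses)
    new_weights = new_weights / new_weights.sum()
    return float(new_weights @ group_losses), new_weights


def coral_penalty(feats_a, feats_b):
    """CORAL-style penalty (illustrative).

    Mean squared difference between the feature means plus mean squared
    difference between the feature covariances of two domains.
    """
    mean_diff = feats_a.mean(axis=0) - feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    return float((mean_diff ** 2).mean() + ((cov_a - cov_b) ** 2).mean())


# Toy data: two groups with different loss levels, two feature batches.
losses = np.array([1.0, 1.0, 3.0, 3.0])
groups = np.array([0, 0, 1, 1])
weights = np.array([0.5, 0.5])

erm = erm_loss(losses)
dro, new_weights = group_dro_loss(losses, groups, weights)

rng = np.random.default_rng(0)
feats_src = rng.normal(size=(32, 4))
feats_tgt = rng.normal(loc=1.0, size=(32, 4))
penalty = coral_penalty(feats_src, feats_tgt)
```

In this toy setup the GroupDRO objective sits above the ERM average because the higher-loss group is up-weighted, while the CORAL term would be added to the ERM loss as a regularizer when features come from more than one training domain.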
Related papers
- Embedding And Clustering Your Data Can Improve Contrastive Pretraining [0.0]
We explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm.
Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset.
arXiv Detail & Related papers (2024-07-26T17:36:40Z)
- Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z)
- An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration [11.102950630209879]
In out-of-distribution (OOD) generalization tasks, fine-tuning pre-trained models has become a prevalent strategy.
We examined how pre-trained model size, pre-training dataset size, and training strategies impact generalization and uncertainty calibration.
arXiv Detail & Related papers (2023-07-17T01:27:10Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computationally heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both a theoretical and an experimental perspective.
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- On the Trade-off of Intra-/Inter-class Diversity for Supervised Pre-training [72.8087629914444]
We study the impact of the trade-off between the intra-class diversity (the number of samples per class) and the inter-class diversity (the number of classes) of a supervised pre-training dataset.
With the size of the pre-training dataset fixed, the best downstream performance comes with a balance on the intra-/inter-class diversity.
arXiv Detail & Related papers (2023-05-20T16:23:50Z)
- Better Modelling Out-of-Distribution Regression on Distributed Acoustic Sensor Data Using Anchored Hidden State Mixup [0.7455546102930911]
Generalizing machine learning models to situations where the statistical distributions of training and test data differ has been a complex problem.
We introduce an anchor-based Out of Distribution (OOD) Regression Mixup algorithm, leveraging manifold hidden state mixup and observation similarities to form a novel regularization penalty.
We demonstrate with an extensive evaluation the generalization performance of the proposed method against existing approaches, then show that our method achieves state-of-the-art performance.
arXiv Detail & Related papers (2022-02-23T03:12:21Z)
- Few-Shot Named Entity Recognition: A Comprehensive Study [92.40991050806544]
We investigate three schemes to improve the model generalization ability for few-shot settings.
We perform empirical comparisons on 10 public NER datasets with various proportions of labeled data.
We create new state-of-the-art results on both few-shot and training-free settings.
arXiv Detail & Related papers (2020-12-29T23:43:16Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.