Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
- URL: http://arxiv.org/abs/2404.17768v2
- Date: Sat, 02 Nov 2024 00:51:16 GMT
- Title: Changing the Training Data Distribution to Reduce Simplicity Bias Improves In-distribution Generalization
- Authors: Dang Nguyen, Paymon Haddad, Eric Gan, Baharan Mirzasoleiman,
- Abstract summary: We show that sharpness-aware minimization (SAM) learns different features more uniformly, particularly in early epochs.
We propose a method that (i) clusters examples based on the network output early in training, (ii) identifies a cluster of examples with similar network output, and (iii) upsamples the rest of examples only once to alleviate the simplicity bias.
- Score: 12.472871440252105
- License:
- Abstract: Can we modify the training data distribution to encourage the underlying optimization method toward finding solutions with superior generalization performance on in-distribution data? In this work, we approach this question for the first time by comparing the inductive bias of gradient descent (GD) with that of sharpness-aware minimization (SAM). By studying a two-layer CNN, we rigorously prove that SAM learns different features more uniformly, particularly in early epochs. That is, SAM is less susceptible to simplicity bias compared to GD. We also show that examples containing features that are learned early are separable from the rest based on the model's output. Based on this observation, we propose a method that (i) clusters examples based on the network output early in training, (ii) identifies a cluster of examples with similar network output, and (iii) upsamples the rest of examples only once to alleviate the simplicity bias. We show empirically that USEFUL effectively improves the generalization performance on the original data distribution when training with various gradient methods, including (S)GD and SAM. Notably, we demonstrate that our method can be combined with SAM variants and existing data augmentation strategies to achieve, to the best of our knowledge, state-of-the-art performance for training ResNet18 on CIFAR10, STL10, CINIC10, Tiny-ImageNet; ResNet34 on CIFAR100; and VGG19 and DenseNet121 on CIFAR10.
Related papers
- Rethinking Classifier Re-Training in Long-Tailed Recognition: A Simple
Logits Retargeting Approach [102.0769560460338]
We develop a simple logits approach (LORT) without the requirement of prior knowledge of the number of samples per class.
Our method achieves state-of-the-art performance on various imbalanced datasets, including CIFAR100-LT, ImageNet-LT, and iNaturalist 2018.
arXiv Detail & Related papers (2024-03-01T03:27:08Z) - Take the Bull by the Horns: Hard Sample-Reweighted Continual Training
Improves LLM Generalization [165.98557106089777]
A key challenge is to enhance the capabilities of large language models (LLMs) amid a looming shortage of high-quality training data.
Our study starts from an empirical strategy for the light continual training of LLMs using their original pre-training data sets.
We then formalize this strategy into a principled framework of Instance-Reweighted Distributionally Robust Optimization.
arXiv Detail & Related papers (2024-02-22T04:10:57Z) - Simple and effective data augmentation for compositional generalization [64.00420578048855]
We show that data augmentation methods that sample MRs and backtranslate them can be effective for compositional generalization.
Remarkably, sampling from a uniform distribution performs almost as well as sampling from the test distribution.
arXiv Detail & Related papers (2024-01-18T09:13:59Z) - Consistency Regularization for Generalizable Source-free Domain
Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods ONLY assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - DBN-Mix: Training Dual Branch Network Using Bilateral Mixup Augmentation
for Long-Tailed Visual Recognition [7.94190631530826]
We develop a simple yet effective method to improve the performance of DBN without cumulative learning.
We present class-conditional temperature scaling that mitigates bias toward the majority class for the proposed DBN architecture.
arXiv Detail & Related papers (2022-07-05T17:01:27Z) - Open-Sampling: Exploring Out-of-Distribution data for Re-balancing
Long-tailed datasets [24.551465814633325]
Deep neural networks usually perform poorly when the training dataset suffers from extreme class imbalance.
Recent studies found that directly training with out-of-distribution data in a semi-supervised manner would harm the generalization performance.
We propose a novel method called Open-sampling, which utilizes open-set noisy labels to re-balance the class priors of the training dataset.
arXiv Detail & Related papers (2022-06-17T14:29:52Z) - POODLE: Improving Few-shot Learning via Penalizing Out-of-Distribution
Samples [19.311470287767385]
We propose to use out-of-distribution samples, i.e., unlabeled samples coming from outside the target classes, to improve few-shot learning.
Our approach is simple to implement, agnostic to feature extractors, lightweight without any additional cost for pre-training, and applicable to both inductive and transductive settings.
arXiv Detail & Related papers (2022-06-08T18:59:21Z) - Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework by a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z) - Selective Output Smoothing Regularization: Regularize Neural Networks by
Softening Output Distributions [5.725228891050467]
We propose Selective Output Smoothing Regularization, a novel regularization method for training the Convolutional Neural Networks (CNNs)
Inspired by the diverse effects on training from different samples, Selective Output Smoothing Regularization improves the performance by encouraging the model to produce equal logits on incorrect classes.
This plug-and-play regularization method can be conveniently incorporated into almost any CNN-based project without extra hassle.
arXiv Detail & Related papers (2021-03-29T07:21:06Z) - Mind the Trade-off: Debiasing NLU Models without Degrading the
In-distribution Performance [70.31427277842239]
We introduce a novel debiasing method called confidence regularization.
It discourages models from exploiting biases while enabling them to receive enough incentive to learn from all the training examples.
We evaluate our method on three NLU tasks and show that, in contrast to its predecessors, it improves the performance on out-of-distribution datasets.
arXiv Detail & Related papers (2020-05-01T11:22:55Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.