Over-training with Mixup May Hurt Generalization
- URL: http://arxiv.org/abs/2303.01475v1
- Date: Thu, 2 Mar 2023 18:37:34 GMT
- Title: Over-training with Mixup May Hurt Generalization
- Authors: Zixuan Liu, Ziqiao Wang, Hongyu Guo, Yongyi Mao
- Abstract summary: We report a previously unobserved phenomenon in Mixup training.
On a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs.
We show theoretically that Mixup training may introduce undesired data-dependent label noise to the synthesized data.
- Score: 32.64382185990981
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixup, which creates synthetic training instances by linearly interpolating
random sample pairs, is a simple and yet effective regularization technique to
boost the performance of deep models trained with SGD. In this work, we report
a previously unobserved phenomenon in Mixup training: on a number of standard
datasets, the performance of Mixup-trained models starts to decay after
training for a large number of epochs, giving rise to a U-shaped generalization
curve. This behavior is further aggravated when the size of the original
dataset is reduced. To help understand this behavior of Mixup, we show
theoretically that Mixup training may introduce undesired data-dependent
label noise to the synthesized data. By analyzing a least-squares
regression problem with a random
feature model, we explain why noisy labels may cause the U-shaped curve to
occur: Mixup improves generalization by fitting the clean patterns at the
early training stage, but as training progresses, it begins to over-fit the
noise in the synthetic data. Extensive experiments are performed on a
variety of benchmark datasets, validating this explanation.
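To make the mechanism described above concrete, the following is a minimal, self-contained sketch (not the paper's construction; the target function, sample count, and Beta parameters are illustrative assumptions) of how linearly mixing inputs and labels creates data-dependent label noise whenever the underlying target is nonlinear.

```python
# Minimal sketch of one Mixup step on a toy regression problem, illustrating
# the "data-dependent label noise" described in the abstract: for a nonlinear
# target f, the mixed label lam*y_i + (1-lam)*y_j generally differs from
# f(lam*x_i + (1-lam)*x_j). All concrete choices below are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    # Nonlinear ground-truth target; any non-affine f induces the gap below.
    return np.sin(3.0 * x)

# Original (clean) training data.
x = rng.uniform(-1.0, 1.0, size=1000)
y = f(x)

# One Mixup step: pair each sample with a random partner and interpolate
# both inputs and labels with the same coefficient lam ~ Beta(alpha, alpha).
lam = rng.beta(1.0, 1.0, size=x.shape)
perm = rng.permutation(x.shape[0])
x_mix = lam * x + (1.0 - lam) * x[perm]   # synthetic inputs
y_mix = lam * y + (1.0 - lam) * y[perm]   # synthetic labels

# "Label noise" carried by the synthetic data: the mixed label minus the
# label the true function would assign at the mixed input.
noise = y_mix - f(x_mix)
print(f"mean |label noise| on synthetic data: {np.abs(noise).mean():.3f}")
print(f"max  |label noise| on synthetic data: {np.abs(noise).max():.3f}")
```

A model trained long enough on such (x_mix, y_mix) pairs can eventually fit this noise, which is the mechanism the abstract associates with the late-stage decay of the U-shaped generalization curve.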
Related papers
- RC-Mixup: A Data Augmentation Strategy against Noisy Data for Regression Tasks [27.247270530020664]
We study the problem of robust data augmentation for regression tasks in the presence of noisy data.
C-Mixup is more selective in which samples to mix based on their label distances for better regression performance.
We propose RC-Mixup, which tightly integrates C-Mixup with multi-round robust training methods for a synergistic effect.
arXiv Detail & Related papers (2024-05-28T08:02:42Z) - Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study whether model performance can be predicted as a function of the mixture proportions of training data.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that transforms the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z) - One More Step: A Versatile Plug-and-Play Module for Rectifying Diffusion
Schedule Flaws and Enhancing Low-Frequency Controls [77.42510898755037]
One More Step (OMS) is a compact network that incorporates an additional simple yet effective step during inference.
OMS elevates image fidelity and harmonizes the dichotomy between training and inference, while preserving original model parameters.
Once trained, various pre-trained diffusion models with the same latent domain can share the same OMS module.
arXiv Detail & Related papers (2023-11-27T12:02:42Z) - Mixup Your Own Pairs [22.882694278940598]
We argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness.
Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix)
It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level.
arXiv Detail & Related papers (2023-09-28T17:38:59Z) - Solving Inverse Problems with Score-Based Generative Priors learned from
Noisy Data [1.7969777786551424]
SURE-Score is an approach for learning score-based generative models using training samples corrupted by additive Gaussian noise.
We demonstrate the generality of SURE-Score by learning priors and applying posterior sampling to ill-posed inverse problems in two practical applications.
arXiv Detail & Related papers (2023-05-02T02:51:01Z) - C-Mixup: Improving Generalization in Regression [71.10418219781575]
The Mixup algorithm improves generalization by linearly interpolating pairs of examples and their corresponding labels.
We propose C-Mixup, which adjusts the sampling probability based on the similarity of the labels (see the sketch after this list).
C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively.
arXiv Detail & Related papers (2022-10-11T20:39:38Z) - Learning from aggregated data with a maximum entropy model [73.63512438583375]
We show how a new model, similar to a logistic regression, may be learned from aggregated data only by approximating the unobserved feature distribution with a maximum entropy hypothesis.
We present empirical evidence on several public datasets that the model learned this way can achieve performances comparable to those of a logistic model trained with the full unaggregated data.
arXiv Detail & Related papers (2022-10-05T09:17:27Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training sample.
It then uses the perturbed and original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - Towards Understanding the Data Dependency of Mixup-style Training [14.803285140800542]
In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels.
Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk.
For a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
arXiv Detail & Related papers (2021-10-14T18:13:57Z) - MixRL: Data Mixing Augmentation for Regression using Reinforcement
Learning [2.1345682889327837]
Existing techniques for data augmentation largely focus on classification tasks and do not readily apply to regression tasks.
We show that mixing examples with a large distance in either their data or their labels can have an increasingly negative effect on model performance.
We propose MixRL, a data augmentation meta-learning framework for regression that learns, for each example, how many nearest neighbors it should be mixed with for the best model performance.
arXiv Detail & Related papers (2021-06-07T07:01:39Z)
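The C-Mixup entry above mentions adjusting the pair-sampling probability by label similarity. Below is a schematic of that idea; the Gaussian kernel, bandwidth value, and helper name are illustrative assumptions rather than the paper's exact recipe.

```python
# Schematic of label-similarity-based pair sampling in the spirit of C-Mixup:
# instead of mixing with a uniformly random partner, each anchor draws a
# partner with probability decreasing in label distance.
import numpy as np

rng = np.random.default_rng(0)

def sample_partners(y, bandwidth=0.5):
    """For each anchor i, draw j with P(j | i) proportional to
    exp(-(y_i - y_j)^2 / (2 * bandwidth^2)); the bandwidth is illustrative."""
    y = np.asarray(y, dtype=float)
    d2 = (y[:, None] - y[None, :]) ** 2        # pairwise squared label distances
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    np.fill_diagonal(w, 0.0)                   # never pair a sample with itself
    p = w / w.sum(axis=1, keepdims=True)
    return np.array([rng.choice(len(y), p=p_i) for p_i in p])

# Toy usage: mix labels with label-close partners using one Beta-sampled coefficient.
y = rng.normal(size=8)
partners = sample_partners(y)
lam = rng.beta(2.0, 2.0)
y_mix = lam * y + (1.0 - lam) * y[partners]
print(partners)
print(y_mix)
```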