How Does Mixup Help With Robustness and Generalization?
- URL: http://arxiv.org/abs/2010.04819v4
- Date: Wed, 17 Mar 2021 19:43:43 GMT
- Title: How Does Mixup Help With Robustness and Generalization?
- Authors: Linjun Zhang, Zhun Deng, Kenji Kawaguchi, Amirata Ghorbani, James Zou
- Abstract summary: We show how using Mixup in training helps model robustness and generalization.
For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.
For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization which reduces overfitting.
- Score: 41.58255103170875
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixup is a popular data augmentation technique based on taking convex
combinations of pairs of examples and their labels. This simple technique has
been shown to substantially improve both the robustness and the generalization
of the trained model. However, it is not well-understood why such improvement
occurs. In this paper, we provide theoretical analysis to demonstrate how using
Mixup in training helps model robustness and generalization. For robustness, we
show that minimizing the Mixup loss corresponds to approximately minimizing an
upper bound of the adversarial loss. This explains why models obtained by Mixup
training exhibit robustness to several kinds of adversarial attacks, such as the
Fast Gradient Sign Method (FGSM). For generalization, we prove that Mixup
augmentation corresponds to a specific type of data-adaptive regularization
which reduces overfitting. Our analysis provides new insights and a framework
to understand Mixup.
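For concreteness, here is a minimal sketch of a single Mixup training step: draw a mixing coefficient from a Beta distribution, take convex combinations of a batch with a shuffled copy of itself and of the corresponding one-hot labels, and minimize the cross-entropy against the mixed labels. The model, toy data, and hyperparameters (e.g. alpha = 1.0) are illustrative assumptions, not taken from the paper.

```python
# Minimal Mixup sketch (illustrative; not the authors' code).
import torch
import torch.nn.functional as F


def mixup_batch(x, y, num_classes, alpha=1.0):
    """Convex combinations of a batch of examples and their one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_onehot = F.one_hot(y, num_classes).float()
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix


# Toy classifier and data, purely for illustration.
model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))

x_mix, y_mix = mixup_batch(x, y, num_classes=3)
logits = model(x_mix)
# Mixup loss: cross-entropy against the mixed (soft) labels.
loss = (-y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
opt.zero_grad()
loss.backward()
opt.step()
```

Training on such mixed pairs, rather than on the raw examples alone, is the quantity that the analysis above relates to an approximate upper bound on the adversarial loss and to a data-adaptive regularizer.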
Related papers
- Selective Mixup Helps with Distribution Shifts, But Not (Only) because
of Mixup [26.105340203096596]
We show that non-random selection of pairs affects the training distribution and improves generalization by means completely unrelated to the mixing.
We have found a new equivalence between two successful methods: selective mixup and resampling.
arXiv Detail & Related papers (2023-05-26T10:56:22Z) - The Benefits of Mixup for Feature Learning [117.93273337740442]
We first show that Mixup using different linear parameters for features and labels can still achieve similar performance to standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from its mixture with the common features.
In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad performance.
arXiv Detail & Related papers (2023-03-15T08:11:47Z) - Over-training with Mixup May Hurt Generalization [32.64382185990981]
We report a previously unobserved phenomenon in Mixup training.
On a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs.
We show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data.
arXiv Detail & Related papers (2023-03-02T18:37:34Z) - MixupE: Understanding and Improving Mixup from Directional Derivative
Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - C-Mixup: Improving Generalization in Regression [71.10418219781575]
The Mixup algorithm improves generalization by linearly interpolating pairs of examples and their corresponding labels.
We propose C-Mixup, which adjusts the sampling probability based on the similarity of the labels.
C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively.
arXiv Detail & Related papers (2022-10-11T20:39:38Z) - RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and
Out-of-Distribution Robustness [94.69774317059122]
We show that the effectiveness of the well-celebrated Mixup can be further improved if, instead of using it as the sole learning objective, it is utilized as an additional regularizer on top of the standard cross-entropy loss.
This simple change not only provides much improved accuracy but also significantly improves the quality of Mixup's predictive uncertainty estimates (a rough sketch of this combined objective appears after this list).
arXiv Detail & Related papers (2022-06-29T09:44:33Z) - Towards Compositional Adversarial Robustness: Generalizing Adversarial
Training to Composite Semantic Perturbations [70.05004034081377]
We first propose a novel method for generating composite adversarial examples.
Our method can find the optimal attack composition by utilizing component-wise projected gradient descent.
We then propose generalized adversarial training (GAT) to extend model robustness from the $\ell_p$-ball to composite semantic perturbations.
arXiv Detail & Related papers (2022-02-09T02:41:56Z) - Towards Understanding the Data Dependency of Mixup-style Training [14.803285140800542]
In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels.
Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk.
For a large class of linear models and linearly separable datasets, Mixup training leads to learning the same classifier as standard training.
arXiv Detail & Related papers (2021-10-14T18:13:57Z) - On Mixup Regularization [16.748910388577308]
Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels.
We show that Mixup can be interpreted as empirical risk minimization on transformed data subject to random perturbation, and that this perturbation induces multiple known regularization schemes.
arXiv Detail & Related papers (2020-06-10T20:11:46Z) - Adversarial Vertex Mixup: Toward Better Adversarially Robust
Generalization [28.072758856453106]
Adversarial examples cause neural networks to produce incorrect outputs with high confidence.
We show that adversarial training can overshoot the optimal point in terms of robust generalization, leading to Adversarial Feature Overfitting (AFO).
We propose Adversarial Vertex mixup (AVmixup) as a soft-labeled data augmentation approach for improving adversarially robust generalization.
arXiv Detail & Related papers (2020-03-05T08:47:46Z)
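As referenced in the RegMixup entry above, the following rough sketch shows the combined objective: standard cross-entropy on the clean batch plus a Mixup cross-entropy term used as an additional regularizer. The model, toy data, Beta parameter, and equal weighting of the two terms are assumptions for illustration, not the authors' implementation.

```python
# RegMixup-style objective sketch (illustrative; not the authors' code).
import torch
import torch.nn.functional as F

model = torch.nn.Sequential(torch.nn.Linear(20, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(128, 20), torch.randint(0, 3, (128,))

# Mix the batch with a shuffled copy of itself.
lam = torch.distributions.Beta(10.0, 10.0).sample().item()  # assumed Beta parameter
perm = torch.randperm(x.size(0))
x_mix = lam * x + (1.0 - lam) * x[perm]

# Cross-entropy on the clean examples ...
loss_clean = F.cross_entropy(model(x), y)
# ... plus a Mixup cross-entropy term acting as an additional regularizer.
logits_mix = model(x_mix)
loss_mix = lam * F.cross_entropy(logits_mix, y) + (1.0 - lam) * F.cross_entropy(logits_mix, y[perm])
loss = loss_clean + loss_mix

opt.zero_grad()
loss.backward()
opt.step()
```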
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.