On Mixup Regularization
- URL: http://arxiv.org/abs/2006.06049v3
- Date: Mon, 17 Oct 2022 10:04:31 GMT
- Title: On Mixup Regularization
- Authors: Luigi Carratino, Moustapha Cissé, Rodolphe Jenatton, Jean-Philippe Vert
- Abstract summary: Mixup is a data augmentation technique that creates new examples as convex combinations of training points and labels.
We show how the random perturbation in this new interpretation of Mixup induces multiple known regularization schemes.
- Score: 16.748910388577308
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixup is a data augmentation technique that creates new examples as convex
combinations of training points and labels. This simple technique has
empirically been shown to improve the accuracy of many state-of-the-art models in
different settings and applications, but the reasons behind this empirical
success remain poorly understood. In this paper we take a substantial step in
explaining the theoretical foundations of Mixup, by clarifying its
regularization effects. We show that Mixup can be interpreted as a standard
empirical risk minimization estimator subject to a combination of data
transformation and random perturbation of the transformed data. We gain two
core insights from this new interpretation. First, the data transformation
suggests that, at test time, a model trained with Mixup should also be applied
to transformed data, a one-line change in code that we show empirically to
improve both accuracy and calibration of the prediction. Second, we show how
the random perturbation in this new interpretation of Mixup induces multiple
known regularization schemes, including label smoothing and reduction of the
Lipschitz constant of the estimator. These schemes interact synergistically
with each other, resulting in a self-calibrated and effective regularization
effect that prevents overfitting and overconfident predictions. We corroborate
our theoretical analysis with experiments that support our conclusions.
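To make the abstract's two ingredients concrete, below is a minimal NumPy sketch (not the authors' code): `mixup_batch` implements standard Mixup augmentation as convex combinations of inputs and one-hot labels, and `transform_test_point` illustrates the kind of one-line test-time change suggested above, here assumed to be a shrinkage of the test input toward the training mean by the average mixing coefficient; the exact transformation and constants are given in the paper.

```python
import numpy as np

def mixup_batch(x, y, alpha=1.0, rng=None):
    """Standard Mixup: convex combinations of training points and labels.

    x: (n, d) inputs, y: (n, k) one-hot labels, alpha: Beta parameter.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = x.shape[0]
    lam = rng.beta(alpha, alpha, size=(n, 1))  # per-example mixing coefficients
    perm = rng.permutation(n)                  # random partner for each example
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y + (1.0 - lam) * y[perm]
    return x_mix, y_mix

def transform_test_point(x_test, x_train_mean, theta_bar):
    """Hypothetical test-time transformation (the "one-line change"):
    shrink the input toward the training mean by the average mixing
    coefficient theta_bar. Assumed form for illustration; see the paper
    for the exact expression."""
    return x_train_mean + theta_bar * (x_test - x_train_mean)
```

At prediction time one would then evaluate `model(transform_test_point(x, x_train_mean, theta_bar))` rather than `model(x)`, so that the model sees data distributed like the transformed data it was effectively trained on.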
Related papers
- Collaborative Heterogeneous Causal Inference Beyond Meta-analysis [68.4474531911361]
We propose a collaborative inverse propensity score estimator for causal inference with heterogeneous data.
Our method shows significant improvements over the methods based on meta-analysis when heterogeneity increases.
arXiv Detail & Related papers (2024-04-24T09:04:36Z) - Towards Theoretical Understandings of Self-Consuming Generative Models [56.84592466204185]
This paper tackles the emerging challenge of training generative models within a self-consuming loop.
We construct a theoretical framework to rigorously evaluate how this training procedure impacts the data distributions learned by future models.
We present results for kernel density estimation, delivering nuanced insights such as the impact of mixed data training on error propagation.
arXiv Detail & Related papers (2024-02-19T02:08:09Z) - Fairness under Covariate Shift: Improving Fairness-Accuracy tradeoff
with few Unlabeled Test Samples [21.144077993862652]
We operate in the unsupervised regime where only a small set of unlabeled test samples along with a labeled training set is available.
We experimentally verify that optimizing with our loss formulation significantly outperforms a number of state-of-the-art baselines.
arXiv Detail & Related papers (2023-10-11T14:39:51Z) - Semantic Equivariant Mixup [54.734054770032934]
Mixup is a well-established data augmentation technique, which can extend the training distribution and regularize the neural networks.
Previous mixup variants tend to over-focus on the label-related information.
We propose a semantic equivariant mixup (sem) to preserve richer semantic information in the input.
arXiv Detail & Related papers (2023-08-12T03:05:53Z) - Over-training with Mixup May Hurt Generalization [32.64382185990981]
We report a previously unobserved phenomenon in Mixup training.
On a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs.
We show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data.
arXiv Detail & Related papers (2023-03-02T18:37:34Z) - RegMixup: Mixup as a Regularizer Can Surprisingly Improve Accuracy and
Out-of-Distribution Robustness [94.69774317059122]
We show that the effectiveness of the well-celebrated Mixup can be further improved if, instead of using it as the sole learning objective, it is utilized as an additional regularizer alongside the standard cross-entropy loss (a minimal sketch of this combination appears after this list).
This simple change not only provides much improved accuracy but also significantly improves the quality of the predictive uncertainty estimation of Mixup.
arXiv Detail & Related papers (2022-06-29T09:44:33Z) - A similarity-based Bayesian mixture-of-experts model [0.5156484100374058]
We present a new non-parametric mixture-of-experts model for multivariate regression problems.
Using a conditionally specified model, predictions for out-of-sample inputs are based on similarities to each observed data point.
Posterior inference is performed on the parameters of the mixture as well as the distance metric.
arXiv Detail & Related papers (2020-12-03T18:08:30Z) - How Does Mixup Help With Robustness and Generalization? [41.58255103170875]
We show how using Mixup in training helps model robustness and generalization.
For robustness, we show that minimizing the Mixup loss corresponds to approximately minimizing an upper bound of the adversarial loss.
For generalization, we prove that Mixup augmentation corresponds to a specific type of data-adaptive regularization which reduces overfitting.
arXiv Detail & Related papers (2020-10-09T21:38:14Z) - Good Classifiers are Abundant in the Interpolating Regime [64.72044662855612]
We develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers.
We find that test errors tend to concentrate around a small typical value $\varepsilon^*$, which deviates substantially from the test error of the worst-case interpolating model.
Our results show that the usual style of analysis in statistical learning theory may not be fine-grained enough to capture the good generalization performance observed in practice.
arXiv Detail & Related papers (2020-06-22T21:12:31Z) - Evaluating Prediction-Time Batch Normalization for Robustness under
Covariate Shift [81.74795324629712]
We study a simple method, which we call prediction-time batch normalization, that significantly improves model accuracy and calibration under covariate shift.
We show that prediction-time batch normalization provides complementary benefits to existing state-of-the-art approaches for improving robustness.
The method has mixed results when used alongside pre-training, and does not seem to perform as well under more natural types of dataset shift.
arXiv Detail & Related papers (2020-06-19T05:08:43Z)
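For the RegMixup entry above, the following is a rough PyTorch sketch (not the authors' implementation) of using Mixup as an additional regularizer on top of the standard cross-entropy loss rather than as the sole objective; the hyperparameters `alpha` and `eta` are illustrative assumptions, not the paper's values.

```python
import torch
import torch.nn.functional as F

def regmixup_style_loss(model, x, y, alpha=10.0, eta=1.0):
    """Cross-entropy on the clean batch plus a Mixup term used as a regularizer.

    x: (n, d) inputs, y: (n,) integer class labels.
    alpha, eta: illustrative hyperparameters (assumptions).
    """
    # Standard cross-entropy on the untouched batch.
    loss_clean = F.cross_entropy(model(x), y)

    # Mixup term: interpolate inputs and take the interpolated loss.
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0), device=x.device)
    x_mix = lam * x + (1.0 - lam) * x[perm]
    logits_mix = model(x_mix)
    loss_mix = lam * F.cross_entropy(logits_mix, y) + \
               (1.0 - lam) * F.cross_entropy(logits_mix, y[perm])

    return loss_clean + eta * loss_mix
```

Keeping the clean cross-entropy term is what distinguishes this setup from plain Mixup training, where only the interpolated loss is minimized.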
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.