k-Mixup Regularization for Deep Learning via Optimal Transport
- URL: http://arxiv.org/abs/2106.02933v2
- Date: Sat, 7 Oct 2023 05:03:55 GMT
- Title: k-Mixup Regularization for Deep Learning via Optimal Transport
- Authors: Kristjan Greenewald, Anming Gu, Mikhail Yurochkin, Justin Solomon,
Edward Chien
- Abstract summary: Mixup is a popular regularization technique for training deep neural networks.
We extend mixup in a simple, broadly applicable way to \emph{$k$-mixup}, which perturbs $k$-batches of training points in the direction of other $k$-batches.
We show that training with $k$-mixup further improves generalization and robustness across several network architectures.
- Score: 32.951696405505686
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mixup is a popular regularization technique for training deep neural networks
that improves generalization and increases robustness to certain distribution
shifts. It perturbs input training data in the direction of other
randomly-chosen instances in the training set. To better leverage the structure
of the data, we extend mixup in a simple, broadly applicable way to
\emph{$k$-mixup}, which perturbs $k$-batches of training points in the
direction of other $k$-batches. The perturbation is done with displacement
interpolation, i.e. interpolation under the Wasserstein metric. We demonstrate
theoretically and in simulations that $k$-mixup preserves cluster and manifold
structures, and we extend theory studying the efficacy of standard mixup to the
$k$-mixup case. Our empirical results show that training with $k$-mixup further
improves generalization and robustness across several network architectures and
benchmark datasets of differing modalities. For the wide variety of real
datasets considered, the performance gains of $k$-mixup over standard mixup are
similar to or larger than the gains of mixup itself over standard ERM after
hyperparameter optimization. In several instances, in fact, $k$-mixup achieves
gains in settings where standard mixup has negligible to zero improvement over
ERM.
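As a concrete illustration of the mechanism described above, the following sketch matches two $k$-batches with an optimal assignment under a squared Euclidean ground cost and then linearly interpolates the matched pairs, i.e. displacement interpolation between the two empirical measures. The function name, the flattened-input representation, and the single Beta($\alpha$, $\alpha$) weight shared across the $k$ matched pairs are assumptions of this sketch rather than details taken from the paper.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def k_mixup_batch(x1, y1, x2, y2, alpha=1.0, rng=None):
    """One k-mixup step between two k-batches (x1, y1) and (x2, y2).

    x1, x2: arrays of shape (k, d), inputs flattened to vectors.
    y1, y2: arrays of shape (k, c), one-hot or soft labels.
    alpha:  Beta-distribution parameter, as in standard mixup.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Pairwise squared Euclidean ground costs between the two k-batches.
    diff = x1[:, None, :] - x2[None, :, :]
    cost = np.einsum('ijd,ijd->ij', diff, diff)

    # Optimal transport between two uniform empirical measures of equal size
    # reduces to an assignment problem, i.e. an optimal permutation matching.
    rows, cols = linear_sum_assignment(cost)

    # One interpolation weight for this pair of k-batches.
    lam = rng.beta(alpha, alpha)

    # Displacement interpolation: move each point partway toward its match,
    # and mix the labels of the matched pairs with the same weight.
    x_mix = (1.0 - lam) * x1[rows] + lam * x2[cols]
    y_mix = (1.0 - lam) * y1[rows] + lam * y2[cols]
    return x_mix, y_mix
```

For $k = 1$ the matching is trivial and the update reduces to standard mixup; larger $k$ lets the optimal matching respect the cluster and manifold structure that the paper argues is preserved.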
Related papers
- PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis [71.8946280170493]
This paper introduces PowMix, a versatile embedding space regularizer that builds upon the strengths of unimodal mixing-based regularization approaches.
PowMix is integrated before the fusion stage of multimodal architectures and facilitates intra-modal mixing, such as mixing text with text, to act as a regularizer.
arXiv Detail & Related papers (2023-12-19T17:01:58Z) - The Benefits of Mixup for Feature Learning [117.93273337740442]
We first show that Mixup using different linear parameters for features and labels can still achieve similar performance to standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from its mixture with the common features.
In contrast, standard training can only learn the common features but fails to learn the rare features, thus suffering from bad performance.
arXiv Detail & Related papers (2023-03-15T08:11:47Z) - Over-training with Mixup May Hurt Generalization [32.64382185990981]
We report a previously unobserved phenomenon in Mixup training.
On a number of standard datasets, the performance of Mixup-trained models starts to decay after training for a large number of epochs.
We show theoretically that Mixup training may introduce undesired data-dependent label noises to the synthesized data.
arXiv Detail & Related papers (2023-03-02T18:37:34Z) - MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - C-Mixup: Improving Generalization in Regression [71.10418219781575]
The Mixup algorithm improves generalization by linearly interpolating pairs of examples and their corresponding labels.
We propose C-Mixup, which adjusts the sampling probability based on the similarity of the labels; a sketch of this sampling scheme appears after this list.
C-Mixup achieves 6.56%, 4.76%, 5.82% improvements in in-distribution generalization, task generalization, and out-of-distribution robustness, respectively.
arXiv Detail & Related papers (2022-10-11T20:39:38Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - Epsilon Consistent Mixup: An Adaptive Consistency-Interpolation Tradeoff [19.03167022268852]
$\epsilon$mu is a data-based structural regularization technique that combines Mixup's linear interpolation with consistency regularization in the Mixup direction.
It is shown to improve semi-supervised classification accuracy on the SVHN and CIFAR10 benchmark datasets.
In particular, $\epsilon$mu is found to produce more accurate synthetic labels and more confident predictions than Mixup.
arXiv Detail & Related papers (2021-04-19T17:10:31Z) - Co-Mixup: Saliency Guided Joint Mixup with Supermodular Diversity [15.780905917870427]
We propose a new perspective on batch mixup and formulate the optimal construction of a batch of mixup data.
We also propose an iterative submodular optimization algorithm, based on a modular approximation, that computes the mixup efficiently for each minibatch.
Our experiments show the proposed method achieves state-of-the-art generalization, calibration, and weakly supervised localization results.
arXiv Detail & Related papers (2021-02-05T09:12:02Z) - Improving Generalization in Reinforcement Learning with Mixture Regularization [113.12412071717078]
We introduce a simple approach, named mixreg, which trains agents on a mixture of observations from different training environments.
Mixreg increases the data diversity more effectively and helps learn smoother policies.
Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin.
arXiv Detail & Related papers (2020-10-21T08:12:03Z)
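As referenced in the C-Mixup entry above, the sketch below illustrates label-similarity-based pair sampling for regression. The Gaussian kernel on label distance, the bandwidth sigma, and the function name are assumptions of this sketch; the entry only states that the sampling probability depends on label similarity. The returned partner indices would then be mixed with their anchors using the usual mixup interpolation.

```python
import numpy as np


def cmixup_pair_indices(y, sigma=1.0, rng=None):
    """Sample a mixing partner for each example, weighted by label similarity.

    y:     array of shape (n,) or (n, c) with regression targets.
    sigma: bandwidth of the (assumed) Gaussian kernel on label distance.
    Returns an array of n partner indices, one per anchor example.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float).reshape(len(y), -1)

    # Pairwise squared label distances and Gaussian kernel weights;
    # the small constant keeps every row normalizable even if the
    # kernel underflows for very distant labels.
    d2 = ((y[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2.0 * sigma ** 2)) + 1e-12
    np.fill_diagonal(w, 0.0)            # never pair an example with itself
    p = w / w.sum(axis=1, keepdims=True)

    # Draw one partner per anchor with probability proportional to similarity.
    return np.array([rng.choice(len(y), p=p[i]) for i in range(len(y))])
```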