The Benefits of Mixup for Feature Learning
- URL: http://arxiv.org/abs/2303.08433v1
- Date: Wed, 15 Mar 2023 08:11:47 GMT
- Title: The Benefits of Mixup for Feature Learning
- Authors: Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu
- Abstract summary: We first show that Mixup using different linear interpolation parameters for features and labels can still achieve performance similar to that of standard Mixup.
We consider a feature-noise data model and show that Mixup training can effectively learn the rare features from their mixture with the common features.
In contrast, standard training can only learn the common features and fails to learn the rare features, thus suffering from poor generalization performance.
- Score: 117.93273337740442
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Mixup, a simple data augmentation method that randomly mixes two data points
via linear interpolation, has been extensively applied in various deep learning
applications to gain better generalization. However, the theoretical
underpinnings of its efficacy are not yet fully understood. In this paper, we
aim to seek a fundamental understanding of the benefits of Mixup. We first show
that Mixup using different linear interpolation parameters for features and
labels can still achieve similar performance to the standard Mixup. This
indicates that the intuitive linearity explanation in Zhang et al. (2018) may
not fully explain the success of Mixup. Then we perform a theoretical study of
Mixup from the feature learning perspective. We consider a feature-noise data
model and show that Mixup training can effectively learn the rare features
(appearing in a small fraction of data) from their mixture with the common
features (appearing in a large fraction of data). In contrast, standard
training can only learn the common features but fails to learn the rare
features, thus suffering from poor generalization performance. Moreover, our
theoretical analysis also shows that the benefits of Mixup for feature learning
are mostly gained in the early training phase, based on which we propose to
apply early stopping in Mixup. Experimental results verify our theoretical
findings and demonstrate the effectiveness of the early-stopped Mixup training.
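For concreteness, the following is a minimal PyTorch-style sketch of Mixup training in which the interpolation coefficients for features and labels can be drawn independently, combined with the early-stopping idea motivated by the analysis. The independent Beta draws, the alpha value, the stopping epoch, and the function names are illustrative assumptions, not the paper's exact construction.

```python
# Minimal sketch: Mixup with (optionally) decoupled interpolation
# coefficients for inputs and labels, plus an early-stopped training loop.
# The decoupling scheme, alpha, and stop_epoch are illustrative assumptions.
import numpy as np
import torch
import torch.nn.functional as F

def mixup_batch(x, y, num_classes, alpha=1.0, decouple=False):
    """Mix a batch with a random permutation of itself."""
    lam_x = float(np.random.beta(alpha, alpha))
    # Standard Mixup reuses the same coefficient for the labels; the
    # decoupled variant draws a second, independent coefficient.
    lam_y = float(np.random.beta(alpha, alpha)) if decouple else lam_x
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    x_mix = lam_x * x + (1.0 - lam_x) * x[perm]
    y_mix = lam_y * y_onehot + (1.0 - lam_y) * y_onehot[perm]
    return x_mix, y_mix

def train_early_stopped_mixup(model, loader, num_classes, stop_epoch=50):
    """Stop after `stop_epoch` epochs: the analysis argues the
    feature-learning benefit of Mixup arises mostly early in training."""
    opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
    for epoch in range(stop_epoch):
        for x, y in loader:
            x_mix, y_mix = mixup_batch(x, y, num_classes, decouple=True)
            logits = model(x_mix)
            # Soft-label cross-entropy against the mixed label vector.
            loss = -(y_mix * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```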
Related papers
- Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance [55.872926690722714]
We study the quantitative predictability of model performance as a function of the data mixture proportions.
We propose nested use of the scaling laws of training steps, model sizes, and our data mixing law.
Our method effectively optimizes the training mixture of a 1B model trained for 100B tokens on RedPajama.
arXiv Detail & Related papers (2024-03-25T17:14:00Z) - Selective Mixup Helps with Distribution Shifts, But Not (Only) because
of Mixup [26.105340203096596]
We show that non-random selection of pairs affects the training distribution and improves generalization by means completely unrelated to the mixing.
We find a new equivalence between two successful methods: selective mixup and resampling.
arXiv Detail & Related papers (2023-05-26T10:56:22Z) - MixupE: Understanding and Improving Mixup from Directional Derivative
Perspective [86.06981860668424]
We propose an improved version of Mixup, theoretically justified to deliver better generalization performance than the vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z) - Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup [14.37428912254029]
Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels.
We focus on classification problems in which each class may have multiple associated features (or views) that can be used to predict the class correctly.
Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a 2-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes, while training with a specific instantiation of Mixup succeeds in learning both features for every class (a minimal sketch of the midpoint variant appears after this list).
arXiv Detail & Related papers (2022-10-24T18:11:37Z) - DoubleMix: Simple Interpolation-Based Data Augmentation for Text
Classification [56.817386699291305]
This paper proposes a simple yet effective data augmentation approach termed DoubleMix.
DoubleMix first generates several perturbed samples for each training example.
It then uses the perturbed data and original data to carry out a two-step interpolation in the hidden space of neural models.
arXiv Detail & Related papers (2022-09-12T15:01:04Z) - OpenMixup: Open Mixup Toolbox and Benchmark for Visual Representation Learning [53.57075147367114]
We introduce OpenMixup, the first mixup augmentation toolbox and benchmark for visual representation learning.
We train 18 representative mixup baselines from scratch and rigorously evaluate them across 11 image datasets.
We also open-source our modular backbones, including a collection of popular vision backbones, optimization strategies, and analysis toolkits.
arXiv Detail & Related papers (2022-09-11T12:46:01Z) - Harnessing Hard Mixed Samples with Decoupled Regularizer [69.98746081734441]
Mixup is an efficient data augmentation approach that improves the generalization of neural networks by smoothing the decision boundary with mixed data.
In this paper, we propose an efficient mixup objective function with a decoupled regularizer, named Decoupled Mixup (DM).
DM can adaptively utilize hard mixed samples to mine discriminative features without losing the original smoothness of mixup.
arXiv Detail & Related papers (2022-03-21T07:12:18Z) - Contrastive-mixup learning for improved speaker verification [17.93491404662201]
This paper proposes a novel formulation of prototypical loss with mixup for speaker verification.
Mixup is a simple yet efficient data augmentation technique that forms weighted combinations of random pairs of data points and labels.
arXiv Detail & Related papers (2022-02-22T05:09:22Z) - MixMix: All You Need for Data-Free Compression Are Feature and Data
Mixing [30.14401315979937]
We propose MixMix to overcome the difficulties of generalizability and inexact inversion.
We prove the effectiveness of MixMix from both theoretical and empirical perspectives.
MixMix achieves up to 4% and 20% accuracy uplift on quantization and pruning, respectively, compared to existing data-free compression work.
arXiv Detail & Related papers (2020-11-19T15:33:43Z)
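As a contrast to the Beta-sampled coefficient used above, here is a minimal sketch of the midpoint variant referenced in the "Provably Learning Diverse Features in Multi-View Data with Midpoint Mixup" entry: the mixing coefficient is fixed at 1/2. Only the batch construction is shown; the 2-layer convolutional network and training setup analyzed in that paper are not reproduced, and the function name and alpha are assumptions.

```python
# Illustrative contrast: standard Mixup samples the mixing weight from
# Beta(alpha, alpha), while Midpoint Mixup fixes it at 1/2.
import numpy as np
import torch
import torch.nn.functional as F

def mix_batch(x, y, num_classes, midpoint=False, alpha=1.0):
    lam = 0.5 if midpoint else float(np.random.beta(alpha, alpha))
    perm = torch.randperm(x.size(0))
    y_onehot = F.one_hot(y, num_classes).float()
    x_mix = lam * x + (1.0 - lam) * x[perm]
    y_mix = lam * y_onehot + (1.0 - lam) * y_onehot[perm]
    return x_mix, y_mix
```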