The good, the bad and the ugly sides of data augmentation: An implicit
spectral regularization perspective
- URL: http://arxiv.org/abs/2210.05021v3
- Date: Tue, 27 Feb 2024 20:55:18 GMT
- Title: The good, the bad and the ugly sides of data augmentation: An implicit
spectral regularization perspective
- Authors: Chi-Heng Lin, Chiraag Kaushik, Eva L. Dyer, Vidya Muthukumar
- Abstract summary: Data augmentation (DA) is a powerful workhorse for bolstering performance in modern machine learning.
In this work, we develop a new theoretical framework to characterize the impact of a general class of DA on generalization.
Our framework highlights the nuanced and sometimes surprising impacts of DA on generalization, and serves as a testbed for novel augmentation design.
- Score: 14.229855423083922
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation (DA) is a powerful workhorse for bolstering performance in
modern machine learning. Specific augmentations like translations and scaling
in computer vision are traditionally believed to improve generalization by
generating new (artificial) data from the same distribution. However, this
traditional viewpoint does not explain the success of prevalent augmentations
in modern machine learning (e.g. randomized masking, cutout, mixup), that
greatly alter the training data distribution. In this work, we develop a new
theoretical framework to characterize the impact of a general class of DA on
underparameterized and overparameterized linear model generalization. Our
framework reveals that DA induces implicit spectral regularization through a
combination of two distinct effects: a) manipulating the relative proportion of
eigenvalues of the data covariance matrix in a training-data-dependent manner,
and b) uniformly boosting the entire spectrum of the data covariance matrix
through ridge regression. These effects, when applied to popular augmentations,
give rise to a wide variety of phenomena, including discrepancies in
generalization between overparameterized and underparameterized regimes and
differences between regression and classification tasks. Our framework
highlights the nuanced and sometimes surprising impacts of DA on
generalization, and serves as a testbed for novel augmentation design.
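As a concrete illustration of the two effects described in the abstract, consider how simple augmentations reshape the normal equations of linear regression. The sketch below is a minimal special case, not the paper's general framework: it uses NumPy, and the particular augmentations, noise level sigma, masking probability p, and replica count K are illustrative choices. It checks two classical facts: training on inputs corrupted by additive isotropic noise is, in expectation, ridge regression (a uniform boost of the whole covariance spectrum), while training on randomly masked inputs adds a training-data-dependent diagonal penalty proportional to diag(X^T X).

```python
# Minimal sketch: two augmentations whose expected effect on linear regression is
# an explicit spectral regularizer of the normal equations.
#   * additive isotropic noise -> ridge term (uniform boost of the spectrum)
#   * random masking           -> data-dependent penalty ((1 - p) / p) * diag(X^T X)
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 30
X = rng.normal(size=(n, d)) @ np.diag(np.linspace(0.2, 2.0, d))  # anisotropic features
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

def augmented_lstsq(aug, K=500):
    """Least-squares fit to K independently augmented copies of the training set."""
    XtX, Xty = np.zeros((d, d)), np.zeros(d)
    for _ in range(K):
        Xa = aug(X)
        XtX += Xa.T @ Xa
        Xty += Xa.T @ y
    return np.linalg.solve(XtX, Xty)

# 1) Additive Gaussian noise: x_tilde = x + eps, eps ~ N(0, sigma^2 I).
#    In expectation this is ridge regression with penalty n * sigma^2 * I.
sigma = 0.5
w_noise = augmented_lstsq(lambda A: A + sigma * rng.normal(size=A.shape))
w_ridge = np.linalg.solve(X.T @ X + n * sigma**2 * np.eye(d), X.T @ y)

# 2) Random masking (dropout-style): x_tilde = (m / p) * x, m_j ~ Bernoulli(p).
#    In expectation this adds the data-dependent penalty ((1 - p) / p) * diag(X^T X).
p = 0.7
w_mask = augmented_lstsq(lambda A: A * rng.binomial(1, p, size=A.shape) / p)
w_adaptive = np.linalg.solve(X.T @ X + (1 - p) / p * np.diag(np.diag(X.T @ X)), X.T @ y)

print("noise augmentation   vs isotropic ridge:", np.linalg.norm(w_noise - w_ridge))
print("masking augmentation vs adaptive ridge :", np.linalg.norm(w_mask - w_adaptive))
```

Both Monte Carlo fits should match their closed-form counterparts up to sampling error, illustrating that it is the augmentation distribution, not any genuinely new data, that determines the induced spectral regularizer.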
Related papers
- Understanding Augmentation-based Self-Supervised Representation Learning via RKHS Approximation and Regression [53.15502562048627]
Recent work has built the connection between self-supervised learning and the approximation of the top eigenspace of a graph Laplacian operator.
This work delves into a statistical analysis of augmentation-based pretraining.
arXiv Detail & Related papers (2023-06-01T15:18:55Z)
- Towards Understanding How Data Augmentation Works with Imbalanced Data [17.478900028887537]
We study the effect of data augmentation on three different classifiers, convolutional neural networks, support vector machines, and logistic regression models.
Our research indicates that DA, when applied to imbalanced data, produces substantial changes in model weights, support vectors and feature selection.
We hypothesize that DA works by introducing variation into the data, so that machine learning models can associate changes in the data with labels.
arXiv Detail & Related papers (2023-04-12T15:01:22Z)
- On the Strong Correlation Between Model Invariance and Generalization [54.812786542023325]
Generalization captures a model's ability to classify unseen data.
Invariance measures consistency of model predictions on transformations of the data.
From a dataset-centric view, we find that a given model's accuracy and invariance are linearly correlated across different test sets.
arXiv Detail & Related papers (2022-07-14T17:08:25Z)
- ER: Equivariance Regularizer for Knowledge Graph Completion [107.51609402963072]
We propose a new regularizer, the Equivariance Regularizer (ER).
ER can enhance the generalization ability of the model by employing the semantic equivariance between the head and tail entities.
The experimental results indicate a clear and substantial improvement over the state-of-the-art relation prediction methods.
arXiv Detail & Related papers (2022-06-24T08:18:05Z)
- Data Augmentation vs. Equivariant Networks: A Theory of Generalization on Dynamics Forecasting [24.363954435050264]
Exploiting symmetry in dynamical systems is a powerful way to improve the generalization of deep learning.
Data augmentation and equivariant networks are two major approaches to injecting symmetry into learning.
We derive the generalization bounds for data augmentation and equivariant networks, characterizing their effect on learning in a unified framework.
arXiv Detail & Related papers (2022-06-19T17:00:12Z)
- Generalization Gap in Amortized Inference [17.951010274427187]
We study the generalization of a popular class of probabilistic models, the Variational Auto-Encoder (VAE).
We show that the over-fitting phenomenon is usually dominated by the amortized inference network.
We propose a new training objective, inspired by the classic wake-sleep algorithm, to improve the generalization properties of amortized inference.
arXiv Detail & Related papers (2022-05-23T21:28:47Z)
- Regularising for invariance to data augmentation improves supervised learning [82.85692486314949]
We show that using multiple augmentations per input can improve generalisation.
We propose an explicit regulariser that encourages this invariance on the level of individual model predictions (a generic sketch of this style of consistency penalty appears after this list).
arXiv Detail & Related papers (2022-03-07T11:25:45Z)
- Double Descent and Other Interpolation Phenomena in GANs [2.7007335372861974]
We study the generalization error as a function of latent space dimension in generative adversarial networks (GANs).
We develop a novel pseudo-supervised learning approach for GANs where the training utilizes pairs of fabricated (noise) inputs in conjunction with real output samples.
While our analysis focuses mostly on linear models, we also provide important insights for improving the generalization of nonlinear, multilayer GANs.
arXiv Detail & Related papers (2021-06-07T23:07:57Z)
- Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer [72.5190560787569]
In computer vision, learning from long tailed datasets is a recurring theme, especially for natural image datasets.
Our proposal posits a meta-distributional scenario, where the data generating mechanism is invariant across the label-conditional feature distributions.
This allows us to leverage a causal data inflation procedure to enlarge the representation of minority classes.
arXiv Detail & Related papers (2020-11-25T00:13:11Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
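The "Regularising for invariance to data augmentation improves supervised learning" entry above proposes an explicit regulariser on individual model predictions. The sketch below shows one generic form such a consistency penalty can take; it is an assumption for illustration (PyTorch, with a hypothetical function name, a KL-to-the-mean-prediction penalty, and a weight lam), not the regulariser from that paper.

```python
# Generic sketch of an invariance/consistency penalty over multiple augmented views.
import torch
import torch.nn.functional as F

def invariance_regularised_loss(model, views, labels, lam=1.0):
    """views: tensor of shape (A, B, ...) holding A augmentations of each of B inputs."""
    A, B = views.shape[0], views.shape[1]
    logits = torch.stack([model(views[a]) for a in range(A)])     # (A, B, C)
    log_probs = F.log_softmax(logits, dim=-1)

    # Supervised term: average cross-entropy over all A augmented views.
    ce = F.cross_entropy(logits.reshape(A * B, -1), labels.repeat(A))

    # Invariance term: KL divergence between the mean prediction and each view's
    # prediction; it vanishes only when the A predictions for an input agree.
    mean_probs = log_probs.exp().mean(dim=0, keepdim=True).expand_as(log_probs)
    kl = F.kl_div(log_probs.reshape(A * B, -1),
                  mean_probs.reshape(A * B, -1),
                  reduction="batchmean")
    return ce + lam * kl
```

In practice one would feed a small number of augmented views per input (e.g. two random crops) and tune the weight lam on held-out data.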
This list is automatically generated from the titles and abstracts of the papers on this site.