Sample Efficiency of Data Augmentation Consistency Regularization
- URL: http://arxiv.org/abs/2202.12230v1
- Date: Thu, 24 Feb 2022 17:50:31 GMT
- Title: Sample Efficiency of Data Augmentation Consistency Regularization
- Authors: Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay
Sanghavi, Qi Lei
- Abstract summary: We first present a simple and novel analysis for linear regression, demonstrating that data augmentation consistency (DAC) is intrinsically more efficient than empirical risk minimization on augmented data (DA-ERM).
We then propose a new theoretical framework for analyzing DAC, which reframes DAC as a way to reduce function class complexity.
- Score: 44.19833682906076
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is popular in the training of large neural networks;
currently, however, there is no clear theoretical comparison between different
algorithmic choices on how to use augmented data. In this paper, we take a step
in this direction - we first present a simple and novel analysis for linear
regression, demonstrating that data augmentation consistency (DAC) is
intrinsically more efficient than empirical risk minimization on augmented data
(DA-ERM). We then propose a new theoretical framework for analyzing DAC, which
reframes DAC as a way to reduce function class complexity. The new framework
characterizes the sample efficiency of DAC for various non-linear models (e.g.,
neural networks). Further, we perform experiments that make a clean and
apples-to-apples comparison (i.e., with no extra modeling or data tweaks)
between ERM and consistency regularization using CIFAR-100 and WideResNet;
these together demonstrate the superior efficacy of DAC.
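To make the algorithmic distinction concrete, here is a minimal sketch in PyTorch (the penalty form and the weight `lam` are illustrative assumptions, not the paper's exact formulation): DA-ERM folds augmented copies into the empirical risk, while DAC keeps the label loss on the original data and adds a consistency penalty tying predictions on an example to those on its augmentation.

```python
import torch
import torch.nn.functional as F

def da_erm_loss(model, x, y, augment):
    """DA-ERM: empirical risk on both the original and the augmented batch."""
    return F.cross_entropy(model(x), y) + F.cross_entropy(model(augment(x)), y)

def dac_loss(model, x, y, augment, lam=1.0):
    """DAC: label loss on the original data plus a consistency penalty that
    forces predictions on x and augment(x) to agree (squared error is one
    common choice here; the paper's analysis fixes the exact form)."""
    logits = model(x)
    logits_aug = model(augment(x))
    consistency = F.mse_loss(logits_aug, logits.detach())
    return F.cross_entropy(logits, y) + lam * consistency
```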
Related papers
- Heterogeneous Learning Rate Scheduling for Neural Architecture Search on Long-Tailed Datasets [0.0]
We propose a novel adaptive learning rate scheduling strategy tailored for the architecture parameters of DARTS.
Our approach dynamically adjusts the learning rate of the architecture parameters based on the training epoch, preventing the disruption of well-trained representations.
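As a sketch of the mechanism (the warm-up length and shape below are hypothetical, not the paper's schedule): keep the architecture-parameter learning rate low in early epochs so the network weights can form stable representations, then ramp it up.

```python
def arch_lr(epoch, base_lr=3e-4, warmup_epochs=15):
    """Hypothetical epoch-dependent learning rate for DARTS architecture
    parameters: keep the architecture nearly frozen early (protecting the
    still-forming weight representations), then search at the full rate."""
    if epoch < warmup_epochs:
        return base_lr * epoch / warmup_epochs  # gentle linear warm-up
    return base_lr
```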
arXiv Detail & Related papers (2024-06-11T07:32:25Z)
- Improving SMOTE via Fusing Conditional VAE for Data-adaptive Noise Filtering [0.5735035463793009]
We introduce a framework to enhance the SMOTE algorithm using Variational Autoencoders (VAE)
Our approach systematically quantifies the density of data points in a low-dimensional latent space using the VAE, simultaneously incorporating information on class labels and classification difficulty.
Empirical studies on several imbalanced datasets show that this simple process improves the conventional SMOTE algorithm and outperforms deep learning models.
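A minimal sketch of the filtering idea, assuming a trained VAE encoder `encode` and a k-NN density proxy (both illustrative; the paper's density estimate and difficulty weighting differ in detail):

```python
import numpy as np

def latent_density_filter(encode, x_minority, keep_frac=0.8, k=5):
    """Hypothetical noise filter: embed minority samples with a trained VAE
    encoder, score each point by its k-NN density in latent space, and keep
    only the densest fraction before running SMOTE on the survivors."""
    z = encode(x_minority)                                 # (n, d) latent means
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)    # pairwise sq. distances
    knn = np.sort(d2, axis=1)[:, 1:k + 1]                  # skip self-distance in col 0
    density = -knn.mean(axis=1)                            # closer neighbors -> denser
    keep = np.argsort(density)[-int(keep_frac * len(z)):]
    return x_minority[keep]                                # pass these to SMOTE
```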
arXiv Detail & Related papers (2024-05-30T07:06:02Z)
- On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning [18.318758111829386]
We propose an efficient single-branch SSL method based on non-parametric instance discrimination.
We also propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square root version.
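The square-root target has a short closed form, so a sketch helps; the KL direction and the epsilon below are assumptions:

```python
import torch
import torch.nn.functional as F

def sqrt_self_distill_loss(logits, eps=1e-8):
    """Sketch of the self-distillation objective: KL divergence between the
    predicted distribution p and its renormalized square-root q (a flatter
    target). The direction KL(q || p) is an assumption for illustration."""
    p = F.softmax(logits, dim=-1)
    q = torch.sqrt(p)
    q = q / q.sum(dim=-1, keepdim=True)  # renormalize to a distribution
    return (q * (torch.log(q + eps) - torch.log(p + eps))).sum(dim=-1).mean()
```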
arXiv Detail & Related papers (2024-04-30T06:39:04Z)
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods with far fewer computational resources.
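A sketch of the distribution-matching objective (mean-embedding matching under an encoder; drawing fresh random encoders each step is one common choice, and `x_syn` holds the learnable synthetic images):

```python
import torch

def distribution_matching_loss(encoder, x_real, x_syn):
    """Core of distribution-matching condensation (sketch): push the mean
    feature embedding of the small synthetic set toward that of a real
    batch, so the condensed data reproduce the real feature statistics."""
    f_real = encoder(x_real).mean(dim=0).detach()  # real stats: no gradient
    f_syn = encoder(x_syn).mean(dim=0)             # synthetic images are learnable
    return ((f_real - f_syn) ** 2).sum()
```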
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
- Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning [57.83232242068982]
Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms.
It remains unclear which attributes of DA account for its effectiveness in achieving sample-efficient visual RL.
This work conducts comprehensive experiments to assess the impact of DA's attributes on its efficacy.
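For context, a representative augmentation in this line of work is random shift (pad-then-crop), sketched below; this is the standard recipe from prior visual-RL work, not something specific to this paper:

```python
import torch
import torch.nn.functional as F

def random_shift(obs, pad=4):
    """Random shift (pad-then-crop), a standard visual-RL augmentation.
    obs: (B, C, H, W) image batch; each image is shifted by up to `pad` px."""
    b, c, h, w = obs.shape
    padded = F.pad(obs, (pad, pad, pad, pad), mode="replicate")
    out = torch.empty_like(obs)
    for i in range(b):
        top = torch.randint(0, 2 * pad + 1, (1,)).item()
        left = torch.randint(0, 2 * pad + 1, (1,)).item()
        out[i] = padded[i, :, top:top + h, left:left + w]
    return out
```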
arXiv Detail & Related papers (2023-05-25T15:46:20Z)
- Infinite Recommendation Networks: A Data-Centric Approach [8.044430277912936]
We leverage the Neural Tangent Kernel to train infinitely-wide neural networks to devise $\infty$-AE: an autoencoder with infinitely-wide bottleneck layers.
We also develop Distill-CF for synthesizing tiny, high-fidelity data summaries.
We observe 96-105% of $\infty$-AE's performance on the full dataset with as little as 0.1% of the original dataset size.
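The infinite-width limit turns the fit into closed-form kernel regression; the sketch below substitutes an RBF kernel for the architecture-specific NTK (an assumption made purely for illustration):

```python
import numpy as np

def kernel_ridge_reconstruct(X, reg=1.0, gamma=0.1):
    """Sketch of the infinite-width idea: with infinitely wide layers the
    autoencoder's fit has a closed form -- kernel ridge regression mapping
    each user's interaction row back to itself. An RBF kernel stands in for
    the true NTK here. X: (users, items) binary interaction matrix."""
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)  # pairwise sq. distances
    K = np.exp(-gamma * sq)                              # stand-in for the NTK
    alpha = np.linalg.solve(K + reg * np.eye(len(X)), X)
    return K @ alpha                                     # reconstructed scores
```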
arXiv Detail & Related papers (2022-06-03T00:34:13Z)
- Contrastive Model Inversion for Data-Free Knowledge Distillation [60.08025054715192]
We propose Contrastive Model Inversion, where the data diversity is explicitly modeled as an optimizable objective.
Our main observation is that, under the constraint of the same amount of data, higher data diversity usually indicates stronger instance discrimination.
Experiments on CIFAR-10, CIFAR-100, and Tiny-ImageNet demonstrate that CMI achieves significantly superior performance when the generated data are used for knowledge distillation.
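A sketch of "diversity as instance discrimination": an InfoNCE loss over two augmented views of the generated batch (the two-view setup and the temperature are illustrative assumptions, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def diversity_loss(z1, z2, temperature=0.1):
    """Instance discrimination over generated data: z1 and z2 are embeddings
    of two augmented views of the same generated batch. Each sample must be
    distinguishable from every other sample, so minimizing this loss pushes
    the generated data apart, i.e., makes it more diverse."""
    z1 = F.normalize(z1, dim=-1)
    z2 = F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature  # cosine similarities
    labels = torch.arange(len(z1))      # sample i matches its own other view
    return F.cross_entropy(logits, labels)
```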
arXiv Detail & Related papers (2021-05-18T15:13:00Z)
- Online Robust and Adaptive Learning from Data Streams [22.319483572757097]
In online learning, it is necessary to learn robustly to outliers and to adapt quickly to changes in the underlying data generating mechanism.
In this paper, we refer to the former attribute of online learning algorithms as robustness and to the latter as adaptivity.
We propose a novel approximation-based robustness-adaptivity algorithm (SRA) to evaluate the tradeoff.
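An illustrative one-step update (not the paper's SRA algorithm) showing how the two attributes can coexist: clipping bounds an outlier's influence (robustness), while a forgetting factor discounts stale information (adaptivity):

```python
import numpy as np

def online_robust_adaptive_step(w, x, y, lr=0.05, clip=1.0, forget=0.99):
    """Illustrative online update: the residual is clipped so a single
    outlier cannot move w far (robustness), and a forgetting factor shrinks
    the old estimate so the model can track changes in the underlying
    data-generating mechanism (adaptivity)."""
    resid = np.clip(y - w @ x, -clip, clip)  # robust: bounded influence
    return forget * w + lr * resid * x       # adaptive: downweight the past
```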
arXiv Detail & Related papers (2020-07-23T17:49:04Z)
- Least Squares Regression with Markovian Data: Fundamental Limits and Algorithms [69.45237691598774]
We study the problem of least squares linear regression where the data-points are dependent and are sampled from a Markov chain.
We establish sharp information-theoretic minimax lower bounds for this problem in terms of the mixing time $\tau_{\mathsf{mix}}$.
We propose an algorithm based on experience replay--a popular reinforcement learning technique--that achieves a significantly better error rate.
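A sketch of the replay idea for this setting (the buffer size, step counts, and uniform sampling rule are assumptions): storing incoming points in a buffer and updating on random draws decorrelates consecutive gradient steps, which is exactly what Markovian sampling breaks.

```python
import random
import numpy as np

def sgd_with_replay(stream, dim, buffer_size=1000, lr=0.01, steps_per_sample=4):
    """Sketch of SGD with experience replay for least squares on Markovian
    data: buffer the (x, y) pairs as they arrive, then take gradient steps
    on uniformly random draws rather than on the correlated stream order."""
    w = np.zeros(dim)
    buffer = []
    for x, y in stream:                      # (x, y) generated by a Markov chain
        buffer.append((x, y))
        if len(buffer) > buffer_size:
            buffer.pop(0)
        for _ in range(steps_per_sample):
            xb, yb = random.choice(buffer)   # break temporal correlation
            w -= lr * (w @ xb - yb) * xb     # least-squares gradient step
    return w
```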
arXiv Detail & Related papers (2020-06-16T04:26:50Z)
- Communication-Efficient Distributed Stochastic AUC Maximization with Deep Neural Networks [50.42141893913188]
We study distributed stochastic AUC maximization for deep neural networks at large scale.
Our algorithm requires a much smaller number of communication rounds in theory.
Experiments on several datasets demonstrate its effectiveness and confirm the theory.
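The communication saving in this family of methods comes from running many local stochastic steps between parameter averagings; a generic local-SGD skeleton of that pattern is sketched below (an illustration, not the paper's min-max AUC algorithm):

```python
import numpy as np

def local_sgd(grad_fns, w0, rounds=10, local_steps=32, lr=0.01):
    """Skeleton of the communication-saving pattern: each worker runs many
    local stochastic steps, and parameters are averaged only once per
    communication round, so rounds scale with `rounds`, not total steps."""
    workers = [w0.copy() for _ in grad_fns]
    for _ in range(rounds):
        for w, grad in zip(workers, grad_fns):  # local phase: no communication
            for _ in range(local_steps):
                w -= lr * grad(w)
        avg = np.mean(workers, axis=0)          # one communication round
        workers = [avg.copy() for _ in grad_fns]
    return workers[0]
```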
arXiv Detail & Related papers (2020-05-05T18:08:23Z)