Affinity and Diversity: Quantifying Mechanisms of Data Augmentation
- URL: http://arxiv.org/abs/2002.08973v2
- Date: Thu, 4 Jun 2020 19:04:48 GMT
- Title: Affinity and Diversity: Quantifying Mechanisms of Data Augmentation
- Authors: Raphael Gontijo-Lopes, Sylvia J. Smullin, Ekin D. Cubuk, Ethan Dyer
- Abstract summary: We introduce two interpretable, easy-to-compute measures: Affinity and Diversity.
We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.
- Score: 25.384464387734802
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Though data augmentation has become a standard component of deep neural
network training, the underlying mechanism behind the effectiveness of these
techniques remains poorly understood. In practice, augmentation policies are
often chosen using heuristics of either distribution shift or augmentation
diversity. Inspired by these, we seek to quantify how data augmentation
improves model generalization. To this end, we introduce interpretable and
easy-to-compute measures: Affinity and Diversity. We find that augmentation
performance is predicted not by either of these alone but by jointly optimizing
the two.
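As a rough illustration of how such measures can be computed, the sketch below treats Affinity as the change in a clean-trained model's accuracy when the validation set is augmented, and Diversity as the final training loss of a model trained with the augmentation switched on. The helper callables (`evaluate_accuracy`, `train_with_augmentation`) are placeholders rather than the authors' code, and the paper's exact definitions may differ in detail.

```python
from typing import Callable, Sequence, Tuple

# A "dataset" here is just a sequence of (input, label) pairs; the model object
# can be anything the supplied callables know how to handle.
Dataset = Sequence[Tuple[object, int]]

def affinity(model_clean: object,
             evaluate_accuracy: Callable[[object, Dataset], float],
             val_clean: Dataset,
             val_augmented: Dataset) -> float:
    """Affinity (sketch): how far an augmentation shifts data away from the
    clean distribution, as seen by a model trained only on clean data.
    Computed here as accuracy on augmented validation data minus accuracy
    on clean validation data (more negative = larger shift)."""
    return (evaluate_accuracy(model_clean, val_augmented)
            - evaluate_accuracy(model_clean, val_clean))

def diversity(train_with_augmentation: Callable[[], float]) -> float:
    """Diversity (sketch): complexity of the augmented data. One cheap proxy
    is the final training loss of a model trained with the augmentation
    switched on (harder-to-fit data tends to give higher loss)."""
    return train_with_augmentation()
```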
Related papers
- AdaAugment: A Tuning-Free and Adaptive Approach to Enhance Data Augmentation [12.697608744311122]
AdaAugment is a tuning-free Adaptive Augmentation method.
It dynamically adjusts augmentation magnitudes for individual training samples based on real-time feedback from the target network.
It consistently outperforms other state-of-the-art DA methods in effectiveness while maintaining remarkable efficiency.
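The summary above does not spell out the feedback rule, so the sketch below only illustrates the general idea of per-sample, feedback-driven magnitudes by mapping each sample's current loss to an augmentation strength. The mapping and the name `feedback_driven_magnitudes` are assumptions for illustration, not AdaAugment's actual algorithm.

```python
import numpy as np

def feedback_driven_magnitudes(per_sample_loss: np.ndarray,
                               max_magnitude: float = 0.9) -> np.ndarray:
    """Sketch of feedback-driven augmentation strength: samples the network
    already fits well (low loss) receive stronger augmentation, hard samples
    receive weaker augmentation. The specific mapping is an assumption for
    illustration only."""
    # Normalise losses to [0, 1] within the batch.
    lo, hi = per_sample_loss.min(), per_sample_loss.max()
    difficulty = (per_sample_loss - lo) / (hi - lo + 1e-8)
    return max_magnitude * (1.0 - difficulty)

# Example: the hardest sample in the batch gets the mildest augmentation.
print(feedback_driven_magnitudes(np.array([0.1, 0.5, 2.3])))
```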
arXiv Detail & Related papers (2024-05-19T06:54:03Z)
- Boosting Model Resilience via Implicit Adversarial Data Augmentation [20.768174896574916]
We propose to augment the deep features of samples by incorporating adversarial and anti-adversarial perturbation distributions.
We then theoretically reveal that our augmentation process approximates the optimization of a surrogate loss function.
We conduct extensive experiments across four common biased learning scenarios.
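A minimal sketch of feature-level adversarial and anti-adversarial perturbation, assuming a single signed-gradient step in feature space stands in for sampling from learned perturbation distributions; `perturb_features` and its `epsilon` parameter are illustrative names, not the paper's implementation.

```python
import torch

def perturb_features(features: torch.Tensor,
                     labels: torch.Tensor,
                     loss_fn,
                     epsilon: float = 0.1):
    """Sketch of adversarial / anti-adversarial feature augmentation: move
    deep features along (+) and against (-) the gradient of the loss. A
    single signed-gradient step is used purely for illustration."""
    features = features.detach().requires_grad_(True)
    loss = loss_fn(features, labels)
    grad, = torch.autograd.grad(loss, features)
    adversarial = features + epsilon * grad.sign()       # harder view of the sample
    anti_adversarial = features - epsilon * grad.sign()  # easier view of the sample
    return adversarial.detach(), anti_adversarial.detach()
```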
arXiv Detail & Related papers (2024-04-25T03:22:48Z)
- DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection [77.6648187359111]
We propose a novel data augmentation method, named DualAug, that keeps the augmented data in distribution as much as possible at a reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
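A minimal sketch of the heavy-augmentation-with-rejection idea, assuming a per-sample loss threshold under the current model serves as the out-of-distribution check; `dual_augment` and the threshold value are illustrative stand-ins, not DualAug's actual rejection criterion.

```python
import torch

def dual_augment(x_basic: torch.Tensor,
                 x_heavy: torch.Tensor,
                 model: torch.nn.Module,
                 labels: torch.Tensor,
                 threshold: float = 2.0) -> torch.Tensor:
    """Sketch: use the heavily augmented view only when the model still finds
    it plausible (low per-sample loss); otherwise fall back to the basic
    augmentation. Assumes NCHW image batches."""
    with torch.no_grad():
        loss_heavy = torch.nn.functional.cross_entropy(
            model(x_heavy), labels, reduction="none")
    keep_heavy = (loss_heavy < threshold).view(-1, 1, 1, 1).float()
    return keep_heavy * x_heavy + (1.0 - keep_heavy) * x_basic
```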
arXiv Detail & Related papers (2023-10-12T08:55:10Z)
- Implicit Counterfactual Data Augmentation for Robust Learning [24.795542869249154]
This study proposes an Implicit Counterfactual Data Augmentation method to remove spurious correlations and make stable predictions.
Experiments have been conducted across various biased learning scenarios covering both image and text datasets.
arXiv Detail & Related papers (2023-04-26T10:36:40Z)
- Local Magnification for Data and Feature Augmentation [53.04028225837681]
We propose an easy-to-implement and model-free data augmentation method called Local Magnification (LOMA).
LOMA generates additional training data by randomly magnifying a local area of the image.
Experiments show that our proposed LOMA, though straightforward, can be combined with standard data augmentation to significantly improve the performance on image classification and object detection.
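A minimal sketch of magnifying a random local area, assuming a simple radial coordinate warp around a randomly chosen centre; `local_magnify` and its `strength` parameter are illustrative and not necessarily the exact warp used by LOMA.

```python
import numpy as np

def local_magnify(image: np.ndarray, rng: np.random.Generator,
                  strength: float = 0.5) -> np.ndarray:
    """Sketch of local magnification: pixels near a randomly chosen centre are
    sampled from source coordinates pulled towards that centre, which enlarges
    the local content while leaving the rest of the image untouched."""
    h, w = image.shape[:2]
    cy, cx = rng.integers(0, h), rng.integers(0, w)
    radius = min(h, w) // 4
    yy, xx = np.mgrid[0:h, 0:w]
    dy, dx = yy - cy, xx - cx
    dist = np.sqrt(dy ** 2 + dx ** 2)
    # Inside the radius, shrink the sampling offset so the content appears magnified.
    scale = np.where(dist < radius, 1.0 - strength * (1.0 - dist / radius), 1.0)
    src_y = np.clip((cy + dy * scale).round().astype(int), 0, h - 1)
    src_x = np.clip((cx + dx * scale).round().astype(int), 0, w - 1)
    return image[src_y, src_x]

# Example: local_magnify(np.zeros((32, 32, 3)), np.random.default_rng(0))
```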
arXiv Detail & Related papers (2022-11-15T02:51:59Z)
- Data Augmentation vs. Equivariant Networks: A Theory of Generalization on Dynamics Forecasting [24.363954435050264]
Exploiting symmetry in dynamical systems is a powerful way to improve the generalization of deep learning.
Data augmentation and equivariant networks are two major approaches to injecting symmetry into learning.
We derive the generalization bounds for data augmentation and equivariant networks, characterizing their effect on learning in a unified framework.
arXiv Detail & Related papers (2022-06-19T17:00:12Z)
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework built around a simple yet effective technique, FeatDistLoss.
Experimental results show that our model sets a new state of the art across various datasets and settings.
arXiv Detail & Related papers (2021-12-10T20:46:13Z)
- DivAug: Plug-in Automated Data Augmentation with Explicit Diversity Maximization [41.82120128496555]
Two factors regarding the diversity of augmented data are still missing: 1) the explicit definition (and thus measurement) of diversity and 2) the quantifiable relationship between diversity and its regularization effects.
We propose a diversity measure called Variance Diversity and theoretically show that the regularization effect of data augmentation is guaranteed by Variance Diversity.
An unsupervised sampling-based framework, DivAug, is designed to directly maximize Variance Diversity and hence strengthen the regularization effect.
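The summary does not spell out how Variance Diversity is computed; the sketch below assumes a variance-of-embeddings formulation over several augmented views of the same input, which is one plausible explicit diversity measure but not necessarily the paper's definition.

```python
import torch

def variance_diversity(embed, augmented_views: torch.Tensor) -> torch.Tensor:
    """Sketch of a variance-based diversity measure: embed several augmented
    views of the same input and take the variance of the embeddings across
    views, averaged over dimensions.

    augmented_views: tensor of shape (num_views, ...) for one input.
    """
    with torch.no_grad():
        z = embed(augmented_views)          # (num_views, embedding_dim)
    return z.var(dim=0, unbiased=False).mean()
```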
arXiv Detail & Related papers (2021-03-26T16:00:01Z)
- CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
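A minimal sketch of such a contrastive regularization term, assuming a generic InfoNCE-style objective over original and augmented embeddings; this is a standard formulation offered for illustration, not necessarily CoDA's exact objective.

```python
import torch
import torch.nn.functional as F

def contrastive_regularizer(z_orig: torch.Tensor,
                            z_aug: torch.Tensor,
                            temperature: float = 0.1) -> torch.Tensor:
    """Generic InfoNCE-style contrastive term: each original example should be
    closer to its own augmented version than to the augmentations of the other
    examples in the batch."""
    z_orig = F.normalize(z_orig, dim=-1)
    z_aug = F.normalize(z_aug, dim=-1)
    logits = z_orig @ z_aug.t() / temperature                    # (batch, batch) similarities
    targets = torch.arange(z_orig.size(0), device=z_orig.device)  # positives on the diagonal
    return F.cross_entropy(logits, targets)
```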
arXiv Detail & Related papers (2020-10-16T23:57:03Z)
- On the Benefits of Invariance in Neural Networks [56.362579457990094]
We show that training with data augmentation leads to better estimates of the risk and of its gradients, and we provide a PAC-Bayes generalization bound for models trained with data augmentation.
We also show that compared to data augmentation, feature averaging reduces generalization error when used with convex losses, and tightens PAC-Bayes bounds.
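A minimal sketch of feature averaging, assuming it means averaging a model's outputs over a fixed set of input transformations (the discrete analogue of averaging over a symmetry group); `feature_average` and the `transforms` argument are illustrative names, not the paper's code.

```python
import torch

def feature_average(model: torch.nn.Module,
                    x: torch.Tensor,
                    transforms) -> torch.Tensor:
    """Sketch of feature averaging: instead of augmenting at training time,
    average the model's outputs over a set of transformations of the input."""
    with torch.no_grad():
        outputs = [model(t(x)) for t in transforms]
    return torch.stack(outputs, dim=0).mean(dim=0)
```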
arXiv Detail & Related papers (2020-05-01T02:08:58Z)
- Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.