DivAug: Plug-in Automated Data Augmentation with Explicit Diversity
Maximization
- URL: http://arxiv.org/abs/2103.14545v1
- Date: Fri, 26 Mar 2021 16:00:01 GMT
- Title: DivAug: Plug-in Automated Data Augmentation with Explicit Diversity
Maximization
- Authors: Zirui Liu, Haifeng Jin, Ting-Hsiang Wang, Kaixiong Zhou, Xia Hu
- Abstract summary: Two questions about the diversity of augmented data remain open: 1) an explicit definition (and thus a measurement) of diversity and 2) the quantifiable relationship between diversity and its regularization effect.
We propose a diversity measure called Variance Diversity and theoretically show that Variance Diversity guarantees the regularization effect of data augmentation.
An unsupervised sampling-based framework, DivAug, is designed to directly maximize Variance Diversity and hence strengthen the regularization effect.
- Score: 41.82120128496555
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Human-designed data augmentation strategies have been replaced by
automatically learned augmentation policies in the past two years. Specifically,
recent work has empirically shown that the superior performance of automated
data augmentation methods stems from increasing the diversity of augmented
data. However, two questions about the diversity of augmented data remain
open: 1) an explicit definition (and thus a measurement) of diversity and
2) the quantifiable relationship between diversity and its regularization
effect. To bridge this gap, we propose a diversity measure called Variance
Diversity and theoretically show that Variance Diversity guarantees the
regularization effect of data augmentation. We validate in experiments that
the relative gain in test accuracy from automated data augmentation is highly
correlated with Variance Diversity. An unsupervised sampling-based framework,
DivAug, is designed to directly maximize Variance Diversity and hence
strengthen the regularization effect. Without requiring a separate search
process, the performance gain from DivAug is comparable to that of the
state-of-the-art method, with better efficiency. Moreover, under the
semi-supervised setting, our framework further improves the performance of
semi-supervised learning algorithms compared to RandAugment, making it
highly applicable to real-world problems, where labeled data is scarce.
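
The abstract does not spell out how Variance Diversity is computed or how DivAug samples augmentations, so the following is only a rough sketch of the idea under stated assumptions: score a randomly sampled composition of augmentation operations by the variance of a network's feature embeddings across stochastic augmented views of the same input, and keep the highest-scoring composition. All names here (variance_diversity, select_diverse_policy, the toy embedder, the toy ops) are illustrative assumptions, not the authors' implementation.

import random
import torch
import torch.nn as nn

def variance_diversity(embed, x, policy, n_views=4):
    # One plausible variance-based diversity score (an assumption, not the
    # paper's exact definition): variance of feature embeddings across
    # stochastic augmented views of x, summed over feature dimensions.
    def apply(t):
        for op in policy:
            t = op(t)
        return t
    with torch.no_grad():
        feats = torch.stack([embed(apply(x)) for _ in range(n_views)])
    return feats.var(dim=0, unbiased=False).sum().item()

def select_diverse_policy(embed, x, ops, n_trials=8, depth=2):
    # Sampling-based selection in the spirit of a search-free framework:
    # draw random compositions of `depth` operations and keep the one whose
    # augmented views are most spread out in feature space.
    best_policy, best_score = None, float("-inf")
    for _ in range(n_trials):
        policy = random.sample(ops, depth)
        score = variance_diversity(embed, x, policy)
        if score > best_score:
            best_policy, best_score = policy, score
    return best_policy, best_score

if __name__ == "__main__":
    embed = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))  # toy embedder
    x = torch.randn(1, 3, 32, 32)  # stand-in for a CIFAR-style image
    ops = [  # toy stochastic augmentations
        lambda t: t + 0.1 * torch.randn_like(t),          # additive noise
        lambda t: t * torch.empty(1).uniform_(0.8, 1.2),  # brightness-like scaling
        lambda t: torch.flip(t, dims=[-1]),               # horizontal flip
    ]
    policy, score = select_diverse_policy(embed, x, ops)
    print(f"best variance-diversity score: {score:.4f}")

In the actual method such a score would decide which augmented samples feed training; here it only ranks candidate compositions to make the "maximize diversity, no separate search" idea concrete.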
Related papers
- Only-IF: Revealing the Decisive Effect of Instruction Diversity on Generalization [1.6958018695660049]
We show that generalization only emerges when the training data is diversified enough across semantic domains.
We extend our analysis to real-world scenarios, including fine-tuning of specialist and generalist models.
arXiv Detail & Related papers (2024-10-07T03:15:11Z) - DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection [77.6648187359111]
We propose a novel data augmentation method, named DualAug, that keeps the augmented data in distribution as much as possible at reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
arXiv Detail & Related papers (2023-10-12T08:55:10Z) - Source-free Domain Adaptation Requires Penalized Diversity [60.04618512479438]
Source-free domain adaptation (SFDA) was introduced to address knowledge transfer between different domains in the absence of source data.
In unsupervised SFDA, the diversity is limited to learning a single hypothesis on the source or learning multiple hypotheses with a shared feature extractor.
We propose a novel unsupervised SFDA algorithm that promotes representational diversity through the use of separate feature extractors.
arXiv Detail & Related papers (2023-04-06T00:20:19Z) - RangeAugment: Efficient Online Augmentation with Range Learning [54.61514286212455]
RangeAugment efficiently learns the range of magnitudes for individual as well as composite augmentation operations.
We show that RangeAugment achieves performance competitive with state-of-the-art automatic augmentation methods while using 4-5 times fewer augmentation operations.
arXiv Detail & Related papers (2022-12-20T18:55:54Z) - Automatic Data Augmentation Selection and Parametrization in Contrastive
Self-Supervised Speech Representation Learning [21.423349835589793]
This work introduces a conditional independence-based method that automatically selects a suitable distribution over the choice of augmentations and their parametrization from a set of predefined ones.
Experiments on two different downstream tasks validate the proposed approach, showing better results than training without augmentation or with baseline augmentations.
arXiv Detail & Related papers (2022-04-08T16:30:50Z) - Learning Representational Invariances for Data-Efficient Action
Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z) - CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for
Natural Language Understanding [67.61357003974153]
We propose a novel data augmentation framework dubbed CoDA.
CoDA synthesizes diverse and informative augmented examples by integrating multiple transformations organically.
A contrastive regularization objective is introduced to capture the global relationship among all the data samples.
arXiv Detail & Related papers (2020-10-16T23:57:03Z) - Affinity and Diversity: Quantifying Mechanisms of Data Augmentation [25.384464387734802]
We introduce two measures: Affinity and Diversity.
We find that augmentation performance is predicted not by either of these alone but by jointly optimizing the two.
arXiv Detail & Related papers (2020-02-20T19:02:02Z)
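
Since this entry only names the two measures, here is a hedged numeric sketch of the joint two-axis view it describes; the ratio forms below are assumptions for illustration, and the paper's exact definitions of Affinity and Diversity may differ.

# Hedged sketch: two complementary augmentation scores that, per the
# paper's finding, predict performance only when considered jointly.
# The ratio forms are assumptions, not the paper's exact formulas.

def affinity(acc_on_augmented_val: float, acc_on_clean_val: float) -> float:
    # How "in-distribution" the augmentation is for a clean-trained model.
    return acc_on_augmented_val / acc_on_clean_val

def diversity(train_loss_with_aug: float, train_loss_clean: float) -> float:
    # How much harder the augmented training data is to fit.
    return train_loss_with_aug / train_loss_clean

if __name__ == "__main__":
    # Toy numbers: a useful augmentation keeps affinity reasonably high
    # while raising diversity; neither score alone predicts the gain.
    a = affinity(acc_on_augmented_val=0.88, acc_on_clean_val=0.92)
    d = diversity(train_loss_with_aug=0.35, train_loss_clean=0.20)
    print(f"affinity={a:.2f}, diversity={d:.2f}")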