DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection
- URL: http://arxiv.org/abs/2310.08139v2
- Date: Mon, 16 Oct 2023 03:02:49 GMT
- Title: DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection
- Authors: Zehao Wang, Yiwen Guo, Qizhang Li, Guanglei Yang, Wangmeng Zuo
- Abstract summary: We propose a novel data augmentation method, named DualAug, to keep the augmentation in distribution as much as possible at a reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
- Score: 77.6648187359111
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation is a dominant method for reducing model overfitting and
improving generalization. Most existing data augmentation methods tend to find
a compromise in augmenting the data, i.e., increasing the amplitude of
augmentation carefully to avoid degrading some samples too much and harming
model performance. We delve into the relationship between data augmentation
and model performance, revealing that the performance drop with heavy
augmentation comes from the presence of out-of-distribution (OOD) data.
Nonetheless, as the same transformation affects different training samples
differently, even under heavy augmentation part of the data remains in
distribution and benefits model training. Based on this
observation, we propose a novel data augmentation method, named
DualAug, to keep the augmentation in distribution as much as possible
at a reasonable time and computational cost. We design a data mixing strategy
to fuse augmented data from both the basic- and the heavy-augmentation
branches. Extensive experiments on supervised image classification benchmarks
show that DualAug improves various automated data augmentation methods. Moreover,
the experiments on semi-supervised learning and contrastive self-supervised
learning demonstrate that our DualAug can also improve related methods. Code is
available at https://github.com/shuguang99/DualAug.
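The abstract describes the mechanism only at a high level: run a basic and a heavy augmentation branch in parallel, score the heavy outputs for distribution shift, and fall back to the basic branch where a sample looks out of distribution. The following minimal PyTorch sketch illustrates that idea; the branch compositions, the loss-based OOD score, and the batch-quantile threshold are illustrative assumptions rather than the authors' exact recipe (see the linked repository for that).

import torch
import torch.nn.functional as F
import torchvision.transforms as T

# Basic branch: mild, conventional augmentation.
basic_branch = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.ToTensor(),
])

# Heavy branch: the same pipeline plus a strong automated policy
# (TrivialAugmentWide here is a stand-in for any heavy policy).
heavy_branch = T.Compose([
    T.RandomCrop(32, padding=4),
    T.RandomHorizontalFlip(),
    T.TrivialAugmentWide(),
    T.ToTensor(),
])

@torch.no_grad()
def ood_scores(model, x, y):
    # Score each heavy-augmented sample by its training loss; an
    # unusually high loss is treated as a sign the augmentation pushed
    # the sample out of distribution (an assumed, simple OOD proxy).
    return F.cross_entropy(model(x), y, reduction="none")

def dual_augment_batch(model, pil_images, labels, quantile=0.9):
    # Fuse the two branches: keep a heavy-augmented sample only if its
    # OOD score falls below a batch quantile, otherwise fall back to
    # the basic-augmented version of the same image.
    x_basic = torch.stack([basic_branch(img) for img in pil_images])
    x_heavy = torch.stack([heavy_branch(img) for img in pil_images])
    y = torch.as_tensor(labels)
    scores = ood_scores(model, x_heavy, y)
    keep_heavy = (scores <= torch.quantile(scores, quantile)).view(-1, 1, 1, 1)
    return torch.where(keep_heavy, x_heavy, x_basic), y

The rejection rule in this sketch is deliberately cheap: it reuses the model being trained as its own OOD detector, so the overhead versus ordinary augmentation is one extra forward pass per batch, consistent with the abstract's "reasonable time and computational cost" claim.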
Related papers
- DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception [78.26734070960886]
Current perceptive models heavily depend on resource-intensive datasets.
We introduce perception-aware loss (P.A. loss) through segmentation, improving both quality and controllability.
Our method customizes data augmentation by extracting and utilizing perception-aware attribute (P.A. Attr) during generation.
arXiv Detail & Related papers (2024-03-20T04:58:03Z) - Do Generated Data Always Help Contrastive Learning? [32.58214897368031]
Contrastive Learning (CL) has emerged as one of the most successful paradigms for unsupervised visual representation learning.
With the rise of generative models, especially diffusion models, the ability to generate realistic images close to the real data distribution has been well recognized.
However, we find that the generated data (even from a good diffusion model like DDPM) may sometimes even harm contrastive learning.
arXiv Detail & Related papers (2024-03-19T05:17:47Z) - Fourier-basis Functions to Bridge Augmentation Gap: Rethinking Frequency Augmentation in Image Classification [3.129187821625805]
Auxiliary Fourier-basis Augmentation (AFA) is a technique targeting augmentation in the frequency domain and filling the augmentation gap left by visual augmentations.
Our results show that AFA improves model robustness to common corruptions and OOD generalization, and keeps performance consistent under increasing perturbations, with negligible cost to standard performance (a rough sketch of this kind of frequency-domain augmentation appears after this list).
arXiv Detail & Related papers (2024-03-04T11:30:02Z) - DiffAug: Enhance Unsupervised Contrastive Learning with Domain-Knowledge-Free Diffusion-based Data Augmentation [48.25619775814776]
This paper proposes DiffAug, a novel unsupervised contrastive learning technique with diffusion model-based positive data generation.
DiffAug consists of a semantic encoder and a conditional diffusion model; the conditional diffusion model generates new positive samples conditioned on the semantic encoding.
Experimental evaluations show that DiffAug outperforms hand-designed and SOTA model-based augmentation methods on DNA sequence, visual, and bio-feature datasets.
arXiv Detail & Related papers (2023-09-10T13:28:46Z) - Tied-Augment: Controlling Representation Similarity Improves Data Augmentation [18.446051824487792]
We propose a framework called Tied-Augment to improve data augmentation in a wide range of applications.
Tied-Augment can improve state-of-the-art methods from data augmentation (e.g. RandAugment, mixup), optimization (e.g. SAM), and semi-supervised learning (e.g. FixMatch); a rough sketch of such a similarity-tying loss appears after this list.
arXiv Detail & Related papers (2023-05-22T22:23:40Z) - Instance-Conditioned GAN Data Augmentation for Representation Learning [29.36473147430433]
We introduce DA_IC-GAN, a learnable data augmentation module that can be used off-the-shelf in conjunction with most state-of-the-art training recipes.
We show that DA_IC-GAN can boost accuracy by 1 to 2 percentage points with the highest-capacity models.
We additionally couple DA_IC-GAN with a self-supervised training recipe and show an improvement of 1 percentage point in accuracy in some settings.
arXiv Detail & Related papers (2023-03-16T22:45:43Z) - How Much Data Are Augmentations Worth? An Investigation into Scaling Laws, Invariance, and Implicit Regularization [76.58017437197859]
We find that in out-of-distribution testing scenarios, augmentations which yield samples that are diverse, but inconsistent with the data distribution can be even more valuable than additional training data.
We show that augmentations induce additional stochasticity during training, effectively flattening the loss landscape.
arXiv Detail & Related papers (2022-10-12T17:42:01Z) - Learning Representational Invariances for Data-Efficient Action Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z) - WeMix: How to Better Utilize Data Augmentation [36.07712244423405]
We develop a comprehensive analysis that reveals the pros and cons of data augmentation.
The main limitation of data augmentation arises from data bias: the augmented data distribution can differ from the original one.
We develop two novel algorithms, termed "AugDrop" and "MixLoss", to correct this data bias.
arXiv Detail & Related papers (2020-10-03T03:12:18Z) - Generative Data Augmentation for Commonsense Reasoning [75.26876609249197]
G-DAUGC is a novel generative data augmentation method that aims to achieve more accurate and robust learning in the low-resource setting.
G-DAUGC consistently outperforms existing data augmentation methods based on back-translation.
Our analysis demonstrates that G-DAUGC produces a diverse set of fluent training examples, and that its selection and training approaches are important for performance.
arXiv Detail & Related papers (2020-04-24T06:12:10Z)
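For the Fourier-basis augmentation entry above, the summary names the idea but not the mechanics. A minimal sketch of one frequency-domain augmentation in that spirit follows: it adds a randomly chosen planar Fourier basis function (a 2D cosine wave) to the image. The frequency range, the amplitude, and the absence of a random phase term are all simplifying assumptions, not the AFA recipe.

import math
import torch

def fourier_basis_augment(x, strength=0.2, max_freq=8):
    # x: a batch of images with shape (B, C, H, W), values in [0, 1].
    B, C, H, W = x.shape
    # Pick one random spatial frequency per call (assumed range).
    fy, fx = torch.randint(1, max_freq, (2,)).tolist()
    yy, xx = torch.meshgrid(
        torch.arange(H, dtype=torch.float32),
        torch.arange(W, dtype=torch.float32),
        indexing="ij",
    )
    # Planar cosine wave over the image grid: one Fourier basis function.
    wave = torch.cos(2 * math.pi * (fy * yy / H + fx * xx / W))
    return (x + strength * wave).clamp(0.0, 1.0)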
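Similarly, the Tied-Augment entry only states that controlling representation similarity helps. A short sketch of such a similarity-tying objective is below: two augmented views pass through the same network, and a term aligning their features is added to the usual classification loss. The cosine-similarity form of the tying term, the tie_weight value, and the assumption that the model returns both features and logits are illustrative choices, not the paper's exact objective.

import torch.nn.functional as F

def tied_augment_loss(model, x_view1, x_view2, y, tie_weight=0.4):
    # The model is assumed to return (features, logits) for a batch.
    feat1, logits1 = model(x_view1)
    feat2, logits2 = model(x_view2)
    # Supervised loss on both augmented views.
    ce = F.cross_entropy(logits1, y) + F.cross_entropy(logits2, y)
    # Tying term: push the two views' representations together.
    tie = 1.0 - F.cosine_similarity(feat1, feat2, dim=1).mean()
    return ce + tie_weight * tie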