A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
- URL: http://arxiv.org/abs/2212.10888v2
- Date: Tue, 4 Jun 2024 02:11:25 GMT
- Title: A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability
- Authors: Chengtai Cao, Fan Zhou, Yurou Dai, Jianping Wang, Kunpeng Zhang
- Abstract summary: Data augmentation (DA) is indispensable in modern machine learning and deep neural networks.
This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA).
In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out due to its effectiveness, simplicity, flexibility, computational efficiency, theoretical foundation, and broad applicability.
- Score: 29.40977854491399
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data to improve the model's generalization by adding slightly disturbed versions of existing data or synthesizing new data. This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA), which generates novel samples by combining multiple examples. In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out due to its effectiveness, simplicity, flexibility, computational efficiency, theoretical foundation, and broad applicability. We begin by introducing a novel taxonomy that categorizes MixDA into Mixup-based, Cutmix-based, and mixture approaches based on a hierarchical perspective of the data mixing operation. Subsequently, we provide an in-depth review of various MixDA techniques, focusing on their underlying motivations. Owing to its versatility, MixDA has penetrated a wide range of applications, which we also thoroughly investigate in this survey. Moreover, we delve into the underlying mechanisms of MixDA's effectiveness by examining its impact on model generalization and calibration while providing insights into the model's behavior by analyzing the inherent properties of MixDA. Finally, we recapitulate the critical findings and fundamental challenges of current MixDA studies while outlining the potential directions for future works. Different from previous related surveys that focus on DA approaches in specific domains (e.g., CV and NLP) or only review a limited subset of MixDA studies, we are the first to provide a systematical survey of MixDA, covering its taxonomy, methodology, application, and explainability. Furthermore, we provide promising directions for researchers interested in this exciting area.
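For orientation, the two operations that anchor the Mixup-based and Cutmix-based branches of the taxonomy can be written in a few lines. The NumPy sketch below is illustrative rather than taken from any surveyed implementation; the array layout (H×W×C images), one-hot labels, and the Beta parameter alpha are assumptions.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Mixup: convexly interpolate two inputs and their one-hot labels."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

def cutmix(x1, y1, x2, y2, alpha=1.0, rng=None):
    """CutMix: paste a random rectangle of x2 into x1; mix labels by area."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = x1.shape[:2]
    lam = rng.beta(alpha, alpha)
    cut_h, cut_w = int(h * np.sqrt(1 - lam)), int(w * np.sqrt(1 - lam))
    cy, cx = int(rng.integers(h)), int(rng.integers(w))
    top, bottom = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    left, right = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x1.copy()
    x[top:bottom, left:right] = x2[top:bottom, left:right]
    lam_kept = 1 - (bottom - top) * (right - left) / (h * w)  # area of x1 kept
    return x, lam_kept * y1 + (1 - lam_kept) * y2
```

With one-hot labels, a training loop simply applies one of these to pairs of batch examples before the forward pass; mixture approaches in the taxonomy combine or extend these two primitives.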
Related papers
- A Survey on Mixup Augmentations and Beyond [59.578288906956736]
Mixup and related data-mixing methods, which convexly combine selected samples and their corresponding labels, are widely adopted.
This survey presents a comprehensive review of foundational mixup methods and their applications.
arXiv Detail & Related papers (2024-09-08T19:32:22Z)
- Embedding And Clustering Your Data Can Improve Contrastive Pretraining [0.0]
We explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm.
Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset.
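A minimal sketch of the stratification idea: cluster pair embeddings with k-means, then treat each cluster as a stratum for batch sampling. This assumes scikit-learn, and `embed` is a stand-in for the pretrained text embedding model (both assumptions, not details from the paper):

```python
import numpy as np
from sklearn.cluster import KMeans

def stratify_pairs(queries, passages, embed, k=8, seed=0):
    """Cluster query embeddings with k-means and bucket the query-passage
    pairs by cluster, so contrastive pretraining batches can be drawn
    from a single stratum rather than from the mixed pool."""
    emb = np.stack([embed(q) for q in queries])
    labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(emb)
    strata = {c: [] for c in range(k)}
    for q, p, c in zip(queries, passages, labels):
        strata[c].append((q, p))
    return strata
```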
arXiv Detail & Related papers (2024-07-26T17:36:40Z)
- SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation [20.311948275950606]
Open compound domain adaptation (OCDA) aims to transfer knowledge from a labeled source domain to a mix of unlabeled homogeneous compound target domains while generalizing to open unseen domains.
Existing OCDA methods solve the intra-domain gaps by a divide-and-conquer strategy, which divides the problem into several individual and parallel domain adaptation (DA) tasks.
We present Stochastic Compound Mixing (SCMix), an augmentation strategy whose primary objective is to mitigate the divergence between source and mixed target distributions.
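The paper's actual mixing rule operates on segmentation data and is more involved; purely as an illustration of the "compose several compound-target domains, then mix with the source" idea, one could write the following sketch, in which every detail (strip-wise composition, a uniform blend weight) is an assumption:

```python
import numpy as np

def compound_mix(src, targets, rng=None):
    """Illustrative compound mixing: build a composite target image from
    random horizontal strips of several target-domain images, then blend
    it with the source image using a random convex weight."""
    rng = np.random.default_rng() if rng is None else rng
    h, w, c = src.shape
    composite = np.empty_like(src)
    cuts = np.sort(rng.choice(np.arange(1, h), size=len(targets) - 1,
                              replace=False))
    for tgt, rows in zip(targets, np.split(np.arange(h), cuts)):
        composite[rows] = tgt[rows]
    lam = rng.beta(1.0, 1.0)
    return lam * src + (1.0 - lam) * composite
```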
arXiv Detail & Related papers (2024-05-23T07:53:10Z)
- Tackling Computational Heterogeneity in FL: A Few Theoretical Insights [68.8204255655161]
We introduce and analyse a novel aggregation framework that allows for formalizing and tackling computational heterogeneous data.
The proposed aggregation algorithms are extensively analyzed from both theoretical and experimental perspectives.
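The framework itself is the paper's contribution, but the baseline it generalizes is easy to state: weight and average client updates. A minimal FedAvg-style sketch, with the per-client weights (here, local sample counts) as a placeholder assumption:

```python
import numpy as np

def aggregate(client_params, client_weights):
    """Weighted parameter averaging across clients (FedAvg-style).
    client_params: one list of np.ndarray layers per client.
    client_weights: per-client scalars, e.g. local sample counts."""
    w = np.asarray(client_weights, dtype=float)
    w = w / w.sum()
    return [
        sum(wk * layer for wk, layer in zip(w, layers))
        for layers in zip(*client_params)  # iterate layer-wise
    ]
```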
arXiv Detail & Related papers (2023-07-12T16:28:21Z)
- DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning [14.194817677415065]
We show that augmented samples with lower correlation to the original data are more effective in preventing forgetting.
We propose the Enhanced Mixup (EnMix) method that mixes the augmented samples and their labels simultaneously.
To solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries.
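Read literally, the mixing step described here is a mixup-style convex combination applied to already-augmented samples. A minimal sketch, in which `augment`, the replay-memory sample, and the Beta parameter are assumptions:

```python
import numpy as np

def enmix(x, y, x_mem, y_mem, augment, alpha=1.0, rng=None):
    """Mix an augmented incoming sample with an augmented replay sample,
    interpolating inputs and one-hot labels with the same coefficient."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x_new = lam * augment(x) + (1.0 - lam) * augment(x_mem)
    y_new = lam * y + (1.0 - lam) * y_mem
    return x_new, y_new
```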
arXiv Detail & Related papers (2023-03-14T12:55:42Z)
- Dataset Distillation: A Comprehensive Review [76.26276286545284]
Dataset distillation (DD) aims to derive a much smaller dataset of synthetic samples such that models trained on it achieve performance comparable to models trained on the original dataset.
This paper gives a comprehensive review and summary of recent advances in DD and its application.
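As a concrete taste of one DD family such reviews cover (gradient matching), a single update of the synthetic set can be sketched in PyTorch as follows; the model, criterion, and optimizer wiring are assumptions for illustration:

```python
import torch

def distill_step(model, criterion, x_real, y_real, x_syn, y_syn, syn_opt):
    """One gradient-matching step: update the synthetic set so the
    gradient it induces matches the gradient induced by real data."""
    params = [p for p in model.parameters() if p.requires_grad]

    g_real = torch.autograd.grad(criterion(model(x_real), y_real), params)
    g_real = [g.detach() for g in g_real]

    # create_graph=True lets the matching loss backpropagate into x_syn.
    g_syn = torch.autograd.grad(
        criterion(model(x_syn), y_syn), params, create_graph=True
    )

    match = sum(((a - b) ** 2).sum() for a, b in zip(g_syn, g_real))
    syn_opt.zero_grad()
    match.backward()
    syn_opt.step()
    return match.item()
```

Here `x_syn` is a leaf tensor created with `requires_grad=True` and `syn_opt` optimizes `[x_syn]`, e.g. `torch.optim.SGD([x_syn], lr=0.1)`.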
arXiv Detail & Related papers (2023-01-17T17:03:28Z)
- MixupE: Understanding and Improving Mixup from Directional Derivative Perspective [86.06981860668424]
We propose an improved version of Mixup that is theoretically justified to deliver better generalization performance than vanilla Mixup.
Our results show that the proposed method improves Mixup across multiple datasets using a variety of architectures.
arXiv Detail & Related papers (2022-12-27T07:03:52Z)
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with a VampPrior and a PixelCNN decoder network.
We explain this cooperative behavior by drawing a novel connection between VI and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
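For orientation, the multiple-importance-sampling ELBO that gives MISELBO its name is commonly written in the form below, with a mixture of S variational approximations q_s; the notation here is assumed rather than quoted from the paper:

```latex
\mathcal{L}_{\mathrm{MIS}}(x)
  \;=\; \frac{1}{S} \sum_{s=1}^{S}
    \mathbb{E}_{z \sim q_s(z \mid x)}\!
    \left[ \log \frac{p_\theta(x, z)}{\tfrac{1}{S} \sum_{j=1}^{S} q_j(z \mid x)} \right]
```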
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- MixAugment & Mixup: Augmentation Methods for Facial Expression Recognition [4.273075747204267]
We propose MixAugment, a new data augmentation strategy based on Mixup.
We conduct an extensive experimental study that demonstrates the effectiveness of MixAugment over Mixup and various state-of-the-art methods.
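One plausible reading of a Mixup-based strategy for expression recognition is to keep supervision from the real faces alongside the mixed ones, since mixed faces can blur expression semantics. The sketch below is that hedged guess, not the paper's published algorithm; `model`, `loss_fn`, and the unit loss weights are assumptions:

```python
import numpy as np

def mixaugment_loss(model, loss_fn, x1, y1, x2, y2, alpha=1.0, rng=None):
    """Combine the loss on a mixup virtual sample with the losses on the
    two real samples it was built from."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x_virtual = lam * x1 + (1.0 - lam) * x2
    pred_v = model(x_virtual)
    virtual = lam * loss_fn(pred_v, y1) + (1.0 - lam) * loss_fn(pred_v, y2)
    real = loss_fn(model(x1), y1) + loss_fn(model(x2), y2)
    return virtual + real
```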
arXiv Detail & Related papers (2022-05-09T17:43:08Z)
- Repulsive Mixture Models of Exponential Family PCA for Clustering [127.90219303669006]
The mixture extension of exponential family principal component analysis (EPCA) was designed to encode much more structural information about the data distribution than traditional EPCA.
The traditional mixture of local EPCAs has the problem of model redundancy, i.e., overlaps among mixing components, which may cause ambiguity for data clustering.
In this paper, a repulsiveness-encouraging prior is introduced among mixing components and a diversified EPCA mixture (DEPCAM) model is developed in the Bayesian framework.
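For context, a repulsiveness-encouraging prior over component parameters typically multiplies a base prior by pairwise repulsion terms; one standard form (notation assumed, not taken from the paper):

```latex
p(\theta_1, \ldots, \theta_K) \;\propto\;
  \prod_{k=1}^{K} p_0(\theta_k)
  \prod_{i < j} h\big(d(\theta_i, \theta_j)\big)
```

where d is a distance between component parameters and h is increasing in d, so configurations with overlapping components receive low prior mass.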
arXiv Detail & Related papers (2020-04-07T04:07:29Z)