The Effects of Mixed Sample Data Augmentation are Class Dependent
- URL: http://arxiv.org/abs/2307.09136v2
- Date: Wed, 27 Mar 2024 07:16:28 GMT
- Title: The Effects of Mixed Sample Data Augmentation are Class Dependent
- Authors: Haeil Lee, Hansang Lee, Junmo Kim
- Abstract summary: Mixed Sample Data Augmentation (MSDA) techniques, such as Mixup, CutMix, and PuzzleMix, have been widely acknowledged for enhancing performance in a variety of tasks.
A previous study reported the class dependency of traditional data augmentation (DA), where certain classes benefit disproportionately compared to others.
This paper reveals a class-dependent effect of MSDA, where some classes experience improved performance while others experience degraded performance.
- Score: 24.064325847615546
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mixed Sample Data Augmentation (MSDA) techniques, such as Mixup, CutMix, and PuzzleMix, have been widely acknowledged for enhancing performance in a variety of tasks. A previous study reported the class dependency of traditional data augmentation (DA), where certain classes benefit disproportionately compared to others. This paper reveals a class-dependent effect of MSDA, where some classes experience improved performance while others experience degraded performance. This research addresses the issue of class dependency in MSDA and proposes an algorithm to mitigate it. The approach involves training on a mixture of MSDA and non-MSDA data, which not only mitigates the negative impact on the affected classes but also improves overall accuracy. Furthermore, we provide an in-depth analysis and discussion of why MSDA introduces class dependencies and which classes are most likely to exhibit them.
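The abstract gives the mitigation only at a high level (training on a mixture of MSDA and non-MSDA data). Below is a minimal PyTorch sketch of that idea, assuming a per-batch coin flip with a hypothetical `msda_prob` ratio; the authors' actual schedule and ratio may differ.

```python
# Hedged sketch: train on a mixture of MSDA (here, Mixup) and non-MSDA batches.
# `msda_prob` and the per-batch coin flip are assumptions, not the paper's exact algorithm.
import random
import torch
import torch.nn.functional as F

def mixup(x, y, num_classes, alpha=1.0):
    """Standard Mixup: convex combination of inputs and one-hot labels."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_soft = F.one_hot(y, num_classes).float()
    y_mixed = lam * y_soft + (1 - lam) * y_soft[perm]
    return x_mixed, y_mixed

def train_step(model, optimizer, x, y, num_classes, msda_prob=0.5):
    # With probability `msda_prob` train on a Mixup batch; otherwise
    # train on the untouched (non-MSDA) batch.
    if random.random() < msda_prob:
        x, target = mixup(x, y, num_classes)
    else:
        target = F.one_hot(y, num_classes).float()
    loss = -(target * F.log_softmax(model(x), dim=1)).sum(dim=1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```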
Related papers
- Unveiling the Superior Paradigm: A Comparative Study of Source-Free Domain Adaptation and Unsupervised Domain Adaptation [52.36436121884317]
We show that Source-Free Domain Adaptation (SFDA) generally outperforms Unsupervised Domain Adaptation (UDA) in real-world scenarios.
SFDA offers advantages in time efficiency, storage requirements, targeted learning objectives, reduced risk of negative transfer, and increased robustness against overfitting.
We propose a novel weight estimation method that effectively integrates available source data into multi-SFDA approaches.
arXiv Detail & Related papers (2024-11-24T13:49:29Z)
- Enhancing Training Data Attribution for Large Language Models with Fitting Error Consideration [74.09687562334682]
We introduce a novel training data attribution method called Debias and Denoise Attribution (DDA).
Our method significantly outperforms existing approaches, achieving an average AUC of 91.64%.
DDA exhibits strong generality and scalability across various sources and different-scale models like LLaMA2, QWEN2, and Mistral.
arXiv Detail & Related papers (2024-10-02T07:14:26Z)
- Understanding and Reducing the Class-Dependent Effects of Data Augmentation with A Two-Player Game Approach [7.05832012052375]
We propose CLAM, a CLAss-dependent Multiplicative-weights method to counteract the unfair effect of data augmentation on classification performance.
Our results show that the performance of learned classifiers is indeed more fairly distributed over classes, with only limited impact on average accuracy; a generic sketch of the multiplicative-weights idea follows this entry.
arXiv Detail & Related papers (2024-05-31T02:56:43Z)
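CLAM's exact update rule is not given in this summary; the following is a generic multiplicative-weights sketch over per-class loss weights, with a hypothetical step size `eta`, illustrating the general mechanism of up-weighting classes that augmentation hurts.

```python
# Generic multiplicative-weights round over classes (illustration only,
# not the CLAM algorithm itself): classes with higher error under the
# current augmentation policy get a larger loss weight next round.
import numpy as np

def update_class_weights(weights, per_class_error, eta=0.1):
    """One multiplicative-weights step; `eta` is a hypothetical step size."""
    w = weights * np.exp(eta * per_class_error)  # up-weight high-error classes
    return w / w.sum() * len(w)                  # renormalize to mean 1

weights = np.ones(10)                 # e.g., 10 classes as in CIFAR-10
per_class_error = np.random.rand(10)  # placeholder per-class error rates
weights = update_class_weights(weights, per_class_error)
```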
- Federated Learning under Partially Class-Disjoint Data via Manifold Reshaping [64.58402571292723]
We propose a manifold reshaping approach called FedMR to calibrate the feature space of local training.
We conduct extensive experiments on a range of datasets to demonstrate that our FedMR achieves much higher accuracy and better communication efficiency.
arXiv Detail & Related papers (2024-05-29T10:56:13Z)
- CLIP the Bias: How Useful is Balancing Data in Multimodal Learning? [72.19502317793133]
We study the effectiveness of data balancing for mitigating biases in contrastive language-image pretraining (CLIP).
We present a novel algorithm, called Multi-Modal Moment Matching (M4), designed to reduce both representation and association biases.
arXiv Detail & Related papers (2024-03-07T14:43:17Z)
- Understanding the Detrimental Class-level Effects of Data Augmentation [63.1733767714073]
Achieving optimal average accuracy comes at the cost of significantly hurting individual class accuracy by as much as 20% on ImageNet.
We present a framework for understanding how DA interacts with class-level learning dynamics.
We show that simple class-conditional augmentation strategies improve performance on the negatively affected classes (a toy sketch follows this entry).
arXiv Detail & Related papers (2023-12-07T18:37:43Z)
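The class-conditional strategies themselves are not detailed in the entry above; as a toy illustration, this sketch applies a standard augmentation only to samples whose class lies outside a placeholder `hurt_classes` set (both the set and the transform are assumptions).

```python
# Toy class-conditional augmentation: skip augmentation for classes that
# it is known (or estimated) to hurt. `hurt_classes` is a placeholder.
import torch
import torchvision.transforms as T

augment = T.Compose([T.RandomHorizontalFlip(), T.RandomCrop(32, padding=4)])

def class_conditional_augment(x, y, hurt_classes=frozenset({3, 5})):
    out = x.clone()
    for i in range(x.size(0)):
        if y[i].item() not in hurt_classes:  # augment only unaffected classes
            out[i] = augment(x[i])
    return out
```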
- DualMix: Unleashing the Potential of Data Augmentation for Online Class-Incremental Learning [14.194817677415065]
We show that augmented samples with lower correlation to the original data are more effective in preventing forgetting.
We propose the Enhanced Mixup (EnMix) method that mixes the augmented samples and their labels simultaneously; a standard CutMix sketch in the same sample-and-label-mixing family follows this entry.
To solve the class imbalance problem, we design an Adaptive Mixup (AdpMix) method to calibrate the decision boundaries.
arXiv Detail & Related papers (2023-03-14T12:55:42Z)
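EnMix is not specified beyond this summary; for reference, here is the standard CutMix recipe (one of the MSDA methods named in the abstract at the top of this page), which likewise mixes samples and labels simultaneously: a rectangular patch is pasted between images and the labels are weighted by the patch area.

```python
# Standard CutMix: paste a random patch from a permuted batch and mix
# one-hot labels in proportion to the surviving area.
import torch
import torch.nn.functional as F

def cutmix(x, y, num_classes, alpha=1.0):
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    n, c, h, w = x.shape
    # Box sides follow sqrt(1 - lam) so the patch area is roughly (1 - lam).
    cut_h, cut_w = int(h * (1 - lam) ** 0.5), int(w * (1 - lam) ** 0.5)
    cy, cx = torch.randint(h, (1,)).item(), torch.randint(w, (1,)).item()
    top, bot = max(cy - cut_h // 2, 0), min(cy + cut_h // 2, h)
    lef, rig = max(cx - cut_w // 2, 0), min(cx + cut_w // 2, w)
    x = x.clone()
    x[:, :, top:bot, lef:rig] = x[perm][:, :, top:bot, lef:rig]
    lam_adj = 1 - (bot - top) * (rig - lef) / (h * w)  # label weight = kept area
    y_soft = F.one_hot(y, num_classes).float()
    return x, lam_adj * y_soft + (1 - lam_adj) * y_soft[perm]
```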
- A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability [29.40977854491399]
Data augmentation (DA) is indispensable in modern machine learning and deep neural networks.
This survey comprehensively reviews a crucial subset of DA techniques, namely Mix-based Data Augmentation (MixDA).
In contrast to traditional DA approaches that operate on single samples or entire datasets, MixDA stands out due to its effectiveness, simplicity, flexibility, computational efficiency, theoretical foundation, and broad applicability.
arXiv Detail & Related papers (2022-12-21T09:58:14Z)
- FIXED: Frustratingly Easy Domain Generalization with Mixup [53.782029033068675]
Domain generalization (DG) aims to learn a generalizable model from multiple training domains such that it can perform well on unseen target domains.
A popular strategy is to augment training data to benefit generalization through methods such as Mixup [Zhang et al., 2018].
We propose a simple yet effective enhancement for Mixup-based DG, namely domain-invariant Feature mIXup (FIX).
Our approach significantly outperforms nine state-of-the-art related methods, beating the best-performing baseline by 6.5% on average in terms of test accuracy.
arXiv Detail & Related papers (2022-11-07T09:38:34Z)
- Combined Cleaning and Resampling Algorithm for Multi-Class Imbalanced Data with Label Noise [11.868507571027626]
In this paper, we propose a novel oversampling technique, the Multi-Class Combined Cleaning and Resampling (MC-CCR) algorithm.
The proposed method utilizes an energy-based approach to model the regions suitable for oversampling, making it less affected by small disjuncts and outliers than SMOTE.
This is combined with a simultaneous cleaning operation that aims to reduce the effect of overlapping class distributions on the performance of the learning algorithms (an illustrative library-based sketch follows this entry).
arXiv Detail & Related papers (2020-04-07T13:59:35Z)
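MC-CCR's energy-based modeling is not reproduced here; as a rough, library-based illustration of the same clean-and-resample family, imbalanced-learn's `SMOTETomek` pairs SMOTE oversampling with Tomek-link cleaning on a multi-class toy problem.

```python
# Illustration of combined cleaning + resampling (not MC-CCR itself):
# SMOTE oversampling followed by Tomek-link cleaning via imbalanced-learn.
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.combine import SMOTETomek

X, y = make_classification(n_samples=1000, n_classes=3, n_informative=5,
                           weights=[0.7, 0.2, 0.1], random_state=0)
X_res, y_res = SMOTETomek(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # class counts before and after
```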