SelectAugment: Hierarchical Deterministic Sample Selection for Data
Augmentation
- URL: http://arxiv.org/abs/2112.02862v1
- Date: Mon, 6 Dec 2021 08:38:38 GMT
- Title: SelectAugment: Hierarchical Deterministic Sample Selection for Data
Augmentation
- Authors: Shiqi Lin, Zhizheng Zhang, Xin Li, Wenjun Zeng, Zhibo Chen
- Abstract summary: We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of the randomness in selecting samples to augment can be effectively alleviated and the effectiveness of DA is improved.
- Score: 72.58308581812149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation (DA) has been widely investigated to facilitate model
optimization in many tasks. However, in most cases, data augmentation is
randomly performed for each training sample with a certain probability, which
might incur content destruction and visual ambiguities. To address this, in
this paper, we propose an effective approach, dubbed SelectAugment, to select
samples to be augmented in a deterministic and online manner based on the
sample contents and the network training status. Specifically, in each batch,
we first determine the augmentation ratio, and then decide whether to augment
each training sample under this ratio. We model this process as a two-step
Markov decision process and adopt Hierarchical Reinforcement Learning (HRL) to
learn the augmentation policy. In this way, the negative effects of the
randomness in selecting samples to augment can be effectively alleviated and
the effectiveness of DA is improved. Extensive experiments demonstrate that our
proposed SelectAugment can be adapted upon numerous commonly used DA methods,
e.g., Mixup, CutMix, AutoAugment, etc., and improve their performance on
multiple benchmark datasets of image classification and fine-grained image
recognition.
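The two-step selection described in the abstract can be sketched as follows. This is a minimal illustrative stand-in, not the paper's learned HRL policy: the function name, signature, and the use of greedy top-k scoring in place of the low-level policy are all assumptions for illustration.

```python
def select_and_augment(batch, scores, ratio, augment_fn):
    """Two-step deterministic sample selection (illustrative sketch).

    batch:      list of training samples
    scores:     per-sample scores, standing in for the low-level policy output
    ratio:      augmentation ratio for this batch (step 1, the high-level decision)
    augment_fn: any DA transform, e.g. a Mixup/CutMix/AutoAugment wrapper
    """
    n = len(batch)
    # Step 1: batch-level decision -- how many samples to augment.
    k = round(ratio * n)
    # Step 2: sample-level decision -- deterministically pick the k
    # highest-scoring samples (a greedy stand-in for the learned policy).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    return [augment_fn(x) if i in selected else x for i, x in enumerate(batch)]
```

Because the selection is a deterministic function of the scores and the ratio, no per-sample coin flip is involved, which is the randomness the paper aims to remove.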
Related papers
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods score and select data independently, in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo)
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation [57.38418881020046]
Recent DA techniques increasingly pursue diversity in augmented training samples.
An augmentation strategy that has a high diversity usually introduces out-of-distribution (OOD) augmented samples.
We propose ReSmooth, a framework that firstly detects OOD samples in augmented samples and then leverages them.
arXiv Detail & Related papers (2022-05-25T09:29:27Z)
- Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning [21.423349835589793]
This work introduces a conditional-independence-based method that automatically selects a suitable distribution over augmentations and their parametrization from a set of predefined ones.
Experiments on two different downstream tasks validate the proposed approach, which outperforms training without augmentation or with baseline augmentations.
arXiv Detail & Related papers (2022-04-08T16:30:50Z)
- When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation [19.569164094496955]
We present a universal Data Augmentation (DA) technique, called Glitter, to overcome these issues.
Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
Our experiments on the GLUE benchmark, SQuAD, and HellaSwag in three widely used training setups reveal that Glitter is substantially faster to train and achieves competitive performance.
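Glitter's worst-case selection can be sketched as follows. This is a hedged stand-in only; the function name and signature are hypothetical, not from the Glitter paper, and it shows just the maximal-loss subset step analogous to adversarial DA.

```python
def select_worst_case(candidates, losses, k):
    """Keep the k augmented candidates with maximal loss (illustrative sketch).

    candidates: pool of augmented samples
    losses:     per-candidate losses under the current model
    k:          size of the worst-case subset to retain
    """
    # Rank candidates by loss, descending, and keep the k hardest ones.
    order = sorted(range(len(candidates)), key=lambda i: losses[i], reverse=True)
    return [candidates[i] for i in order[:k]]
```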
arXiv Detail & Related papers (2022-03-17T15:33:52Z)
- Self-paced Data Augmentation for Training Neural Networks [11.554821454921536]
We propose a self-paced augmentation (SPA) to automatically select suitable samples for data augmentation when training a neural network.
The proposed method mitigates the deterioration of generalization performance caused by ineffective data augmentation.
Experimental results demonstrate that the proposed SPA can improve the generalization performance, particularly when the number of training samples is small.
arXiv Detail & Related papers (2020-10-29T09:13:18Z)
- Reinforced Data Sampling for Model Diversification [15.547681142342846]
This paper proposes a new Reinforced Data Sampling (RDS) method to learn how to sample data adequately.
We formulate model diversification ($\delta$-div) as an optimisation problem in data sampling, maximising learning potential and optimal allocation by injecting model diversity.
Our results suggest that trainable sampling for model diversification is useful for competition organisers, researchers, and even beginners seeking to realise the full potential of various machine learning tasks.
arXiv Detail & Related papers (2020-06-12T11:46:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.