SelectAugment: Hierarchical Deterministic Sample Selection for Data
Augmentation
- URL: http://arxiv.org/abs/2112.02862v1
- Date: Mon, 6 Dec 2021 08:38:38 GMT
- Title: SelectAugment: Hierarchical Deterministic Sample Selection for Data
Augmentation
- Authors: Shiqi Lin, Zhizheng Zhang, Xin Li, Wenjun Zeng, Zhibo Chen
- Abstract summary: We propose an effective approach, dubbed SelectAugment, to select samples to be augmented in a deterministic and online manner.
Specifically, in each batch, we first determine the augmentation ratio, and then decide whether to augment each training sample under this ratio.
In this way, the negative effects of the randomness in selecting samples to augment can be effectively alleviated and the effectiveness of DA is improved.
- Score: 72.58308581812149
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Data augmentation (DA) has been widely investigated to facilitate model
optimization in many tasks. However, in most cases, data augmentation is
randomly performed for each training sample with a certain probability, which
might incur content destruction and visual ambiguities. To address this, in
this paper, we propose an effective approach, dubbed SelectAugment, to select
samples to be augmented in a deterministic and online manner based on the
sample contents and the network training status. Specifically, in each batch,
we first determine the augmentation ratio, and then decide whether to augment
each training sample under this ratio. We model this process as a two-step
Markov decision process and adopt Hierarchical Reinforcement Learning (HRL) to
learn the augmentation policy. In this way, the negative effects of the
randomness in selecting samples to augment can be effectively alleviated and
the effectiveness of DA is improved. Extensive experiments demonstrate that our
proposed SelectAugment can be adapted upon numerous commonly used DA methods,
e.g., Mixup, CutMix, AutoAugment, etc., and improve their performance on
multiple benchmark datasets of image classification and fine-grained image
recognition.
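The two-step selection described in the abstract can be sketched as follows. This is a minimal illustrative stand-in, not the paper's learned HRL policy: the function name, signature, and the use of greedy top-k scoring in place of the low-level policy are all assumptions for illustration.

```python
def select_and_augment(batch, scores, ratio, augment_fn):
    """Two-step deterministic sample selection (illustrative sketch).

    batch:      list of training samples
    scores:     per-sample scores, standing in for the low-level policy output
    ratio:      augmentation ratio for this batch (step 1, the high-level decision)
    augment_fn: any DA transform, e.g. a Mixup/CutMix/AutoAugment wrapper
    """
    n = len(batch)
    # Step 1: batch-level decision -- how many samples to augment.
    k = round(ratio * n)
    # Step 2: sample-level decision -- deterministically pick the k
    # highest-scoring samples (a greedy stand-in for the learned policy).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    return [augment_fn(x) if i in selected else x for i, x in enumerate(batch)]
```

Because the selection is a deterministic function of the scores and the ratio, no per-sample coin flip is involved, which is the randomness the paper aims to remove.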
Related papers
- Diversified Batch Selection for Training Acceleration [68.67164304377732]
A prevalent research line, known as online batch selection, explores selecting informative subsets during the training process.
Vanilla reference-model-free methods score and select data independently, in a sample-wise manner.
We propose Diversified Batch Selection (DivBS), which is reference-model-free and can efficiently select diverse and representative samples.
arXiv Detail & Related papers (2024-06-07T12:12:20Z)
- Data Pruning via Moving-one-Sample-out [61.45441981346064]
We propose a novel data-pruning approach called moving-one-sample-out (MoSo)
MoSo aims to identify and remove the least informative samples from the training set.
Experimental results demonstrate that MoSo effectively mitigates severe performance degradation at high pruning ratios.
arXiv Detail & Related papers (2023-10-23T08:00:03Z)
- ScoreMix: A Scalable Augmentation Strategy for Training GANs with Limited Data [93.06336507035486]
Generative Adversarial Networks (GANs) typically suffer from overfitting when limited training data is available.
We present ScoreMix, a novel and scalable data augmentation approach for various image synthesis tasks.
arXiv Detail & Related papers (2022-10-27T02:55:15Z)
- Towards Automated Imbalanced Learning with Deep Hierarchical Reinforcement Learning [57.163525407022966]
Imbalanced learning is a fundamental challenge in data mining, where there is a disproportionate ratio of training samples in each class.
Over-sampling is an effective technique to tackle imbalanced learning through generating synthetic samples for the minority class.
We propose AutoSMOTE, an automated over-sampling algorithm that can jointly optimize different levels of decisions.
arXiv Detail & Related papers (2022-08-26T04:28:01Z)
- ReSmooth: Detecting and Utilizing OOD Samples when Training with Data Augmentation [57.38418881020046]
Recent DA techniques increasingly pursue diversity in augmented training samples.
An augmentation strategy that has a high diversity usually introduces out-of-distribution (OOD) augmented samples.
We propose ReSmooth, a framework that firstly detects OOD samples in augmented samples and then leverages them.
arXiv Detail & Related papers (2022-05-25T09:29:27Z)
- Automatic Data Augmentation Selection and Parametrization in Contrastive Self-Supervised Speech Representation Learning [21.423349835589793]
This work introduces a conditional-independence-based method that automatically selects a suitable distribution over augmentations and their parametrization from a set of predefined ones.
Experiments on two different downstream tasks validate the proposed approach, which outperforms training without augmentation or with baseline augmentations.
arXiv Detail & Related papers (2022-04-08T16:30:50Z)
- When Chosen Wisely, More Data Is What You Need: A Universal Sample-Efficient Strategy For Data Augmentation [19.569164094496955]
We present a universal Data Augmentation (DA) technique, called Glitter, to overcome these issues.
Glitter adaptively selects a subset of worst-case samples with maximal loss, analogous to adversarial DA.
Our experiments on the GLUE benchmark, SQuAD, and HellaSwag in three widely used training setups reveal that Glitter is substantially faster to train and achieves competitive performance.
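Glitter's worst-case selection can be sketched as follows. This is a hedged stand-in only; the function name and signature are hypothetical, not from the Glitter paper, and it shows just the maximal-loss subset step analogous to adversarial DA.

```python
def select_worst_case(candidates, losses, k):
    """Keep the k augmented candidates with maximal loss (illustrative sketch).

    candidates: pool of augmented samples
    losses:     per-candidate losses under the current model
    k:          size of the worst-case subset to retain
    """
    # Rank candidates by loss, descending, and keep the k hardest ones.
    order = sorted(range(len(candidates)), key=lambda i: losses[i], reverse=True)
    return [candidates[i] for i in order[:k]]
```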
arXiv Detail & Related papers (2022-03-17T15:33:52Z)
- Self-paced Data Augmentation for Training Neural Networks [11.554821454921536]
We propose a self-paced augmentation (SPA) to automatically select suitable samples for data augmentation when training a neural network.
The proposed method mitigates the deterioration of generalization performance caused by ineffective data augmentation.
Experimental results demonstrate that the proposed SPA can improve the generalization performance, particularly when the number of training samples is small.
arXiv Detail & Related papers (2020-10-29T09:13:18Z)
- Reinforced Data Sampling for Model Diversification [15.547681142342846]
This paper proposes a new Reinforced Data Sampling (RDS) method to learn how to sample data adequately.
We formulate model diversification ($\delta$-div) as an optimisation problem in data sampling, maximising learning potential and optimal allocation by injecting model diversity.
Our results suggest that trainable sampling for model diversification is useful for competition organisers, researchers, and even beginners seeking to realise the full potential of various machine learning tasks.
arXiv Detail & Related papers (2020-06-12T11:46:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.