Genetic Learning for Designing Sim-to-Real Data Augmentations
- URL: http://arxiv.org/abs/2403.06786v1
- Date: Mon, 11 Mar 2024 15:00:56 GMT
- Title: Genetic Learning for Designing Sim-to-Real Data Augmentations
- Authors: Bram Vanherle, Nick Michiels, Frank Van Reeth
- Abstract summary: Data augmentations are useful in closing the sim-to-real domain gap when training on synthetic data.
Many image augmentation techniques exist, parametrized by different settings, such as strength and probability.
This paper presents two different interpretable metrics that can be combined to predict how well a certain augmentation policy will work for a specific sim-to-real setting.
- Score: 1.03590082373586
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data augmentations are useful in closing the sim-to-real domain gap when
training on synthetic data. This is because they widen the training data
distribution, thus encouraging the model to generalize better to other domains.
Many image augmentation techniques exist, parametrized by different settings,
such as strength and probability. This leads to a large space of different
possible augmentation policies. Some policies work better than others for
overcoming the sim-to-real gap for specific datasets, and it is unclear why.
This paper presents two different interpretable metrics that can be combined to
predict how well a certain augmentation policy will work for a specific
sim-to-real setting, focusing on object detection. We validate our metrics by
training many models with different augmentation policies and showing a strong
correlation with performance on real data. Additionally, we introduce
GeneticAugment, a genetic programming method that can leverage these metrics to
automatically design an augmentation policy for a specific dataset without
needing to train a model.
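The abstract does not spell out GeneticAugment's internals, but its core idea, a genetic search over policies built from (operation, probability, strength) genes and scored by training-free metrics, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the operator pool, the hyperparameters, and the stand-in scoring function are all assumptions, and the paper's two interpretable sim-to-real metrics would replace `dummy_score`.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Illustrative operator pool; the paper's actual operator set is not
# given in the abstract.
OPS = ["gaussian_blur", "color_jitter", "random_erase", "gaussian_noise", "solarize"]

@dataclass
class AugmentationGene:
    """One augmentation step: an operation plus its probability and strength."""
    op: str
    probability: float  # chance of applying the op to a given image, in [0, 1]
    strength: float     # normalized magnitude of the op, in [0, 1]

Policy = List[AugmentationGene]  # a policy is an ordered list of genes

def random_policy(length: int = 3) -> Policy:
    return [AugmentationGene(random.choice(OPS), random.random(), random.random())
            for _ in range(length)]

def mutate(policy: Policy, rate: float = 0.3) -> Policy:
    """Randomly perturb probabilities/strengths or swap an operation."""
    out = []
    for g in policy:
        if random.random() < rate:
            g = AugmentationGene(
                random.choice(OPS) if random.random() < 0.5 else g.op,
                min(1.0, max(0.0, g.probability + random.gauss(0.0, 0.1))),
                min(1.0, max(0.0, g.strength + random.gauss(0.0, 0.1))),
            )
        out.append(g)
    return out

def crossover(a: Policy, b: Policy) -> Policy:
    """Single-point crossover between two parent policies."""
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def genetic_search(score: Callable[[Policy], float],
                   pop_size: int = 20, generations: int = 50) -> Policy:
    """Evolve a population of policies toward a higher score,
    with no model training in the loop."""
    population = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score)

# Stand-in scorer for demonstration only: it prefers mid-range strengths.
# The paper's two interpretable metrics would go here instead.
def dummy_score(policy: Policy) -> float:
    return -abs(sum(g.strength for g in policy) / len(policy) - 0.5)

best = genetic_search(dummy_score)
print([(g.op, round(g.probability, 2), round(g.strength, 2)) for g in best])
```

Truncation selection and single-point crossover are generic genetic-programming choices here; the paper may use different evolutionary operators, and the real fitness would combine its two metrics (for example, as a weighted sum).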
Related papers
- How Does Data Diversity Shape the Weight Landscape of Neural Networks? [2.89287673224661]
We investigate the impact of dropout, weight decay, and noise augmentation on the parameter space of neural networks.
We observe that diverse data influences the weight landscape in a similar fashion as dropout.
We conclude that synthetic data can bring more diversity into real input data, resulting in a better performance on out-of-distribution test instances.
arXiv Detail & Related papers (2024-10-18T16:57:05Z)
- DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection [77.6648187359111]
We propose a novel data augmentation method, named DualAug, designed to keep the augmentation in distribution as much as possible at a reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
arXiv Detail & Related papers (2023-10-12T08:55:10Z)
- Self-supervised Representation Learning From Random Data Projectors [13.764897214965766]
This paper presents a self-supervised representation learning (SSRL) approach that can be applied to any data modality and network architecture.
We show that high-quality data representations can be learned by reconstructing random data projections.
arXiv Detail & Related papers (2023-10-11T18:00:01Z)
- When to Learn What: Model-Adaptive Data Augmentation Curriculum [32.99634881669643]
We propose Model Adaptive Data Augmentation (MADAug) to jointly train an augmentation policy network to teach the model when to learn what.
Unlike previous work, MADAug selects augmentation operators for each input image by a model-adaptive policy varying between training stages, producing a data augmentation curriculum optimized for better generalization.
arXiv Detail & Related papers (2023-09-09T10:35:27Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges.
Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models.
"Phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited-data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - DNA: Dynamic Network Augmentation [0.0]
We introduce Dynamic Network Augmentation (DNA), which learns input-conditional augmentation policies.
Our model allows for dynamic augmentation policies and performs well on data with geometric transformations conditional on input features.
arXiv Detail & Related papers (2021-12-17T01:43:56Z) - Virtual Data Augmentation: A Robust and General Framework for
Fine-tuning Pre-trained Models [51.46732511844122]
Powerful pre-trained language models (PLMs) can be fooled by small perturbations or intentional attacks.
We present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs.
Our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks.
arXiv Detail & Related papers (2021-09-13T09:15:28Z) - CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG
Signals [92.60744099084157]
We propose differentiable data augmentation amenable to gradient-based learning.
We demonstrate the relevance of our approach on the clinically relevant sleep staging classification task.
arXiv Detail & Related papers (2021-06-25T15:28:48Z) - Learning Representational Invariances for Data-Efficient Action
Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.