Genetic Learning for Designing Sim-to-Real Data Augmentations
- URL: http://arxiv.org/abs/2403.06786v1
- Date: Mon, 11 Mar 2024 15:00:56 GMT
- Title: Genetic Learning for Designing Sim-to-Real Data Augmentations
- Authors: Bram Vanherle, Nick Michiels, Frank Van Reeth
- Abstract summary: Data augmentations are useful in closing the sim-to-real domain gap when training on synthetic data.
Many image augmentation techniques exist, parametrized by different settings, such as strength and probability.
This paper presents two different interpretable metrics that can be combined to predict how well a certain augmentation policy will work for a specific sim-to-real setting.
- Score: 1.03590082373586
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Data augmentations are useful in closing the sim-to-real domain gap when
training on synthetic data. This is because they widen the training data
distribution, thus encouraging the model to generalize better to other domains.
Many image augmentation techniques exist, parametrized by different settings,
such as strength and probability. This leads to a large space of different
possible augmentation policies. Some policies work better than others for
overcoming the sim-to-real gap for specific datasets, and it is unclear why.
This paper presents two different interpretable metrics that can be combined to
predict how well a certain augmentation policy will work for a specific
sim-to-real setting, focusing on object detection. We validate our metrics by
training many models with different augmentation policies and showing a strong
correlation with performance on real data. Additionally, we introduce
GeneticAugment, a genetic programming method that can leverage these metrics to
automatically design an augmentation policy for a specific dataset without
needing to train a model.
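The abstract does not spell out GeneticAugment's internals, but its core idea, a genetic search over policies built from (operation, probability, strength) genes and scored by training-free metrics, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the operator pool, the hyperparameters, and the stand-in scoring function are all assumptions, and the paper's two interpretable sim-to-real metrics would replace `dummy_score`.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Illustrative operator pool; the paper's actual operator set is not
# given in the abstract.
OPS = ["gaussian_blur", "color_jitter", "random_erase", "gaussian_noise", "solarize"]

@dataclass
class AugmentationGene:
    """One augmentation step: an operation plus its probability and strength."""
    op: str
    probability: float  # chance of applying the op to a given image, in [0, 1]
    strength: float     # normalized magnitude of the op, in [0, 1]

Policy = List[AugmentationGene]  # a policy is an ordered list of genes

def random_policy(length: int = 3) -> Policy:
    return [AugmentationGene(random.choice(OPS), random.random(), random.random())
            for _ in range(length)]

def mutate(policy: Policy, rate: float = 0.3) -> Policy:
    """Randomly perturb probabilities/strengths or swap an operation."""
    out = []
    for g in policy:
        if random.random() < rate:
            g = AugmentationGene(
                random.choice(OPS) if random.random() < 0.5 else g.op,
                min(1.0, max(0.0, g.probability + random.gauss(0.0, 0.1))),
                min(1.0, max(0.0, g.strength + random.gauss(0.0, 0.1))),
            )
        out.append(g)
    return out

def crossover(a: Policy, b: Policy) -> Policy:
    """Single-point crossover between two parent policies."""
    cut = random.randint(1, min(len(a), len(b)) - 1)
    return a[:cut] + b[cut:]

def genetic_search(score: Callable[[Policy], float],
                   pop_size: int = 20, generations: int = 50) -> Policy:
    """Evolve a population of policies toward a higher score,
    with no model training in the loop."""
    population = [random_policy() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=score, reverse=True)
        parents = ranked[: pop_size // 2]  # truncation selection
        children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=score)

# Stand-in scorer for demonstration only: it prefers mid-range strengths.
# The paper's two interpretable metrics would go here instead.
def dummy_score(policy: Policy) -> float:
    return -abs(sum(g.strength for g in policy) / len(policy) - 0.5)

best = genetic_search(dummy_score)
print([(g.op, round(g.probability, 2), round(g.strength, 2)) for g in best])
```

Truncation selection and single-point crossover are generic genetic-programming choices here; the paper may use different evolutionary operators, and the real fitness would combine its two metrics (for example, as a weighted sum).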
Related papers
- How Does Data Diversity Shape the Weight Landscape of Neural Networks? [2.89287673224661]
We investigate the impact of dropout, weight decay, and noise augmentation on the parameter space of neural networks.
We observe that diverse data influences the weight landscape in a similar fashion as dropout.
We conclude that synthetic data can bring more diversity into real input data, resulting in a better performance on out-of-distribution test instances.
arXiv Detail & Related papers (2024-10-18T16:57:05Z)
- DualAug: Exploiting Additional Heavy Augmentation with OOD Data Rejection [77.6648187359111]
We propose a novel data augmentation method, named DualAug, designed to keep the augmentation in distribution as much as possible at a reasonable time and computational cost.
Experiments on supervised image classification benchmarks show that DualAug improves various automated data augmentation methods.
arXiv Detail & Related papers (2023-10-12T08:55:10Z)
- Self-supervised Representation Learning From Random Data Projectors [13.764897214965766]
This paper presents a self-supervised representation learning (SSRL) approach that can be applied to any data modality and network architecture.
We show that high-quality data representations can be learned by reconstructing random data projections.
arXiv Detail & Related papers (2023-10-11T18:00:01Z)
- When to Learn What: Model-Adaptive Data Augmentation Curriculum [32.99634881669643]
We propose Model Adaptive Data Augmentation (MADAug) to jointly train an augmentation policy network to teach the model when to learn what.
Unlike previous work, MADAug selects augmentation operators for each input image by a model-adaptive policy varying between training stages, producing a data augmentation curriculum optimized for better generalization.
arXiv Detail & Related papers (2023-09-09T10:35:27Z)
- Scaling Laws Do Not Scale [54.72120385955072]
Recent work has argued that as the size of a dataset increases, the performance of a model trained on that dataset will increase.
We argue that this scaling law relationship depends on metrics used to measure performance that may not correspond with how different groups of people perceive the quality of models' output.
Different communities may also have values in tension with each other, leading to difficult, potentially irreconcilable choices about metrics used for model evaluations.
arXiv Detail & Related papers (2023-07-05T15:32:21Z)
- Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data [0.0]
Generative models excel in creating realistic images, yet their dependency on extensive datasets for training presents significant challenges.
Current data-efficient methods largely focus on GAN architectures, leaving a gap in training other types of generative models.
"Phased data augmentation" is a novel technique that addresses this gap by optimizing training in limited-data scenarios without altering the inherent data distribution.
arXiv Detail & Related papers (2023-05-22T03:38:59Z) - Automatic Data Augmentation via Invariance-Constrained Learning [94.27081585149836]
Underlying data structures are often exploited to improve the solution of learning tasks.
Data augmentation induces these symmetries during training by applying multiple transformations to the input data.
This work tackles these issues by automatically adapting the data augmentation while solving the learning task.
arXiv Detail & Related papers (2022-09-29T18:11:01Z) - DNA: Dynamic Network Augmentation [0.0]
We introduce Dynamic Network Augmentation (DNA), which learns input-conditional augmentation policies.
Our model allows for dynamic augmentation policies and performs well on data with geometric transformations conditional on input features.
arXiv Detail & Related papers (2021-12-17T01:43:56Z) - Virtual Data Augmentation: A Robust and General Framework for
Fine-tuning Pre-trained Models [51.46732511844122]
Powerful pre-trained language models (PLMs) can be fooled by small perturbations or intentional attacks.
We present Virtual Data Augmentation (VDA), a general framework for robustly fine-tuning PLMs.
Our approach is able to improve the robustness of PLMs and alleviate the performance degradation under adversarial attacks.
arXiv Detail & Related papers (2021-09-13T09:15:28Z) - CADDA: Class-wise Automatic Differentiable Data Augmentation for EEG
Signals [92.60744099084157]
We propose differentiable data augmentation amenable to gradient-based learning.
We demonstrate the relevance of our approach on the clinically relevant sleep staging classification task.
arXiv Detail & Related papers (2021-06-25T15:28:48Z) - Learning Representational Invariances for Data-Efficient Action
Recognition [52.23716087656834]
We show that our data augmentation strategy leads to promising performance on the Kinetics-100, UCF-101, and HMDB-51 datasets.
We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.
arXiv Detail & Related papers (2021-03-30T17:59:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.