AutoSimulate: (Quickly) Learning Synthetic Data Generation
- URL: http://arxiv.org/abs/2008.08424v1
- Date: Sun, 16 Aug 2020 11:36:11 GMT
- Title: AutoSimulate: (Quickly) Learning Synthetic Data Generation
- Authors: Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H.S. Torr, Vibhav Vineet
- Abstract summary: We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
- Score: 70.82315853981838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simulation is increasingly being used for generating large labelled datasets
in many machine learning problems. Recent methods have focused on adjusting
simulator parameters with the goal of maximising accuracy on a validation task,
usually relying on REINFORCE-like gradient estimators. However, these approaches
are very expensive as they treat the entire data generation, model training,
and validation pipeline as a black-box and require multiple costly objective
evaluations at each iteration. We propose an efficient alternative for optimal
synthetic data generation, based on a novel differentiable approximation of the
objective. This allows us to optimize the simulator, which may be
non-differentiable, requiring only one objective evaluation at each iteration
with little overhead. We demonstrate on a state-of-the-art photorealistic
renderer that the proposed method finds the optimal data distribution faster
(up to $50\times$), with significantly reduced training data generation (up to
$30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than
previous methods.
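To make the cost contrast concrete, here is a minimal sketch of the REINFORCE-style baseline the abstract criticises, in which every update to the simulator parameters requires several full generate-train-validate evaluations. The simulator, the inner training loop, and the Gaussian parameterisation below are toy assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_validate(data):
    # Stand-in for the expensive inner loop: train a model on `data` and
    # return its validation loss (low when the data mean is near 2.0).
    return (data.mean() - 2.0) ** 2

def simulate(psi, n=64):
    # Toy, possibly non-differentiable "simulator": data from N(psi, 1).
    return psi + rng.standard_normal(n)

# REINFORCE-style update of the simulator parameter psi: every gradient
# estimate costs K full generate/train/validate evaluations.
psi, lr, K, sigma = 0.0, 0.5, 8, 0.3
for step in range(50):
    thetas = psi + sigma * rng.standard_normal(K)     # perturbed sim params
    losses = np.array([train_and_validate(simulate(t)) for t in thetas])
    baseline = losses.mean()                          # variance reduction
    # Score-function gradient of E[loss] under theta ~ N(psi, sigma^2).
    grad = np.mean((losses - baseline) * (thetas - psi) / sigma**2)
    psi -= lr * grad

print(f"learned simulator parameter: {psi:.3f} (target ~2.0)")
```

AutoSimulate's claimed saving is to replace the K costly evaluations per step with a single one by differentiating a local approximation of this objective; the sketch above only illustrates the baseline cost model being compared against.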
Related papers
- FLOPS: Forward Learning with OPtimal Sampling [1.694989793927645]
Methods that learn with only forward passes, also referred to as queries, have recently gained attention because they avoid backpropagation.
Conventional forward learning spends an enormous number of queries on each data point to estimate gradients accurately through Monte Carlo sampling.
We propose to allocate the optimal number of queries to each data point in a batch during training, balancing estimation accuracy against computational efficiency (a toy sketch follows this entry).
arXiv Detail & Related papers (2024-10-08T12:16:12Z)
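A hedged illustration of the allocation idea summarised above: gradients are estimated from forward queries alone (here with a generic SPSA-style two-point estimator), and a per-example query budget is assigned where the estimates look noisiest. The estimator, the noise proxy, and the allocation rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w, x):
    # Per-example loss of a toy linear model; only forward evaluations
    # (queries) of this function are allowed -- no backpropagation.
    return (w @ x - 1.0) ** 2

def forward_grad(w, x, n_queries, eps=1e-3):
    # Monte Carlo gradient from two-point forward queries along random
    # directions (an SPSA-style estimator).
    g = np.zeros_like(w)
    for _ in range(n_queries):
        u = rng.standard_normal(w.shape)
        d = (loss(w + eps * u, x) - loss(w - eps * u, x)) / (2 * eps)
        g += d * u
    return g / n_queries

w = np.zeros(5)
batch = [rng.standard_normal(5) for _ in range(4)]
budget = 32                                     # total queries per batch

for step in range(200):
    # Two cheap pilot estimates per example; their disagreement is a
    # rough proxy for the estimator's variance at that example.
    pilots = [(forward_grad(w, x, 1), forward_grad(w, x, 1)) for x in batch]
    noise = np.array([np.linalg.norm(p - q) for p, q in pilots]) + 1e-8
    # Heuristic allocation: noisier examples receive more of the budget.
    alloc = np.maximum(1, (budget * noise / noise.sum()).astype(int))
    grads = [forward_grad(w, x, int(k)) for x, k in zip(batch, alloc)]
    w -= 0.05 * np.mean(grads, axis=0)

print("final batch loss:", np.mean([loss(w, x) for x in batch]))
```
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]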
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as more intuitive, human-like handling overall.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
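To see what "gradients of environment dynamics" buy, the toy below differentiates a rollout of a scalar linear system through the chain rule (forward-mode autodiff written out by hand) and descends the exact cost gradient. The dynamics, cost, and horizon are invented for illustration; a real differentiable simulator would supply these gradients automatically.

```python
import numpy as np

# Toy differentiable simulator: x_{t+1} = a*x_t + b*u_t, policy u = theta*x.
a, b, T, x0 = 0.9, 0.5, 20, 1.0

def rollout_with_grad(theta):
    # Propagate the state and d(state)/d(theta) together (forward-mode
    # autodiff by hand), accumulating the cost and its exact gradient.
    x, dx, cost, dcost = x0, 0.0, 0.0, 0.0
    for _ in range(T):
        u, du = theta * x, x + theta * dx        # policy and its derivative
        cost += x * x + 0.1 * u * u
        dcost += 2 * x * dx + 0.2 * u * du
        x, dx = a * x + b * u, a * dx + b * du   # differentiable dynamics
    return cost, dcost

theta = 0.0
for step in range(300):
    cost, grad = rollout_with_grad(theta)
    theta -= 0.01 * grad                         # analytic policy gradient

print(f"theta = {theta:.3f}, rollout cost = {rollout_with_grad(theta)[0]:.3f}")
```
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]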
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while requiring far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
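Distribution matching for condensation can be sketched as: learn a small synthetic set whose feature statistics match the real data's under some embedding. The toy below matches only first moments under a fixed random ReLU feature map, with the gradient written out by hand; the paper's estimator is considerably more refined, so treat this purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, h = 16, 64                                   # input dim, feature dim
real = rng.standard_normal((500, d)) + 3.0      # "real" dataset
syn = rng.standard_normal((10, d))              # tiny synthetic set to learn
W = rng.standard_normal((h, d)) / np.sqrt(d)    # fixed random feature map

phi = lambda X: np.maximum(W @ X.T, 0.0)        # ReLU features, shape (h, n)
mu_real = phi(real).mean(axis=1)

for step in range(500):
    diff = mu_real - phi(syn).mean(axis=1)      # residual of mean matching
    grads = np.zeros_like(syn)
    for j in range(len(syn)):
        mask = (W @ syn[j]) > 0.0               # ReLU derivative mask
        grads[j] = -(2.0 / len(syn)) * (W.T @ (diff * mask))
    syn -= 0.1 * grads                          # descend ||diff||^2

print("matching loss:", float(diff @ diff))
```
- Non-iterative optimization of pseudo-labeling thresholds for training object detection models from multiple datasets [2.1485350418225244]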
We propose a non-iterative method to optimize pseudo-labeling thresholds for learning object detection from a collection of low-cost datasets.
We experimentally demonstrate that our proposed method achieves an mAP comparable to that of grid search on the COCO and VOC datasets.
arXiv Detail & Related papers (2022-10-19T00:31:34Z)
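The paper's non-iterative rule is not reproduced here, but the grid-search baseline it is reported to match is easy to state: sweep the pseudo-label confidence threshold and keep the value that scores best on a small validated set. Below, F1 on synthetic detector scores stands in for mAP; all data and names are made up.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic detector confidences: positives skew high, negatives skew low.
pos = rng.beta(5, 2, 300)        # scores of true objects
neg = rng.beta(2, 5, 700)        # scores of background boxes
scores = np.concatenate([pos, neg])
labels = np.concatenate([np.ones(300), np.zeros(700)])

def f1_at(thr):
    pred = scores >= thr
    tp = np.sum(pred & (labels == 1))
    prec = tp / max(pred.sum(), 1)
    rec = tp / 300
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

# Iterative baseline: grid search over pseudo-labeling thresholds.
grid = np.linspace(0.05, 0.95, 19)
best = max(grid, key=f1_at)
print(f"grid-search threshold: {best:.2f}, F1 = {f1_at(best):.3f}")
```
- Easy Differentially Private Linear Regression [16.325734286930764]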
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z)
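A sketch of the selection step described above: fit many non-private regressors on disjoint data splits, score each candidate by an approximate Tukey depth (estimated with random one-dimensional projections), and choose one via the exponential mechanism. The paper's sensitivity analysis and exact depth computation are more careful; this shows only the shape of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 2000, 3, 1.0

# Synthetic regression data with true weights [1, -2, 0.5].
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Non-private candidate models fit on disjoint data splits.
models = np.array([np.linalg.lstsq(Xs, ys, rcond=None)[0]
                   for Xs, ys in zip(np.split(X, 40), np.split(y, 40))])

def approx_tukey_depth(p, pts, n_dirs=200):
    # Depth of p in the cloud `pts`: minimum over random directions of the
    # number of points on the smaller side of a hyperplane through p.
    dirs = rng.standard_normal((n_dirs, pts.shape[1]))
    proj = pts @ dirs.T - p @ dirs.T
    return int(np.minimum((proj >= 0).sum(0), (proj <= 0).sum(0)).min())

# Exponential mechanism: sample a candidate with prob ~ exp(eps * depth / 2);
# depth changes by at most 1 when a single candidate model changes.
depths = np.array([approx_tukey_depth(m, models) for m in models])
w = np.exp(eps * (depths - depths.max()) / 2.0)
chosen = models[rng.choice(len(models), p=w / w.sum())]
print("selected weights:", np.round(chosen, 2))
```
- Efficient Learning of Accurate Surrogates for Simulations of Complex Systems [0.0]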
We introduce an online learning method empowered by an adaptive sampling scheme.
It ensures that all turning points on the model response surface are included in the training data.
We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates can be reliably auto-generated.
arXiv Detail & Related papers (2022-07-11T20:51:11Z)
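A toy rendering of the sampling idea above: fit a cheap surrogate online, locate turning points (slope sign changes) of its response surface, and spend expensive simulator queries exactly there so no extremum is missed. The 1-D setting, polynomial surrogate, and refinement rule are illustrative assumptions.

```python
import numpy as np

def simulation(x):
    # Stand-in for an expensive simulator with several turning points.
    return np.sin(3 * x) + 0.3 * x

# Start from a coarse design and refine online.
xs = list(np.linspace(0.0, 4.0, 5))
ys = [simulation(x) for x in xs]

for rounds in range(12):
    order = np.argsort(xs)
    x_arr, y_arr = np.array(xs)[order], np.array(ys)[order]
    coef = np.polyfit(x_arr, y_arr, deg=min(len(xs) - 1, 9))  # surrogate
    dense = np.linspace(0.0, 4.0, 400)
    slope = np.gradient(np.polyval(coef, dense), dense)
    # Turning points of the surrogate: slope sign changes.
    turns = dense[:-1][np.sign(slope[:-1]) != np.sign(slope[1:])]
    # Query the simulator at turning points far from existing samples.
    new = [t for t in turns if np.abs(t - np.array(xs)).min() > 0.05]
    if not new:
        break
    xs += new
    ys += [simulation(t) for t in new]

print(f"queried {len(xs)} simulator points:", np.round(np.sort(xs), 2))
```
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]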
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
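A compressed sketch of iterative column-wise imputation with per-column model selection, in the spirit of the summary above: each incomplete column chooses between a trivial mean imputer and a least-squares regressor on the other columns, and the fill-ins are refined over several sweeps. HyperImpute's actual learner search space is far richer, and a real implementation would cross-validate the selection rather than score in-sample.

```python
import numpy as np

rng = np.random.default_rng(5)

# Complete data with correlated columns, then mask 20% of entries.
Z = rng.standard_normal((300, 1))
X_full = np.hstack([Z, 2 * Z + 0.1 * rng.standard_normal((300, 1)),
                    -Z + 0.1 * rng.standard_normal((300, 1))])
mask = rng.random(X_full.shape) < 0.2
X = np.where(mask, np.nan, X_full)

imp = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # mean-fill init
for sweep in range(10):
    for c in range(X.shape[1]):
        miss = np.isnan(X[:, c])
        if not miss.any():
            continue
        others = np.delete(imp, c, axis=1)
        A = np.hstack([others, np.ones((len(X), 1))])    # bias column
        # Candidate 1: column mean. Candidate 2: least squares on others.
        w = np.linalg.lstsq(A[~miss], X[~miss, c], rcond=None)[0]
        err_mean = np.mean((X[~miss, c] - X[~miss, c].mean()) ** 2)
        err_reg = np.mean((X[~miss, c] - A[~miss] @ w) ** 2)
        # Per-column model selection by error on the observed entries.
        fill = A[miss] @ w if err_reg < err_mean else X[~miss, c].mean()
        imp[miss, c] = fill

print("imputation RMSE:",
      np.sqrt(np.mean((imp[mask] - X_full[mask]) ** 2)))
```
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]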
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
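The "simple and efficient procedure" above amounts to: under a fixed labelling budget, pilot a few choices of points-per-task and keep the one with the lowest meta-validation error. The linear-regression task family and budget below are invented purely to show the sweep.

```python
import numpy as np

rng = np.random.default_rng(6)
budget = 240                                    # total labelled points

def meta_val_error(m, trials=300):
    # Toy meta-learning model: tasks are y = w*x + noise with w ~ N(0, 1).
    # With m points per task the budget affords n = budget // m tasks;
    # we learn the prior mean of w and score it on an unseen task.
    n = budget // m
    errs = []
    for _ in range(trials):
        w_hats = []
        for _ in range(n):
            w = rng.standard_normal()
            x = rng.standard_normal(m)
            y = w * x + 0.5 * rng.standard_normal(m)
            w_hats.append((x @ y) / (x @ x + 0.1))   # ridge-stabilised fit
        prior_mean = np.mean(w_hats)
        errs.append((prior_mean - rng.standard_normal()) ** 2)
    return np.mean(errs)

# The procedure: sweep points-per-task under the fixed budget, keep the best.
for m in (2, 4, 8, 16, 48):
    print(f"{m:3d} points/task -> meta-validation error {meta_val_error(m):.3f}")
```
- Continuous Optimization Benchmarks by Simulation [0.0]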
Benchmark experiments are required to test, compare, tune, and understand optimization algorithms.
Data from previous evaluations can be used to train surrogate models which are then used for benchmarking.
We show that the spectral simulation method enables simulation for continuous optimization problems.
arXiv Detail & Related papers (2020-08-14T08:50:57Z)
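The spectral simulation idea can be illustrated with the standard random-Fourier construction: draw frequencies from the spectral density of a chosen kernel and superpose randomly weighted cosines, giving an inexpensive continuous function that behaves like a Gaussian-process sample. The RBF kernel and lengthscale below are assumptions for the demo, not necessarily the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_test_function(lengthscale=0.3, n_terms=512):
    # Spectral simulation of a Gaussian-process sample: by Bochner's
    # theorem the RBF kernel's spectral density is Gaussian, so draw
    # frequencies from it and superpose randomly weighted cosines.
    w = rng.normal(0.0, 1.0 / lengthscale, n_terms)   # frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, n_terms)        # phases
    a = rng.standard_normal(n_terms)                  # amplitudes
    return lambda x: np.sqrt(2.0 / n_terms) * (np.cos(np.outer(x, w) + b) @ a)

f = simulate_test_function()          # cheap, continuous benchmark function
x = np.linspace(0.0, 1.0, 5)
print(np.round(f(x), 3))
```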
This list is automatically generated from the titles and abstracts of the papers on this site.