AutoSimulate: (Quickly) Learning Synthetic Data Generation
- URL: http://arxiv.org/abs/2008.08424v1
- Date: Sun, 16 Aug 2020 11:36:11 GMT
- Title: AutoSimulate: (Quickly) Learning Synthetic Data Generation
- Authors: Harkirat Singh Behl, Atılım Güneş Baydin, Ran Gal, Philip H.S. Torr, Vibhav Vineet
- Abstract summary: We propose an efficient alternative for optimal synthetic data generation based on a novel differentiable approximation of the objective.
We demonstrate that the proposed method finds the optimal data distribution faster (up to $50\times$), with significantly reduced training data generation (up to $30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than previous methods.
- Score: 70.82315853981838
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Simulation is increasingly being used for generating large labelled datasets
in many machine learning problems. Recent methods have focused on adjusting
simulator parameters with the goal of maximising accuracy on a validation task,
usually relying on REINFORCE-like gradient estimators. However, these approaches
are very expensive as they treat the entire data generation, model training,
and validation pipeline as a black-box and require multiple costly objective
evaluations at each iteration. We propose an efficient alternative for optimal
synthetic data generation, based on a novel differentiable approximation of the
objective. This allows us to optimize the simulator, which may be
non-differentiable, requiring only one objective evaluation at each iteration
with little overhead. We demonstrate on a state-of-the-art photorealistic
renderer that the proposed method finds the optimal data distribution faster
(up to $50\times$), with significantly reduced training data generation (up to
$30\times$) and better accuracy ($+8.7\%$) on real-world test datasets than
previous methods.
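To make the cost contrast concrete, here is a minimal sketch of the REINFORCE-style baseline the abstract criticises, in which every update to the simulator parameters requires several full generate-train-validate evaluations. The simulator, the inner training loop, and the Gaussian parameterisation below are toy assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_and_validate(data):
    # Stand-in for the expensive inner loop: train a model on `data` and
    # return its validation loss (low when the data mean is near 2.0).
    return (data.mean() - 2.0) ** 2

def simulate(psi, n=64):
    # Toy, possibly non-differentiable "simulator": data from N(psi, 1).
    return psi + rng.standard_normal(n)

# REINFORCE-style update of the simulator parameter psi: every gradient
# estimate costs K full generate/train/validate evaluations.
psi, lr, K, sigma = 0.0, 0.5, 8, 0.3
for step in range(50):
    thetas = psi + sigma * rng.standard_normal(K)     # perturbed sim params
    losses = np.array([train_and_validate(simulate(t)) for t in thetas])
    baseline = losses.mean()                          # variance reduction
    # Score-function gradient of E[loss] under theta ~ N(psi, sigma^2).
    grad = np.mean((losses - baseline) * (thetas - psi) / sigma**2)
    psi -= lr * grad

print(f"learned simulator parameter: {psi:.3f} (target ~2.0)")
```

AutoSimulate's claimed saving is to replace the K costly evaluations per step with a single one by differentiating a local approximation of this objective; the sketch above only illustrates the baseline cost model being compared against.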
Related papers
- FLOPS: Forward Learning with OPtimal Sampling [1.694989793927645]
Methods that learn with only forward passes, also referred to as queries, have recently gained attention because they avoid backpropagation.
Conventional forward learning spends an enormous number of queries on each data point to estimate gradients accurately through Monte Carlo sampling.
We propose to allocate the optimal number of queries to each data point in a batch during training, balancing estimation accuracy against computational efficiency (a toy sketch follows this entry).
arXiv Detail & Related papers (2024-10-08T12:16:12Z)
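A hedged illustration of the allocation idea summarised above: gradients are estimated from forward queries alone (here with a generic SPSA-style two-point estimator), and a per-example query budget is assigned where the estimates look noisiest. The estimator, the noise proxy, and the allocation rule are illustrative assumptions, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss(w, x):
    # Per-example loss of a toy linear model; only forward evaluations
    # (queries) of this function are allowed -- no backpropagation.
    return (w @ x - 1.0) ** 2

def forward_grad(w, x, n_queries, eps=1e-3):
    # Monte Carlo gradient from two-point forward queries along random
    # directions (an SPSA-style estimator).
    g = np.zeros_like(w)
    for _ in range(n_queries):
        u = rng.standard_normal(w.shape)
        d = (loss(w + eps * u, x) - loss(w - eps * u, x)) / (2 * eps)
        g += d * u
    return g / n_queries

w = np.zeros(5)
batch = [rng.standard_normal(5) for _ in range(4)]
budget = 32                                     # total queries per batch

for step in range(200):
    # Two cheap pilot estimates per example; their disagreement is a
    # rough proxy for the estimator's variance at that example.
    pilots = [(forward_grad(w, x, 1), forward_grad(w, x, 1)) for x in batch]
    noise = np.array([np.linalg.norm(p - q) for p, q in pilots]) + 1e-8
    # Heuristic allocation: noisier examples receive more of the budget.
    alloc = np.maximum(1, (budget * noise / noise.sum()).astype(int))
    grads = [forward_grad(w, x, int(k)) for x, k in zip(batch, alloc)]
    w -= 0.05 * np.mean(grads, axis=0)

print("final batch loss:", np.mean([loss(w, x) for x in batch]))
```
- Autonomous Vehicle Controllers From End-to-End Differentiable Simulation [60.05963742334746]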
We propose a differentiable simulator and design an analytic policy gradients (APG) approach to training AV controllers.
Our proposed framework brings the differentiable simulator into an end-to-end training loop, where gradients of environment dynamics serve as a useful prior to help the agent learn a more grounded policy.
We find significant improvements in performance and robustness to noise in the dynamics, as well as more intuitive, human-like handling overall.
arXiv Detail & Related papers (2024-09-12T11:50:06Z)
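To see what "gradients of environment dynamics" buy, the toy below differentiates a rollout of a scalar linear system through the chain rule (forward-mode autodiff written out by hand) and descends the exact cost gradient. The dynamics, cost, and horizon are invented for illustration; a real differentiable simulator would supply these gradients automatically.

```python
import numpy as np

# Toy differentiable simulator: x_{t+1} = a*x_t + b*u_t, policy u = theta*x.
a, b, T, x0 = 0.9, 0.5, 20, 1.0

def rollout_with_grad(theta):
    # Propagate the state and d(state)/d(theta) together (forward-mode
    # autodiff by hand), accumulating the cost and its exact gradient.
    x, dx, cost, dcost = x0, 0.0, 0.0, 0.0
    for _ in range(T):
        u, du = theta * x, x + theta * dx        # policy and its derivative
        cost += x * x + 0.1 * u * u
        dcost += 2 * x * dx + 0.2 * u * du
        x, dx = a * x + b * u, a * dx + b * du   # differentiable dynamics
    return cost, dcost

theta = 0.0
for step in range(300):
    cost, grad = rollout_with_grad(theta)
    theta -= 0.01 * grad                         # analytic policy gradient

print(f"theta = {theta:.3f}, rollout cost = {rollout_with_grad(theta)[0]:.3f}")
```
- Improved Distribution Matching for Dataset Condensation [91.55972945798531]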
We propose a novel dataset condensation method based on distribution matching.
Our simple yet effective method outperforms most previous optimization-oriented methods while requiring far fewer computational resources.
arXiv Detail & Related papers (2023-07-19T04:07:33Z)
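Distribution matching for condensation can be sketched as: learn a small synthetic set whose feature statistics match the real data's under some embedding. The toy below matches only first moments under a fixed random ReLU feature map, with the gradient written out by hand; the paper's estimator is considerably more refined, so treat this purely as an illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, h = 16, 64                                   # input dim, feature dim
real = rng.standard_normal((500, d)) + 3.0      # "real" dataset
syn = rng.standard_normal((10, d))              # tiny synthetic set to learn
W = rng.standard_normal((h, d)) / np.sqrt(d)    # fixed random feature map

phi = lambda X: np.maximum(W @ X.T, 0.0)        # ReLU features, shape (h, n)
mu_real = phi(real).mean(axis=1)

for step in range(500):
    diff = mu_real - phi(syn).mean(axis=1)      # residual of mean matching
    grads = np.zeros_like(syn)
    for j in range(len(syn)):
        mask = (W @ syn[j]) > 0.0               # ReLU derivative mask
        grads[j] = -(2.0 / len(syn)) * (W.T @ (diff * mask))
    syn -= 0.1 * grads                          # descend ||diff||^2

print("matching loss:", float(diff @ diff))
```
- Non-iterative optimization of pseudo-labeling thresholds for training object detection models from multiple datasets [2.1485350418225244]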
We propose a non-iterative method to optimize pseudo-labeling thresholds for learning object detection from a collection of low-cost datasets.
We experimentally demonstrate that our proposed method achieves an mAP comparable to that of grid search on the COCO and VOC datasets.
arXiv Detail & Related papers (2022-10-19T00:31:34Z)
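The paper's non-iterative rule is not reproduced here, but the grid-search baseline it is reported to match is easy to state: sweep the pseudo-label confidence threshold and keep the value that scores best on a small validated set. Below, F1 on synthetic detector scores stands in for mAP; all data and names are made up.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic detector confidences: positives skew high, negatives skew low.
pos = rng.beta(5, 2, 300)        # scores of true objects
neg = rng.beta(2, 5, 700)        # scores of background boxes
scores = np.concatenate([pos, neg])
labels = np.concatenate([np.ones(300), np.zeros(700)])

def f1_at(thr):
    pred = scores >= thr
    tp = np.sum(pred & (labels == 1))
    prec = tp / max(pred.sum(), 1)
    rec = tp / 300
    return 0.0 if prec + rec == 0 else 2 * prec * rec / (prec + rec)

# Iterative baseline: grid search over pseudo-labeling thresholds.
grid = np.linspace(0.05, 0.95, 19)
best = max(grid, key=f1_at)
print(f"grid-search threshold: {best:.2f}, F1 = {f1_at(best):.3f}")
```
- Easy Differentially Private Linear Regression [16.325734286930764]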
We study an algorithm which uses the exponential mechanism to select a model with high Tukey depth from a collection of non-private regression models.
We find that this algorithm obtains strong empirical performance in the data-rich setting.
arXiv Detail & Related papers (2022-08-15T17:42:27Z)
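A sketch of the selection step described above: fit many non-private regressors on disjoint data splits, score each candidate by an approximate Tukey depth (estimated with random one-dimensional projections), and choose one via the exponential mechanism. The paper's sensitivity analysis and exact depth computation are more careful; this shows only the shape of the algorithm.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d, eps = 2000, 3, 1.0

# Synthetic regression data with true weights [1, -2, 0.5].
X = rng.standard_normal((n, d))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.standard_normal(n)

# Non-private candidate models fit on disjoint data splits.
models = np.array([np.linalg.lstsq(Xs, ys, rcond=None)[0]
                   for Xs, ys in zip(np.split(X, 40), np.split(y, 40))])

def approx_tukey_depth(p, pts, n_dirs=200):
    # Depth of p in the cloud `pts`: minimum over random directions of the
    # number of points on the smaller side of a hyperplane through p.
    dirs = rng.standard_normal((n_dirs, pts.shape[1]))
    proj = pts @ dirs.T - p @ dirs.T
    return int(np.minimum((proj >= 0).sum(0), (proj <= 0).sum(0)).min())

# Exponential mechanism: sample a candidate with prob ~ exp(eps * depth / 2);
# depth changes by at most 1 when a single candidate model changes.
depths = np.array([approx_tukey_depth(m, models) for m in models])
w = np.exp(eps * (depths - depths.max()) / 2.0)
chosen = models[rng.choice(len(models), p=w / w.sum())]
print("selected weights:", np.round(chosen, 2))
```
- Efficient Learning of Accurate Surrogates for Simulations of Complex Systems [0.0]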
We introduce an online learning method empowered by an adaptive sampling scheme.
It ensures that all turning points on the model response surface are included in the training data.
We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates can be reliably auto-generated.
arXiv Detail & Related papers (2022-07-11T20:51:11Z)
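A toy rendering of the sampling idea above: fit a cheap surrogate online, locate turning points (slope sign changes) of its response surface, and spend expensive simulator queries exactly there so no extremum is missed. The 1-D setting, polynomial surrogate, and refinement rule are illustrative assumptions.

```python
import numpy as np

def simulation(x):
    # Stand-in for an expensive simulator with several turning points.
    return np.sin(3 * x) + 0.3 * x

# Start from a coarse design and refine online.
xs = list(np.linspace(0.0, 4.0, 5))
ys = [simulation(x) for x in xs]

for rounds in range(12):
    order = np.argsort(xs)
    x_arr, y_arr = np.array(xs)[order], np.array(ys)[order]
    coef = np.polyfit(x_arr, y_arr, deg=min(len(xs) - 1, 9))  # surrogate
    dense = np.linspace(0.0, 4.0, 400)
    slope = np.gradient(np.polyval(coef, dense), dense)
    # Turning points of the surrogate: slope sign changes.
    turns = dense[:-1][np.sign(slope[:-1]) != np.sign(slope[1:])]
    # Query the simulator at turning points far from existing samples.
    new = [t for t in turns if np.abs(t - np.array(xs)).min() > 0.05]
    if not new:
        break
    xs += new
    ys += [simulation(t) for t in new]

print(f"queried {len(xs)} simulator points:", np.round(np.sort(xs), 2))
```
- HyperImpute: Generalized Iterative Imputation with Automatic Model Selection [77.86861638371926]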
We propose a generalized iterative imputation framework for adaptively and automatically configuring column-wise models.
We provide a concrete implementation with out-of-the-box learners, simulators, and interfaces.
arXiv Detail & Related papers (2022-06-15T19:10:35Z)
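A compressed sketch of iterative column-wise imputation with per-column model selection, in the spirit of the summary above: each incomplete column chooses between a trivial mean imputer and a least-squares regressor on the other columns, and the fill-ins are refined over several sweeps. HyperImpute's actual learner search space is far richer, and a real implementation would cross-validate the selection rather than score in-sample.

```python
import numpy as np

rng = np.random.default_rng(5)

# Complete data with correlated columns, then mask 20% of entries.
Z = rng.standard_normal((300, 1))
X_full = np.hstack([Z, 2 * Z + 0.1 * rng.standard_normal((300, 1)),
                    -Z + 0.1 * rng.standard_normal((300, 1))])
mask = rng.random(X_full.shape) < 0.2
X = np.where(mask, np.nan, X_full)

imp = np.where(np.isnan(X), np.nanmean(X, axis=0), X)   # mean-fill init
for sweep in range(10):
    for c in range(X.shape[1]):
        miss = np.isnan(X[:, c])
        if not miss.any():
            continue
        others = np.delete(imp, c, axis=1)
        A = np.hstack([others, np.ones((len(X), 1))])    # bias column
        # Candidate 1: column mean. Candidate 2: least squares on others.
        w = np.linalg.lstsq(A[~miss], X[~miss, c], rcond=None)[0]
        err_mean = np.mean((X[~miss, c] - X[~miss, c].mean()) ** 2)
        err_reg = np.mean((X[~miss, c] - A[~miss] @ w) ** 2)
        # Per-column model selection by error on the observed entries.
        fill = A[miss] @ w if err_reg < err_mean else X[~miss, c].mean()
        imp[miss, c] = fill

print("imputation RMSE:",
      np.sqrt(np.mean((imp[mask] - X_full[mask]) ** 2)))
```
- Towards Optimal Strategies for Training Self-Driving Perception Models in Simulation [98.51313127382937]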
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z)
- How to distribute data across tasks for meta-learning? [59.608652082495624]
We show that the optimal number of data points per task depends on the budget, but it converges to a unique constant value for large budgets.
Our results suggest a simple and efficient procedure for data collection.
arXiv Detail & Related papers (2021-03-15T15:38:47Z)
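The "simple and efficient procedure" above amounts to: under a fixed labelling budget, pilot a few choices of points-per-task and keep the one with the lowest meta-validation error. The linear-regression task family and budget below are invented purely to show the sweep.

```python
import numpy as np

rng = np.random.default_rng(6)
budget = 240                                    # total labelled points

def meta_val_error(m, trials=300):
    # Toy meta-learning model: tasks are y = w*x + noise with w ~ N(0, 1).
    # With m points per task the budget affords n = budget // m tasks;
    # we learn the prior mean of w and score it on an unseen task.
    n = budget // m
    errs = []
    for _ in range(trials):
        w_hats = []
        for _ in range(n):
            w = rng.standard_normal()
            x = rng.standard_normal(m)
            y = w * x + 0.5 * rng.standard_normal(m)
            w_hats.append((x @ y) / (x @ x + 0.1))   # ridge-stabilised fit
        prior_mean = np.mean(w_hats)
        errs.append((prior_mean - rng.standard_normal()) ** 2)
    return np.mean(errs)

# The procedure: sweep points-per-task under the fixed budget, keep the best.
for m in (2, 4, 8, 16, 48):
    print(f"{m:3d} points/task -> meta-validation error {meta_val_error(m):.3f}")
```
- Continuous Optimization Benchmarks by Simulation [0.0]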
Benchmark experiments are required to test, compare, tune, and understand optimization algorithms.
Data from previous evaluations can be used to train surrogate models which are then used for benchmarking.
We show that the spectral simulation method enables simulation for continuous optimization problems.
arXiv Detail & Related papers (2020-08-14T08:50:57Z)
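The spectral simulation idea can be illustrated with the standard random-Fourier construction: draw frequencies from the spectral density of a chosen kernel and superpose randomly weighted cosines, giving an inexpensive continuous function that behaves like a Gaussian-process sample. The RBF kernel and lengthscale below are assumptions for the demo, not necessarily the paper's choices.

```python
import numpy as np

rng = np.random.default_rng(7)

def simulate_test_function(lengthscale=0.3, n_terms=512):
    # Spectral simulation of a Gaussian-process sample: by Bochner's
    # theorem the RBF kernel's spectral density is Gaussian, so draw
    # frequencies from it and superpose randomly weighted cosines.
    w = rng.normal(0.0, 1.0 / lengthscale, n_terms)   # frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, n_terms)        # phases
    a = rng.standard_normal(n_terms)                  # amplitudes
    return lambda x: np.sqrt(2.0 / n_terms) * (np.cos(np.outer(x, w) + b) @ a)

f = simulate_test_function()          # cheap, continuous benchmark function
x = np.linspace(0.0, 1.0, 5)
print(np.round(f(x), 3))
```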
This list is automatically generated from the titles and abstracts of the papers on this site.