Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
- URL: http://arxiv.org/abs/2008.09092v1
- Date: Thu, 20 Aug 2020 17:28:45 GMT
- Title: Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
- Authors: Jeevan Devaranjan, Amlan Kar, Sanja Fidler
- Abstract summary: In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.
We use Reinforcement Learning to train our model, and design a feature space divergence between our synthesized and target images that is key to successful training.
We also show that this leads to downstream improvement in the performance of an object detector trained on our generated dataset as opposed to other baseline simulation methods.
- Score: 88.04759848307687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Procedural models are being widely used to synthesize scenes for graphics,
gaming, and to create (labeled) synthetic datasets for ML. In order to produce
realistic and diverse scenes, a number of parameters governing the procedural
models have to be carefully tuned by experts. These parameters control both the
structure of scenes being generated (e.g. how many cars in the scene), as well
as parameters which place objects in valid configurations. Meta-Sim aimed at
automatically tuning parameters given a target collection of real images in an
unsupervised way. In Meta-Sim2, we aim to learn the scene structure in addition
to parameters, which is a challenging problem due to its discrete nature.
Meta-Sim2 proceeds by learning to sequentially sample rule expansions from a
given probabilistic scene grammar. Due to the discrete nature of the problem,
we use Reinforcement Learning to train our model, and design a feature space
divergence between our synthesized and target images that is key to successful
training. Experiments on a real driving dataset show that, without any
supervision, we can successfully learn to generate data that captures discrete
structural statistics of objects, such as their frequency, in real images. We
also show that this leads to downstream improvement in the performance of an
object detector trained on our generated dataset as opposed to other baseline
simulation methods. Project page:
https://nv-tlabs.github.io/meta-sim-structure/.
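The abstract names three ingredients: a model that sequentially samples rule expansions from a probabilistic scene grammar, Reinforcement Learning to handle the discrete choices, and a feature-space divergence between synthesized and target images that drives training. The sketch below shows how these pieces could fit together; it is a minimal illustration, not the authors' code. The toy two-rule grammar, the `render_and_featurize` stand-in for the renderer and frozen feature network, and the RBF-kernel divergence used as a per-scene reward are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the Meta-Sim2 idea: a policy samples
# rule expansions from a tiny scene grammar and is updated with REINFORCE using a
# kernel feature-space divergence to (fake) target-image features as the reward.
import torch
import torch.nn as nn

NUM_CAR_CHOICES, NUM_ASSET_TYPES, FEAT_DIM = 5, 3, 16

class ExpansionPolicy(nn.Module):
    """Autoregressively samples expansions: first how many cars, then an asset per car."""
    def __init__(self):
        super().__init__()
        self.count_logits = nn.Parameter(torch.zeros(NUM_CAR_CHOICES))
        self.asset_logits = nn.Parameter(torch.zeros(NUM_ASSET_TYPES))

    def sample_scene(self):
        count_dist = torch.distributions.Categorical(logits=self.count_logits)
        n_cars = count_dist.sample()
        log_prob = count_dist.log_prob(n_cars)
        assets = []
        for _ in range(int(n_cars)):
            asset_dist = torch.distributions.Categorical(logits=self.asset_logits)
            a = asset_dist.sample()
            log_prob = log_prob + asset_dist.log_prob(a)
            assets.append(int(a))
        return {"n_cars": int(n_cars), "assets": assets}, log_prob

def render_and_featurize(scene):
    # Hypothetical stand-in for the renderer + frozen image-feature network:
    # the fake feature's mean shifts with the number of cars in the scene.
    return torch.randn(FEAT_DIM) + scene["n_cars"]

def rbf(a, b):
    """RBF kernel matrix between feature batches of shape (N, D) and (M, D)."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return torch.exp(-d / (2.0 * FEAT_DIM))

policy = ExpansionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=0.05)
# Stand-in for features of the real target images (here centred near "3 cars").
target_feats = torch.randn(64, FEAT_DIM) + 3.0

for step in range(200):
    scenes, log_probs = zip(*(policy.sample_scene() for _ in range(16)))
    synth_feats = torch.stack([render_and_featurize(s) for s in scenes])
    # Per-scene reward: similarity to the target features minus similarity to the
    # other synthetic features -- a per-sample credit for reducing the divergence.
    reward = (2 * rbf(synth_feats, target_feats).mean(1)
              - rbf(synth_feats, synth_feats).mean(1)).detach()
    advantage = reward - reward.mean()                   # simple baseline
    loss = -(advantage * torch.stack(log_probs)).mean()  # REINFORCE update
    opt.zero_grad(); loss.backward(); opt.step()
```

In this toy setup the policy should learn to place roughly three cars per scene, since the fake target features are centred there; in Meta-Sim2 the target features come from real images and the synthetic ones from a full graphics pipeline.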
Related papers
- Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation [16.69742672616517]
We introduce an innovative structured light simulation system, generating both RGB and physically realistic depth images.
We create an RGBD dataset tailored for robotic industrial grasping scenarios.
By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings.
arXiv Detail & Related papers (2024-07-17T09:57:14Z)
- URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images [39.0780707100513]
We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images.
In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies.
arXiv Detail & Related papers (2024-05-19T20:01:29Z)
- Learning from synthetic data generated with GRADE [0.6982738885923204]
We present a framework for generating realistic animated dynamic environments (GRADE) for robotics research.
GRADE supports full simulation control, ROS integration, and realistic physics, while being built on an engine that produces high-visual-fidelity images and ground-truth data.
We show that models trained using only synthetic data can generalize well to real-world images in the same application domain.
arXiv Detail & Related papers (2023-05-07T14:13:04Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically rely on extensive annotations of real scene scans.
We explore how labelled synthetic models can stand in for real-scene annotations by aligning synthetic and real features in a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on the S3DIS dataset.
arXiv Detail & Related papers (2022-03-20T13:06:15Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict the best simulation parameters for novel "unseen" tasks in one shot (a minimal sketch of this idea appears after this related-papers list).
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)
- Learning to simulate complex scenes [18.51564016785853]
This paper explores content adaptation in the context of semantic segmentation.
We propose a scalable discretization-and-relaxation (SDR) approach to optimize the attribute values and obtain a training set whose content is similar to real-world data.
Experiments show our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy.
arXiv Detail & Related papers (2020-06-25T17:51:34Z)
- Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics [33.30312206728974]
We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks.
Our approach arranges object meshes in physically realistic, dense scenes using physics simulation.
Our pipeline can be run online during training of a deep neural network.
arXiv Detail & Related papers (2020-05-12T10:11:00Z)
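As referenced in the Task2Sim entry above, the idea there is a single model that maps a downstream-task representation to discrete simulation parameters, trained on "seen" tasks and then queried in one shot for "unseen" ones. The following is a minimal sketch under that reading, not the authors' implementation; the two example parameters (lighting and blur bins) and the `downstream_accuracy` proxy reward are hypothetical stand-ins.

```python
# Minimal Task2Sim-style sketch (not the authors' code): a small network maps a
# task representation to discrete simulation parameters, trained with a
# score-function gradient on "seen" tasks, then queried one-shot on an "unseen" task.
import torch
import torch.nn as nn

TASK_DIM, LIGHT_LEVELS, BLUR_LEVELS = 8, 4, 3

class Task2SimController(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(TASK_DIM, 32), nn.ReLU())
        self.light_head = nn.Linear(32, LIGHT_LEVELS)  # e.g. lighting-intensity bins
        self.blur_head = nn.Linear(32, BLUR_LEVELS)    # e.g. motion-blur-strength bins

    def forward(self, task_repr):
        h = self.trunk(task_repr)
        return (torch.distributions.Categorical(logits=self.light_head(h)),
                torch.distributions.Categorical(logits=self.blur_head(h)))

def downstream_accuracy(task_repr, light, blur):
    # Hypothetical proxy reward: each task prefers a particular lighting setting
    # and low blur. In practice this would be measured transfer performance.
    best_light = int(task_repr.abs().sum()) % LIGHT_LEVELS
    return 1.0 - abs(int(light) - best_light) / LIGHT_LEVELS - 0.1 * int(blur)

controller = Task2SimController()
opt = torch.optim.Adam(controller.parameters(), lr=1e-2)
seen_tasks = [torch.randn(TASK_DIM) for _ in range(6)]

for step in range(300):
    task = seen_tasks[step % len(seen_tasks)]
    light_dist, blur_dist = controller(task)
    light, blur = light_dist.sample(), blur_dist.sample()
    reward = downstream_accuracy(task, light, blur)
    log_prob = light_dist.log_prob(light) + blur_dist.log_prob(blur)
    loss = -reward * log_prob                          # REINFORCE on seen tasks
    opt.zero_grad(); loss.backward(); opt.step()

# One-shot prediction for a novel task: take the most likely parameters.
unseen = torch.randn(TASK_DIM)
light_dist, blur_dist = controller(unseen)
print(int(light_dist.probs.argmax()), int(blur_dist.probs.argmax()))
```

In the real system the reward would come from pre-training on the simulated data and measuring transfer to the downstream task, which is far more expensive than this proxy.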
This list is automatically generated from the titles and abstracts of the papers on this site.