Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
- URL: http://arxiv.org/abs/2008.09092v1
- Date: Thu, 20 Aug 2020 17:28:45 GMT
- Title: Meta-Sim2: Unsupervised Learning of Scene Structure for Synthetic Data Generation
- Authors: Jeevan Devaranjan, Amlan Kar, Sanja Fidler
- Abstract summary: In Meta-Sim2, we aim to learn the scene structure in addition to parameters, which is a challenging problem due to its discrete nature.
We use Reinforcement Learning to train our model, and design a feature space divergence between our synthesized and target images that is key to successful training.
We also show that this leads to downstream improvement in the performance of an object detector trained on our generated dataset as opposed to other baseline simulation methods.
- Score: 88.04759848307687
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Procedural models are being widely used to synthesize scenes for graphics,
gaming, and to create (labeled) synthetic datasets for ML. In order to produce
realistic and diverse scenes, a number of parameters governing the procedural
models have to be carefully tuned by experts. These parameters control both the
structure of scenes being generated (e.g. how many cars in the scene), as well
as parameters which place objects in valid configurations. Meta-Sim aimed at
automatically tuning parameters given a target collection of real images in an
unsupervised way. In Meta-Sim2, we aim to learn the scene structure in addition
to parameters, which is a challenging problem due to its discrete nature.
Meta-Sim2 proceeds by learning to sequentially sample rule expansions from a
given probabilistic scene grammar. Due to the discrete nature of the problem,
we use Reinforcement Learning to train our model, and design a feature space
divergence between our synthesized and target images that is key to successful
training. Experiments on a real driving dataset show that, without any
supervision, we can successfully learn to generate data that captures discrete
structural statistics of objects, such as their frequency, in real images. We
also show that this leads to downstream improvement in the performance of an
object detector trained on our generated dataset as opposed to other baseline
simulation methods. Project page:
https://nv-tlabs.github.io/meta-sim-structure/.
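The abstract names three ingredients: a model that sequentially samples rule expansions from a probabilistic scene grammar, Reinforcement Learning to handle the discrete choices, and a feature-space divergence between synthesized and target images that drives training. The sketch below shows how these pieces could fit together; it is a minimal illustration, not the authors' code. The toy two-rule grammar, the `render_and_featurize` stand-in for the renderer and frozen feature network, and the RBF-kernel divergence used as a per-scene reward are all assumptions made for the example.

```python
# Minimal sketch (not the authors' code) of the Meta-Sim2 idea: a policy samples
# rule expansions from a tiny scene grammar and is updated with REINFORCE using a
# kernel feature-space divergence to (fake) target-image features as the reward.
import torch
import torch.nn as nn

NUM_CAR_CHOICES, NUM_ASSET_TYPES, FEAT_DIM = 5, 3, 16

class ExpansionPolicy(nn.Module):
    """Autoregressively samples expansions: first how many cars, then an asset per car."""
    def __init__(self):
        super().__init__()
        self.count_logits = nn.Parameter(torch.zeros(NUM_CAR_CHOICES))
        self.asset_logits = nn.Parameter(torch.zeros(NUM_ASSET_TYPES))

    def sample_scene(self):
        count_dist = torch.distributions.Categorical(logits=self.count_logits)
        n_cars = count_dist.sample()
        log_prob = count_dist.log_prob(n_cars)
        assets = []
        for _ in range(int(n_cars)):
            asset_dist = torch.distributions.Categorical(logits=self.asset_logits)
            a = asset_dist.sample()
            log_prob = log_prob + asset_dist.log_prob(a)
            assets.append(int(a))
        return {"n_cars": int(n_cars), "assets": assets}, log_prob

def render_and_featurize(scene):
    # Hypothetical stand-in for the renderer + frozen image-feature network:
    # the fake feature's mean shifts with the number of cars in the scene.
    return torch.randn(FEAT_DIM) + scene["n_cars"]

def rbf(a, b):
    """RBF kernel matrix between feature batches of shape (N, D) and (M, D)."""
    d = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return torch.exp(-d / (2.0 * FEAT_DIM))

policy = ExpansionPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=0.05)
# Stand-in for features of the real target images (here centred near "3 cars").
target_feats = torch.randn(64, FEAT_DIM) + 3.0

for step in range(200):
    scenes, log_probs = zip(*(policy.sample_scene() for _ in range(16)))
    synth_feats = torch.stack([render_and_featurize(s) for s in scenes])
    # Per-scene reward: similarity to the target features minus similarity to the
    # other synthetic features -- a per-sample credit for reducing the divergence.
    reward = (2 * rbf(synth_feats, target_feats).mean(1)
              - rbf(synth_feats, synth_feats).mean(1)).detach()
    advantage = reward - reward.mean()                   # simple baseline
    loss = -(advantage * torch.stack(log_probs)).mean()  # REINFORCE update
    opt.zero_grad(); loss.backward(); opt.step()
```

In this toy setup the policy should learn to place roughly three cars per scene, since the fake target features are centred there; in Meta-Sim2 the target features come from real images and the synthetic ones from a full graphics pipeline.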
Related papers
- Close the Sim2real Gap via Physically-based Structured Light Synthetic Data Simulation [16.69742672616517]
We introduce an innovative structured light simulation system, generating both RGB and physically realistic depth images.
We create an RGBD dataset tailored for robotic industrial grasping scenarios.
By reducing the sim2real gap and enhancing deep learning training, we facilitate the application of deep learning models in industrial settings.
arXiv Detail & Related papers (2024-07-17T09:57:14Z)
- URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images [39.0780707100513]
We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images.
In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies.
arXiv Detail & Related papers (2024-05-19T20:01:29Z)
- Learning from synthetic data generated with GRADE [0.6982738885923204]
We present a framework for generating realistic animated dynamic environments (GRADE) for robotics research.
GRADE supports full simulation control, ROS integration, and realistic physics, while being built on an engine that produces high-visual-fidelity images and ground-truth data.
We show that models trained using only synthetic data can generalize well to real-world images in the same application domain.
arXiv Detail & Related papers (2023-05-07T14:13:04Z)
- TRoVE: Transforming Road Scene Datasets into Photorealistic Virtual Environments [84.6017003787244]
This work proposes a synthetic data generation pipeline to address the difficulties and domain-gaps present in simulated datasets.
We show that using annotations and visual cues from existing datasets, we can facilitate automated multi-modal data generation.
arXiv Detail & Related papers (2022-08-16T20:46:08Z)
- Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Existing methods typically rely on extensive annotations of real scene scans.
We explore how labelled synthetic models can stand in for real-scene annotations by aligning synthetic and real features in a unified feature space.
Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on the S3DIS dataset.
arXiv Detail & Related papers (2022-03-20T13:06:15Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- Task2Sim: Towards Effective Pre-training and Transfer from Synthetic Data [74.66568380558172]
We study the transferability of pre-trained models based on synthetic data generated by graphics simulators to downstream tasks.
We introduce Task2Sim, a unified model mapping downstream task representations to optimal simulation parameters.
It learns this mapping by training to find the set of best parameters on a set of "seen" tasks.
Once trained, it can then be used to predict the best simulation parameters for novel "unseen" tasks in one shot (a minimal sketch of this idea appears after this related-papers list).
arXiv Detail & Related papers (2021-11-30T19:25:27Z)
- RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces [77.07767833443256]
We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects.
In contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.
arXiv Detail & Related papers (2020-07-02T17:27:27Z)
- Learning to simulate complex scenes [18.51564016785853]
This paper explores content adaptation in the context of semantic segmentation.
We propose a scalable discretization-and-relaxation (SDR) approach to optimize the attribute values and obtain a training set whose content is similar to real-world data.
Experiments show our system can generate reasonable and useful scenes, from which we obtain promising real-world segmentation accuracy.
arXiv Detail & Related papers (2020-06-25T17:51:34Z)
- Stillleben: Realistic Scene Synthesis for Deep Learning in Robotics [33.30312206728974]
We describe a synthesis pipeline capable of producing training data for cluttered scene perception tasks.
Our approach arranges object meshes in physically realistic, dense scenes using physics simulation.
Our pipeline can be run online during training of a deep neural network.
arXiv Detail & Related papers (2020-05-12T10:11:00Z)
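As referenced in the Task2Sim entry above, the idea there is a single model that maps a downstream-task representation to discrete simulation parameters, trained on "seen" tasks and then queried in one shot for "unseen" ones. The following is a minimal sketch under that reading, not the authors' implementation; the two example parameters (lighting and blur bins) and the `downstream_accuracy` proxy reward are hypothetical stand-ins.

```python
# Minimal Task2Sim-style sketch (not the authors' code): a small network maps a
# task representation to discrete simulation parameters, trained with a
# score-function gradient on "seen" tasks, then queried one-shot on an "unseen" task.
import torch
import torch.nn as nn

TASK_DIM, LIGHT_LEVELS, BLUR_LEVELS = 8, 4, 3

class Task2SimController(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(TASK_DIM, 32), nn.ReLU())
        self.light_head = nn.Linear(32, LIGHT_LEVELS)  # e.g. lighting-intensity bins
        self.blur_head = nn.Linear(32, BLUR_LEVELS)    # e.g. motion-blur-strength bins

    def forward(self, task_repr):
        h = self.trunk(task_repr)
        return (torch.distributions.Categorical(logits=self.light_head(h)),
                torch.distributions.Categorical(logits=self.blur_head(h)))

def downstream_accuracy(task_repr, light, blur):
    # Hypothetical proxy reward: each task prefers a particular lighting setting
    # and low blur. In practice this would be measured transfer performance.
    best_light = int(task_repr.abs().sum()) % LIGHT_LEVELS
    return 1.0 - abs(int(light) - best_light) / LIGHT_LEVELS - 0.1 * int(blur)

controller = Task2SimController()
opt = torch.optim.Adam(controller.parameters(), lr=1e-2)
seen_tasks = [torch.randn(TASK_DIM) for _ in range(6)]

for step in range(300):
    task = seen_tasks[step % len(seen_tasks)]
    light_dist, blur_dist = controller(task)
    light, blur = light_dist.sample(), blur_dist.sample()
    reward = downstream_accuracy(task, light, blur)
    log_prob = light_dist.log_prob(light) + blur_dist.log_prob(blur)
    loss = -reward * log_prob                          # REINFORCE on seen tasks
    opt.zero_grad(); loss.backward(); opt.step()

# One-shot prediction for a novel task: take the most likely parameters.
unseen = torch.randn(TASK_DIM)
light_dist, blur_dist = controller(unseen)
print(int(light_dist.probs.argmax()), int(blur_dist.probs.argmax()))
```

In the real system the reward would come from pre-training on the simulated data and measuring transfer to the downstream task, which is far more expensive than this proxy.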
This list is automatically generated from the titles and abstracts of the papers on this site.