Learning Interactive Real-World Simulators
- URL: http://arxiv.org/abs/2310.06114v2
- Date: Sat, 13 Jan 2024 00:42:24 GMT
- Title: Learning Interactive Real-World Simulators
- Authors: Mengjiao Yang, Yilun Du, Kamyar Ghasemipour, Jonathan Tompson, Leslie
Kaelbling, Dale Schuurmans, Pieter Abbeel
- Abstract summary: We explore the possibility of learning a universal simulator of real-world interaction through generative modeling.
We use the simulator to train both high-level vision-language policies and low-level reinforcement learning policies.
Video captioning models can benefit from training with simulated experience, opening up even wider applications.
- Score: 107.12907352474005
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generative models trained on internet data have revolutionized how text,
image, and video content can be created. Perhaps the next milestone for
generative models is to simulate realistic experience in response to actions
taken by humans, robots, and other interactive agents. Applications of a
real-world simulator range from controllable content creation in games and
movies, to training embodied agents purely in simulation that can be directly
deployed in the real world. We explore the possibility of learning a universal
simulator of real-world interaction through generative modeling. We first make
the important observation that natural datasets available for learning a
real-world simulator are often rich along different dimensions (e.g., abundant
objects in image data, densely sampled actions in robotics data, and diverse
movements in navigation data). With careful orchestration of diverse datasets,
each providing a different aspect of the overall experience, we can simulate
the visual outcome of both high-level instructions such as ``open the drawer''
and low-level controls such as "move by x, y" from otherwise static scenes and
objects. We use the simulator to train both high-level vision-language policies
and low-level reinforcement learning policies, each of which can be deployed in
the real world in zero shot after training purely in simulation. We also show
that other types of intelligence such as video captioning models can benefit
from training with simulated experience, opening up even wider applications.
Video demos can be found at https://universal-simulator.github.io.
Related papers
- Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks [93.38375271826202]
We present a method to improve generalization and robustness to distribution shifts in sim-to-real visual quadrotor navigation tasks.
We first build a simulator by integrating Gaussian splatting with quadrotor flight dynamics, and then, train robust navigation policies using Liquid neural networks.
In this way, we obtain a full-stack imitation learning protocol that combines advances in 3D Gaussian splatting radiance field rendering, programming of expert demonstration training data, and the task understanding capabilities of Liquid networks.
arXiv Detail & Related papers (2024-06-21T13:48:37Z) - RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots [25.650235551519952]
We present RoboCasa, a large-scale simulation framework for training generalist robots in everyday environments.
We provide thousands of 3D assets across over 150 object categories and dozens of interactable furniture and appliances.
Our experiments show a clear scaling trend in using synthetically generated robot data for large-scale imitation learning.
arXiv Detail & Related papers (2024-06-04T17:41:31Z) - URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images [39.0780707100513]
We present an integrated end-to-end pipeline that generates simulation scenes complete with articulated kinematic and dynamic structures from real-world images.
In doing so, our work provides both a pipeline for large-scale generation of simulation environments and an integrated system for training robust robotic control policies.
arXiv Detail & Related papers (2024-05-19T20:01:29Z) - Scaling Face Interaction Graph Networks to Real World Scenes [12.519862235430153]
We introduce a method which substantially reduces the memory required to run graph-based learned simulators.
We show that our method uses substantially less memory than previous graph-based simulators while retaining their accuracy.
This paves the way for expanding the application of learned simulators to settings where only perceptual information is available at inference time.
arXiv Detail & Related papers (2024-01-22T14:38:25Z) - Sim-to-Real via Sim-to-Seg: End-to-end Off-road Autonomous Driving
Without Real Data [56.49494318285391]
We present Sim2Seg, a re-imagining of RCAN that crosses the visual reality gap for off-road autonomous driving.
This is done by learning to translate randomized simulation images into simulated segmentation and depth maps.
This allows us to train an end-to-end RL policy in simulation, and directly deploy in the real-world.
arXiv Detail & Related papers (2022-10-25T17:50:36Z) - DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to
Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibilities of sim-to-real transfer for dexterous manipulation in diverse kinds of hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z) - Towards Optimal Strategies for Training Self-Driving Perception Models
in Simulation [98.51313127382937]
We focus on the use of labels in the synthetic domain alone.
Our approach introduces both a way to learn neural-invariant representations and a theoretically inspired view on how to sample the data from the simulator.
We showcase our approach on the bird's-eye-view vehicle segmentation task with multi-sensor data.
arXiv Detail & Related papers (2021-11-15T18:37:43Z) - DriveGAN: Towards a Controllable High-Quality Neural Simulation [147.6822288981004]
We introduce a novel high-quality neural simulator referred to as DriveGAN.
DriveGAN achieves controllability by disentangling different components without supervision.
We train DriveGAN on multiple datasets, including 160 hours of real-world driving data.
arXiv Detail & Related papers (2021-04-30T15:30:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.