MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real
- URL: http://arxiv.org/abs/2507.02864v1
- Date: Thu, 03 Jul 2025 17:59:58 GMT
- Title: MultiGen: Using Multimodal Generation in Simulation to Learn Multimodal Policies in Real
- Authors: Renhao Wang, Haoran Geng, Tingle Li, Feishi Wang, Gopala Anumanchipalli, Philipp Wu, Trevor Darrell, Boyi Li, Pieter Abbeel, Jitendra Malik, Alexei A. Efros
- Abstract summary: MultiGen is a framework that integrates large-scale generative models into traditional physics simulators. We demonstrate effective zero-shot transfer to real-world pouring with novel containers and liquids.
- Score: 128.7629907902049
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Robots must integrate multiple sensory modalities to act effectively in the real world. Yet, learning such multimodal policies at scale remains challenging. Simulation offers a viable solution, but while vision has benefited from high-fidelity simulators, other modalities (e.g. sound) can be notoriously difficult to simulate. As a result, sim-to-real transfer has succeeded primarily in vision-based tasks, with multimodal transfer still largely unrealized. In this work, we tackle these challenges by introducing MultiGen, a framework that integrates large-scale generative models into traditional physics simulators, enabling multisensory simulation. We showcase our framework on the dynamic task of robot pouring, which inherently relies on multimodal feedback. By synthesizing realistic audio conditioned on simulation video, our method enables training on rich audiovisual trajectories -- without any real robot data. We demonstrate effective zero-shot transfer to real-world pouring with novel containers and liquids, highlighting the potential of generative modeling to both simulate hard-to-model modalities and close the multimodal sim-to-real gap.
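To make the pipeline concrete, below is a minimal Python sketch of the idea as the abstract describes it: a physics simulator renders video of the pouring task, and a generative model synthesizes audio conditioned on that video, yielding paired audiovisual trajectories for policy learning without real robot data. `PhysicsSim`, `VideoToAudioModel`, and `act_fn` are hypothetical stand-ins, not the authors' released code.

```python
import numpy as np

class PhysicsSim:
    """Hypothetical stand-in for a physics simulator rendering the pouring task."""
    def reset(self):
        return np.zeros((64, 64, 3), np.uint8)        # initial RGB frame
    def step(self, action):
        next_frame = np.zeros((64, 64, 3), np.uint8)  # rendered RGB frame
        done = False
        return next_frame, done

class VideoToAudioModel:
    """Hypothetical stand-in for a large generative video-to-audio model."""
    def generate(self, frames, sample_rate=16_000, fps=10):
        # A real model would synthesize pouring sounds from the video.
        n_samples = int(len(frames) / fps * sample_rate)
        return np.zeros(n_samples, np.float32)

def collect_audiovisual_trajectory(sim, v2a, act_fn, horizon=100):
    """Roll out in simulation, then add generated audio post hoc."""
    frames, actions = [sim.reset()], []
    for _ in range(horizon):
        action = act_fn(frames[-1])    # e.g. a scripted pouring controller
        frame, done = sim.step(action)
        frames.append(frame)
        actions.append(action)
        if done:
            break
    # Audio is conditioned on the rendered video, so the hard-to-simulate
    # modality comes from the generative model, not the physics engine.
    audio = v2a.generate(frames)
    return frames, audio, actions
```

Because the audio is synthesized after the fact from rendered frames, the physics engine never has to model sound at all; the multimodal policy is then trained on these paired audiovisual trajectories.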
Related papers
- RoboPearls: Editable Video Simulation for Robot Manipulation [81.18434338506621]
RoboPearls is an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls enables the construction of photo-realistic, view-consistent simulations. We conduct extensive experiments on multiple datasets and scenes, including RLBench, COLOSSEUM, Ego4D, Open X-Embodiment, and a real-world robot.
arXiv Detail & Related papers (2025-06-28T05:03:31Z)
- From Abstraction to Reality: DARPA's Vision for Robust Sim-to-Real Autonomy [6.402441477393285]
TIAMAT aims to address rapid and robust transfer of autonomy technologies across dynamic and complex environments. Current methods for simulation-to-reality (sim-to-real) transfer often rely on high-fidelity simulations. TIAMAT's approaches aim to achieve abstract-to-real transfer for effective and rapid real-world adaptation.
arXiv Detail & Related papers (2025-03-14T02:06:10Z)
- Pre-Trained Video Generative Models as World Simulators [59.546627730477454]
We propose Dynamic World Simulation (DWS) to transform pre-trained video generative models into controllable world simulators. To achieve precise alignment between conditioned actions and generated visual changes, we introduce a lightweight, universal action-conditioned module. Experiments demonstrate that DWS can be applied to both diffusion and autoregressive transformer models.
arXiv Detail & Related papers (2025-02-10T14:49:09Z)
- Learning Quadruped Locomotion Using Differentiable Simulation [31.80380408663424]
Differentiable simulation promises fast convergence and stable training, but applying it to contact-rich tasks such as legged locomotion is challenging.
This work proposes a new differentiable simulation framework to overcome these challenges.
Our framework enables learning quadruped walking in simulation in minutes without parallelization.
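As a rough illustration of the differentiable-simulation idea (a generic toy, not this paper's quadruped framework), the following PyTorch snippet backpropagates a task loss through an entire simulated rollout to update policy parameters directly:

```python
import torch

# Toy differentiable "simulator": a 1-D point mass that a linear policy must
# drive to a target position. Every step is a differentiable torch op, so the
# final task loss can be backpropagated through the whole rollout.
dt, target = 0.05, 1.0
policy = torch.nn.Linear(2, 1)              # (position, velocity) -> force
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for _ in range(200):
    pos, vel = torch.zeros(1), torch.zeros(1)
    for _ in range(50):                     # rollout; gradients flow end to end
        force = policy(torch.cat([pos, vel]))
        vel = vel + dt * force              # F = ma with unit mass
        pos = pos + dt * vel
    loss = (pos - target).pow(2).mean()     # distance to target at episode end
    opt.zero_grad()
    loss.backward()                         # backprop through the simulation
    opt.step()
```

Analytic gradients through the dynamics replace the high-variance gradient estimates of standard RL, which is one reason such methods can train quickly without massive parallelization.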
arXiv Detail & Related papers (2024-03-21T22:18:59Z)
- Waymax: An Accelerated, Data-Driven Simulator for Large-Scale Autonomous Driving Research [76.93956925360638]
Waymax is a new data-driven simulator for autonomous driving in multi-agent scenes.
It runs entirely on hardware accelerators such as TPUs/GPUs and supports in-graph simulation for training.
We benchmark a suite of popular imitation and reinforcement learning algorithms with ablation studies on different design decisions.
arXiv Detail & Related papers (2023-10-12T20:49:15Z)
- Residual Physics Learning and System Identification for Sim-to-real Transfer of Policies on Buoyancy Assisted Legged Robots [14.760426243769308]
In this work, we demonstrate robust sim-to-real transfer of control policies on the BALLU robots via system identification.
Rather than relying on standard supervised learning formulations, we utilize deep reinforcement learning to train an external force policy.
We analyze the improved simulation fidelity by comparing the simulation trajectories against the real-world ones.
arXiv Detail & Related papers (2023-03-16T18:49:05Z)
- DeXtreme: Transfer of Agile In-hand Manipulation from Simulation to Reality [64.51295032956118]
We train a policy that can perform robust dexterous manipulation on an anthropomorphic robot hand.
Our work reaffirms the possibility of sim-to-real transfer for dexterous manipulation across diverse hardware and simulator setups.
arXiv Detail & Related papers (2022-10-25T01:51:36Z)
- TrafficSim: Learning to Simulate Realistic Multi-Agent Behaviors [74.67698916175614]
We propose TrafficSim, a multi-agent behavior model for realistic traffic simulation.
In particular, we leverage an implicit latent variable model to parameterize a joint actor policy.
We show TrafficSim generates significantly more realistic and diverse traffic scenarios than a broad set of baselines.
arXiv Detail & Related papers (2021-01-17T00:29:30Z)
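For intuition about the implicit latent variable phrasing in the TrafficSim summary above, here is a generic PyTorch sketch (module names and shapes are illustrative, not from the paper): a single scene-level latent is sampled and shared by all actors, so the decoded actions come out coordinated rather than independent.

```python
import torch
import torch.nn as nn

class JointActorPolicy(nn.Module):
    """Generic sketch: one shared scene latent parameterizes all actors' actions."""
    def __init__(self, state_dim=8, latent_dim=16, action_dim=2):
        super().__init__()
        self.prior = nn.Linear(state_dim, 2 * latent_dim)       # -> (mu, log_var)
        self.decoder = nn.Sequential(
            nn.Linear(state_dim + latent_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, actor_states):             # [num_actors, state_dim]
        scene = actor_states.mean(dim=0)          # crude scene-level summary
        mu, log_var = self.prior(scene).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()   # sample once
        z = z.expand(actor_states.shape[0], -1)   # share z across all actors
        return self.decoder(torch.cat([actor_states, z], dim=-1))

policy = JointActorPolicy()
actions = policy(torch.randn(12, 8))              # 12 actors -> [12, 2] actions
```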