Related papers: Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

URL: http://arxiv.org/abs/2412.14957v2
Date: Mon, 10 Mar 2025 09:40:42 GMT
Title: Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
Authors: Leonardo Barcellona, Andrii Zadaianchuk, Davide Allegro, Samuele Papa, Stefano Ghidoni, Efstratios Gavves,
Abstract summary: DreMa is a new approach for constructing digital twins using learned explicit representations of the real world and its dynamics.<n>We show that DreMa can successfully learn novel physical tasks from just a single example per task variation.
Score: 25.62602420895531
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A world model provides an agent with a representation of its environment, enabling it to predict the causal consequences of its actions. Current world models typically cannot directly and explicitly imitate the actual environment in front of a robot, often resulting in unrealistic behaviors and hallucinations that make them unsuitable for real-world robotics applications. To overcome those challenges, we propose to rethink robot world models as learnable digital twins. We introduce DreMa, a new approach for constructing digital twins automatically using learned explicit representations of the real world and its dynamics, bridging the gap between traditional digital twins and world models. DreMa replicates the observed world and its structure by integrating Gaussian Splatting and physics simulators, allowing robots to imagine novel configurations of objects and to predict the future consequences of robot actions thanks to its compositionality. We leverage this capability to generate new data for imitation learning by applying equivariant transformations to a small set of demonstrations. Our evaluations across various settings demonstrate significant improvements in accuracy and robustness by incrementing actions and object distributions, reducing the data needed to learn a policy and improving the generalization of the agents. As a highlight, we show that a real Franka Emika Panda robot, powered by DreMa's imagination, can successfully learn novel physical tasks from just a single example per task variation (one-shot policy learning). Our project page can be found in: https://dreamtomanipulate.github.io/.

Related papers

Dynamic Handover: Throw and Catch with Bimanual Hands [30.206469112964033]
We design a system with two multi-finger hands attached to robot arms to solve this problem. We train our system using Multi-Agent Reinforcement Learning in simulation and perform Sim2Real transfer to deploy on the real robots. To overcome the Sim2Real gap, we provide multiple novel algorithm designs including learning a trajectory prediction model for the object.
arXiv Detail & Related papers (2023-09-11T17:49:25Z)
Robot Learning with Sensorimotor Pre-training [98.7755895548928]
We present a self-supervised sensorimotor pre-training approach for robotics. Our model, called RPT, is a Transformer that operates on sequences of sensorimotor tokens. We find that sensorimotor pre-training consistently outperforms training from scratch, has favorable scaling properties, and enables transfer across different tasks, environments, and robots.
arXiv Detail & Related papers (2023-06-16T17:58:10Z)
Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation mask generated by internet-scale foundation models. Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning. Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
Real-World Humanoid Locomotion with Reinforcement Learning [92.85934954371099]
We present a fully learning-based approach for real-world humanoid locomotion. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.
arXiv Detail & Related papers (2023-03-06T18:59:09Z)
RT-1: Robotics Transformer for Real-World Control at Scale [98.09428483862165]
We present a model class, dubbed Robotics Transformer, that exhibits promising scalable model properties. We verify our conclusions in a study of different model classes and their ability to generalize as a function of the data size, model size, and data diversity based on a large-scale data collection on real robots performing real-world tasks.
arXiv Detail & Related papers (2022-12-13T18:55:15Z)
DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations [51.87067543670535]
We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses. We train a dexterous grasping policy that takes the point clouds of the object as input and predicts continuous actions to grasp objects from different initial robot states. The policy learned from our dataset can generalize well on unseen object poses in both simulation and the real world.
arXiv Detail & Related papers (2022-09-28T17:51:49Z)
DayDreamer: World Models for Physical Robot Learning [142.11031132529524]
Deep reinforcement learning is a common approach to robot learning but requires a large amount of trial and error to learn. Many advances in robot learning rely on simulators. In this paper, we apply Dreamer to 4 robots to learn online and directly in the real world, without simulators.
arXiv Detail & Related papers (2022-06-28T17:44:48Z)
RoboCraft: Learning to See, Simulate, and Shape Elasto-Plastic Objects with Graph Networks [32.00371492516123]
We present a model-based planning framework for modeling and manipulating elasto-plastic objects. Our system, RoboCraft, learns a particle-based dynamics model using graph neural networks (GNNs) to capture the structure of the underlying system. We show through experiments that with just 10 minutes of real-world robotic interaction data, our robot can learn a dynamics model that can be used to synthesize control signals to deform elasto-plastic objects into various target shapes.
arXiv Detail & Related papers (2022-05-05T20:28:15Z)
Factored World Models for Zero-Shot Generalization in Robotic Manipulation [7.258229016768018]
We learn to generalize over robotic pick-and-place tasks using object-factored world models. We use a residual stack of graph neural networks that receive action information at multiple levels in both their node and edge neural networks. We show that an ensemble of our models can be used to plan for tasks involving up to 12 pick and place actions using search.
arXiv Detail & Related papers (2022-02-10T21:26:11Z)
Full-Body Visual Self-Modeling of Robot Morphologies [29.76701883250049]
Internal computational models of physical bodies are fundamental to the ability of robots and animals alike to plan and control their actions. Recent progress in fully data-driven self-modeling has enabled machines to learn their own forward kinematics directly from task-agnostic interaction data. Here, we propose that instead of directly modeling forward-kinematics, a more useful form of self-modeling is one that could answer space occupancy queries.
arXiv Detail & Related papers (2021-11-11T18:58:07Z)

This list is automatically generated from the titles and abstracts of the papers in this site.