Flow as the Cross-Domain Manipulation Interface
- URL: http://arxiv.org/abs/2407.15208v2
- Date: Fri, 4 Oct 2024 04:05:27 GMT
- Title: Flow as the Cross-Domain Manipulation Interface
- Authors: Mengda Xu, Zhenjia Xu, Yinghao Xu, Cheng Chi, Gordon Wetzstein, Manuela Veloso, Shuran Song
- Abstract summary: Im2Flow2Act enables robots to acquire real-world manipulation skills without the need for real-world robot training data.
Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy.
We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.
- Score: 73.15952395641136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present Im2Flow2Act, a scalable learning framework that enables robots to acquire real-world manipulation skills without the need for real-world robot training data. The key idea behind Im2Flow2Act is to use object flow as the manipulation interface, bridging domain gaps between different embodiments (i.e., human and robot) and training environments (i.e., real-world and simulated). Im2Flow2Act comprises two components: a flow generation network and a flow-conditioned policy. The flow generation network, trained on human demonstration videos, generates object flow from the initial scene image, conditioned on the task description. The flow-conditioned policy, trained on simulated robot play data, maps the generated object flow to robot actions to realize the desired object movements. By using flow as input, this policy can be directly deployed in the real world with a minimal sim-to-real gap. By leveraging real-world human videos and simulated robot play data, we bypass the challenges of teleoperating physical robots in the real world, resulting in a scalable system for diverse tasks. We demonstrate Im2Flow2Act's capabilities in a variety of real-world tasks, including the manipulation of rigid, articulated, and deformable objects.
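The two-stage design lends itself to a compact sketch. Below is a minimal, hypothetical Python rendering of the pipeline the abstract describes: a flow generation network mapping (initial image, task description) to object flow, chained into a flow-conditioned policy that turns flow into actions. The class names, the (T, K, 2) flow representation, and the toy dynamics are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

class FlowGenerationNetwork:
    """Maps (initial scene image, task description) -> object flow.

    Here "flow" is a (T, K, 2) array: 2-D positions of K tracked object
    points over T future steps. This representation is an assumption.
    """

    def generate(self, image: np.ndarray, task: str) -> np.ndarray:
        # Placeholder: the real network is trained on human demonstration
        # videos and conditions on the task description.
        T, K = 16, 32
        rng = np.random.default_rng(0)
        return rng.uniform(0, image.shape[0], size=(T, K, 2))

class FlowConditionedPolicy:
    """Maps the current robot observation plus one step of target object
    flow to an action. Trained on simulated robot play data in the paper;
    a hand-written placeholder here.
    """

    def act(self, obs: np.ndarray, flow_step: np.ndarray) -> np.ndarray:
        # Placeholder: move toward the mean desired point position.
        return flow_step.mean(axis=0) - obs[:2]

def rollout(image: np.ndarray, task: str) -> list:
    """Chain the two components: image + task -> flow -> actions."""
    flow = FlowGenerationNetwork().generate(image, task)
    policy = FlowConditionedPolicy()
    obs = np.zeros(4)                 # hypothetical robot state
    actions = []
    for flow_step in flow:            # track the generated flow step by step
        action = policy.act(obs, flow_step)
        obs[:2] += action             # toy dynamics, for illustration only
        actions.append(action)
    return actions

if __name__ == "__main__":
    scene = np.zeros((224, 224, 3))
    actions = rollout(scene, "open the drawer")
    print(len(actions), actions[0].shape)   # 16 actions, each of shape (2,)
```

The point the sketch makes concrete: the policy consumes only flow and robot state, never the human video or the task text, which is why it can be trained entirely in simulation yet deployed in the real world with a minimal sim-to-real gap.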
Related papers
- Polaris: Open-ended Interactive Robotic Manipulation via Syn2Real Visual Grounding and Large Language Models [53.22792173053473]
We introduce an interactive robotic manipulation framework called Polaris.
Polaris integrates perception and interaction by utilizing GPT-4 alongside grounded vision models.
We propose a novel Synthetic-to-Real (Syn2Real) pose estimation pipeline.
arXiv Detail & Related papers (2024-08-15T06:40:38Z)
- Manipulate-Anything: Automating Real-World Robots using Vision-Language Models [47.16659229389889]
We propose Manipulate-Anything, a scalable automated generation method for real-world robotic manipulation.
Manipulate-Anything can operate in real-world environments without any privileged state information or hand-designed skills, and can manipulate any static object.
arXiv Detail & Related papers (2024-06-27T06:12:01Z)
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time steps based on a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
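The Track2Act entry above is mechanically similar to Im2Flow2Act but uses predicted point tracks plus a learned residual correction. A hypothetical sketch of that decomposition follows, under the same caveats as before: the function names, shapes, and the way the residual combines with the track-derived action are all assumptions.

```python
import numpy as np

def predict_point_tracks(image: np.ndarray, goal: np.ndarray,
                         points: np.ndarray, horizon: int) -> np.ndarray:
    """Placeholder for learned track prediction: where should each query
    point move over the next `horizon` steps to reach the goal image?"""
    # Toy stand-in: drift every point linearly toward the image center.
    center = np.array(image.shape[:2], dtype=float) / 2.0
    steps = np.linspace(0.0, 1.0, horizon)[:, None, None]   # (horizon, 1, 1)
    return points[None] + steps * (center - points)[None]   # (horizon, K, 2)

def base_action_from_tracks(tracks: np.ndarray) -> np.ndarray:
    """Coarse action from predicted tracks (assumption: mean displacement
    of the tracked points between the first two predicted steps)."""
    return (tracks[1] - tracks[0]).mean(axis=0)

def residual_policy(obs: np.ndarray) -> np.ndarray:
    """Learned correction on top of the track-derived action; a trained
    network in the paper, a zero placeholder here."""
    return np.zeros(2)

if __name__ == "__main__":
    image = np.zeros((224, 224, 3))
    goal = np.zeros((224, 224, 3))
    query_points = np.array([[10.0, 20.0], [100.0, 50.0]])  # K = 2 points
    tracks = predict_point_tracks(image, goal, query_points, horizon=8)
    action = base_action_from_tracks(tracks) + residual_policy(np.zeros(4))
    print(tracks.shape, action)   # (8, 2, 2) tracks and a 2-vector action
```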
- Transferring Foundation Models for Generalizable Robotic Manipulation [82.12754319808197]
We propose a novel paradigm that effectively leverages language-reasoning segmentation masks generated by internet-scale foundation models.
Our approach can effectively and robustly perceive object pose and enable sample-efficient generalization learning.
Demos can be found in our submitted video, and more comprehensive ones can be found in link1 or link2.
arXiv Detail & Related papers (2023-06-09T07:22:12Z)
- Accelerating Interactive Human-like Manipulation Learning with GPU-based Simulation and High-quality Demonstrations [25.393382192511716]
We present an immersive virtual reality teleoperation interface designed for interactive human-like manipulation on contact-rich tasks.
We demonstrate the complementary strengths of massively parallel RL and imitation learning, yielding robust and natural behaviors.
arXiv Detail & Related papers (2022-12-05T09:37:27Z)
- DexTransfer: Real World Multi-fingered Dexterous Grasping with Minimal Human Demonstrations [51.87067543670535]
We propose a robot-learning system that can take a small number of human demonstrations and learn to grasp unseen object poses.
We train a dexterous grasping policy that takes the point clouds of the object as input and predicts continuous actions to grasp objects from different initial robot states.
The policy learned from our dataset can generalize well on unseen object poses in both simulation and the real world.
arXiv Detail & Related papers (2022-09-28T17:51:49Z)
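The DexTransfer entry above describes a policy that maps object point clouds to continuous grasping actions across different initial robot states. A minimal, hypothetical sketch of that interface follows; the array shapes, the seven-dimensional state, and the centroid-chasing heuristic standing in for the learned network are assumptions for illustration.

```python
import numpy as np

def grasping_policy(point_cloud: np.ndarray, robot_state: np.ndarray) -> np.ndarray:
    """Placeholder for the learned dexterous grasping policy.

    Assumed shapes: point_cloud (N, 3), robot_state (7,). Output: a
    continuous 3-D end-effector displacement. A real policy is a trained
    network; this toy version just moves toward the object's centroid.
    """
    centroid = point_cloud.mean(axis=0)
    return centroid - robot_state[:3]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.normal(size=(1024, 3))   # stand-in for a sensed point cloud
    state = np.zeros(7)                  # hypothetical initial robot state
    for _ in range(3):                   # closed-loop rollout
        state[:3] += 0.5 * grasping_policy(cloud, state)
    print(state[:3])                     # approaches the cloud centroid
```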
- Real2Sim2Real Transfer for Control of Cable-driven Robots via a Differentiable Physics Engine [9.268539775233346]
Tensegrity robots exhibit high strength-to-weight ratios and significant deformations.
They are hard to control, however, due to high dimensionality, complex dynamics, and a coupled architecture.
This paper describes a Real2Sim2Real (R2S2R) strategy for modeling tensegrity robots.
arXiv Detail & Related papers (2022-09-13T18:51:26Z)
- Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on how to transfer to two different robotic platforms the same kinematics modulation that humans adopt when manipulating delicate objects.
We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics.
We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
arXiv Detail & Related papers (2022-03-29T15:03:05Z)
- Learning Bipedal Robot Locomotion from Human Movement [0.791553652441325]
We present a reinforcement-learning-based method for teaching a real-world bipedal robot to perform movements directly from motion capture data.
Our method seamlessly transitions from training in a simulation environment to executing on a physical robot.
We demonstrate our method on an internally developed humanoid robot with movements ranging from a dynamic walk cycle to complex balancing and waving.
arXiv Detail & Related papers (2021-05-26T00:49:37Z)