OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
- URL: http://arxiv.org/abs/2410.11792v1
- Date: Tue, 15 Oct 2024 17:17:54 GMT
- Title: OKAMI: Teaching Humanoid Robots Manipulation Skills through Single Video Imitation
- Authors: Jinhan Li, Yifeng Zhu, Yuqi Xie, Zhenyu Jiang, Mingyo Seo, Georgios Pavlakos, Yuke Zhu
- Abstract summary: We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video.
OKAMI uses open-world vision models to identify task-relevant objects and retarget the body motions and hand poses separately.
- Abstract: We study the problem of teaching humanoid robots manipulation skills by imitating single video demonstrations. We introduce OKAMI, a method that generates a manipulation plan from a single RGB-D video and derives a policy for execution. At the heart of our approach is object-aware retargeting, which enables the humanoid robot to mimic the human motions in an RGB-D video while adjusting to different object locations during deployment. OKAMI uses open-world vision models to identify task-relevant objects and retarget the body motions and hand poses separately. Our experiments show that OKAMI achieves strong generalization across varying visual and spatial conditions, outperforming the state-of-the-art baseline on open-world imitation from observation. Furthermore, OKAMI rollout trajectories are leveraged to train closed-loop visuomotor policies, which achieve an average success rate of 79.2% without the need for labor-intensive teleoperation. More videos can be found on our website https://ut-austin-rpl.github.io/OKAMI/.
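The core idea named in the abstract, object-aware retargeting, amounts to expressing the demonstrated motion relative to each task-relevant object and replaying it against the object's location detected at deployment. The sketch below illustrates only that spatial-warping step with a simple translation-only warp in Python; the names and numbers are illustrative assumptions, not OKAMI's actual implementation, which additionally retargets body motions and hand poses separately to the humanoid's kinematics.

```python
# Minimal, self-contained sketch of the object-aware warping idea: a reference
# trajectory recorded in the demo is re-expressed in the object's frame, then
# replayed against the object's pose observed at deployment. Translation-only
# warp and all values are illustrative assumptions, not OKAMI's actual method.
import numpy as np

def warp_trajectory(ref_traj: np.ndarray,
                    ref_object_pos: np.ndarray,
                    new_object_pos: np.ndarray) -> np.ndarray:
    """Shift a (T, 3) reference trajectory so its waypoints keep the same
    relative offsets to the object at the object's new location."""
    relative = ref_traj - ref_object_pos   # trajectory in the object's frame
    return relative + new_object_pos       # back to the world frame at test time

if __name__ == "__main__":
    # Reference reach from the human demo (e.g. hand keypoints, in meters).
    ref_traj = np.array([[0.20, 0.00, 0.30],
                         [0.35, 0.05, 0.25],
                         [0.50, 0.10, 0.20]])
    ref_object_pos = np.array([0.50, 0.10, 0.20])   # object location in the demo
    new_object_pos = np.array([0.45, -0.15, 0.20])  # location detected at deployment

    warped = warp_trajectory(ref_traj, ref_object_pos, new_object_pos)
    print(warped)  # the final waypoint now coincides with the new object position
```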