Generalizable Humanoid Manipulation with 3D Diffusion Policies
- URL: http://arxiv.org/abs/2410.10803v2
- Date: Wed, 19 Feb 2025 02:13:13 GMT
- Title: Generalizable Humanoid Manipulation with 3D Diffusion Policies
- Authors: Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu
- Abstract summary: We build a real-world robotic system to address the problem of autonomous manipulation by humanoid robots. Our system is mainly an integration of 1) a whole-upper-body robotic teleoperation system to acquire human-like robot data, and 2) a 25-DoF humanoid robot platform with a height-adjustable cart and a 3D LiDAR sensor. We show that using only data collected in one scene and with only onboard computing, a full-sized humanoid robot can autonomously perform skills in diverse real-world scenarios.
- Score: 41.23383596258797
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists. However, autonomous manipulation by humanoid robots has largely been restricted to a single specific scene, primarily due to the difficulty of acquiring generalizable skills and the high cost of collecting in-the-wild humanoid robot data. In this work, we build a real-world robotic system to address this challenging problem. Our system integrates 1) a whole-upper-body robotic teleoperation system to acquire human-like robot data, 2) a 25-DoF humanoid robot platform with a height-adjustable cart and a 3D LiDAR sensor, and 3) an improved 3D Diffusion Policy learning algorithm that lets humanoid robots learn from noisy human data. We run more than 2000 episodes of policy rollouts on the real robot for rigorous policy evaluation. Empowered by this system, we show that using only data collected in a single scene and with only onboard computing, a full-sized humanoid robot can autonomously perform skills in diverse real-world scenarios. Videos are available at humanoid-manipulation.github.io.
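To make the policy class concrete, below is a minimal sketch of the conditional action-diffusion idea underlying 3D Diffusion Policy: a network is trained to predict the noise added to demonstrated actions, conditioned on a 3D observation feature. All names, dimensions, and hyperparameters here are illustrative assumptions, not the paper's implementation.

```python
# Minimal DDPM-style action diffusion, illustrative only.
import torch
import torch.nn as nn

T = 100  # number of diffusion steps (assumed)
betas = torch.linspace(1e-4, 0.02, T)
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

class NoisePredictor(nn.Module):
    """Predicts the noise added to an action, conditioned on a 3D scene feature."""
    def __init__(self, act_dim=25, obs_dim=64):  # 25 DoF; feature size assumed
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim + obs_dim + 1, 256), nn.Mish(),
            nn.Linear(256, 256), nn.Mish(),
            nn.Linear(256, act_dim),
        )

    def forward(self, noisy_action, obs_feat, t):
        t_emb = t.float().unsqueeze(-1) / T  # crude timestep embedding
        return self.net(torch.cat([noisy_action, obs_feat, t_emb], dim=-1))

def training_step(model, action, obs_feat):
    """One denoising step: noise a demonstrated action, regress the noise."""
    t = torch.randint(0, T, (action.shape[0],))
    noise = torch.randn_like(action)
    a_bar = alphas_cumprod[t].unsqueeze(-1)
    noisy = a_bar.sqrt() * action + (1 - a_bar).sqrt() * noise
    return nn.functional.mse_loss(model(noisy, obs_feat, t), noise)

model = NoisePredictor()
loss = training_step(model, torch.randn(8, 25), torch.randn(8, 64))
loss.backward()
```

At inference, the same network would be run in reverse, iteratively denoising a random vector into an action conditioned on the current 3D observation.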
Related papers
- GR00T N1: An Open Foundation Model for Generalist Humanoid Robots [133.23509142762356]
General-purpose robots need a versatile body and an intelligent mind.
Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy.
We introduce GR00T N1, an open foundation model for humanoid robots.
arXiv Detail & Related papers (2025-03-18T21:06:21Z)
- Humanoid Policy ~ Human Policy [26.01581047414598]
We train a human-humanoid behavior policy, which we term Human Action Transformer (HAT).
The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions.
We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency.
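As a rough sketch of what a unified state-action space with differentiable retargeting can look like, consider the toy example below; the linear retargeting layer and all dimensions are placeholder assumptions, since the paper's actual retargeting handles full humanoid kinematics.

```python
# Toy differentiable retargeting: gradients flow from robot-space losses
# back through the retargeting map into the shared policy. Illustrative only.
import torch
import torch.nn as nn

class Retarget(nn.Module):
    """Maps a unified human/humanoid action (e.g. keypoint targets)
    to robot joint commands; differentiable by construction."""
    def __init__(self, unified_dim=48, robot_dof=25):
        super().__init__()
        self.proj = nn.Linear(unified_dim, robot_dof)

    def forward(self, unified_action):
        return self.proj(unified_action)

policy = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 48))
retarget = Retarget()

obs = torch.randn(4, 128)
robot_cmd = retarget(policy(obs))   # (4, 25) joint commands
loss = robot_cmd.pow(2).mean()      # placeholder objective
loss.backward()                     # gradients reach the shared policy
```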
arXiv Detail & Related papers (2025-03-17T17:59:09Z)
- Learning from Massive Human Videos for Universal Humanoid Pose Control [46.417054298537195]
This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions.
We train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot.
Our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.
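A schematic of text-conditioned humanoid control in the spirit of UH-1 is sketched below; the tokenization, vocabulary, and architecture are all stand-in assumptions, not the released model.

```python
# Toy text-to-action model: pooled instruction embedding -> action chunk.
import torch
import torch.nn as nn

class TextToAction(nn.Module):
    def __init__(self, vocab_size=1000, embed_dim=64, act_dim=25, horizon=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.head = nn.Sequential(
            nn.Linear(embed_dim, 256), nn.ReLU(),
            nn.Linear(256, act_dim * horizon),
        )
        self.act_dim, self.horizon = act_dim, horizon

    def forward(self, token_ids):
        # Mean-pool token embeddings into one instruction feature,
        # then decode a short chunk of humanoid actions.
        feat = self.embed(token_ids).mean(dim=1)
        return self.head(feat).view(-1, self.horizon, self.act_dim)

model = TextToAction()
tokens = torch.randint(0, 1000, (2, 8))  # a batch of tokenized instructions
actions = model(tokens)                  # (2, 16, 25) joint targets
```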
arXiv Detail & Related papers (2024-12-18T18:59:56Z)
- HumanPlus: Humanoid Shadowing and Imitation from Humans [82.47551890765202]
We introduce a full-stack system for humanoids to learn motion and autonomous skills from human data.
We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets.
We then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously.
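The second stage is plain supervised behavior cloning; a minimal version of that step, with the egocentric-vision encoder reduced to a flat feature for brevity, might look like this (all dimensions assumed):

```python
# Minimal behavior-cloning update on shadowed/teleoperated demonstrations.
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(512, 256), nn.ReLU(), nn.Linear(256, 25))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

def bc_step(obs_feat, expert_action):
    """Regress the expert action from the observation feature."""
    loss = nn.functional.mse_loss(policy(obs_feat), expert_action)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss = bc_step(torch.randn(32, 512), torch.randn(32, 25))
```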
arXiv Detail & Related papers (2024-06-15T00:41:34Z)
- Vision-based Manipulation from Single Human Video with Open-World Object Graphs [58.23098483464538]
We present an object-centric approach to empower robots to learn vision-based manipulation skills from human videos.
We introduce ORION, an algorithm that tackles the problem by extracting an object-centric manipulation plan from a single RGB-D video.
arXiv Detail & Related papers (2024-05-30T17:56:54Z)
- Track2Act: Predicting Point Tracks from Internet Videos enables Generalizable Robot Manipulation [65.46610405509338]
We seek to learn a generalizable goal-conditioned policy that enables zero-shot robot manipulation.
Our framework, Track2Act, predicts tracks of how points in an image should move in future time-steps, given a goal.
We show that this approach of combining scalably learned track prediction with a residual policy enables diverse generalizable robot manipulation.
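The composition of an open-loop action derived from predicted tracks with a learned residual correction can be sketched as follows; `track_to_action` here is a toy stand-in for the paper's rigid-transform estimation:

```python
# Toy "predicted tracks + residual policy" composition, illustrative only.
import torch
import torch.nn as nn

def track_to_action(tracks):
    """Open-loop action from point tracks: mean displacement of the
    tracked points between the first and last predicted time-steps."""
    return (tracks[:, :, -1] - tracks[:, :, 0]).mean(dim=1)  # (B, 2)

residual = nn.Sequential(nn.Linear(2 + 64, 64), nn.ReLU(), nn.Linear(64, 2))

tracks = torch.randn(4, 32, 10, 2)  # batch, points, time-steps, xy
obs_feat = torch.randn(4, 64)
base = track_to_action(tracks)
action = base + residual(torch.cat([base, obs_feat], dim=-1))  # corrected action
```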
arXiv Detail & Related papers (2024-05-02T17:56:55Z)
- HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z)
- 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations [19.41216557646392]
3D Diffusion Policy (DP3) is a novel visual imitation learning approach.
In experiments, DP3 handles most tasks with just 10 demonstrations and surpasses baselines with a 24.2% relative improvement.
In real robot experiments, DP3 rarely violates safety requirements, in contrast to baseline methods which frequently do.
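The "simple 3D representations" DP3 relies on are in the spirit of a lightweight PointNet-style encoder; a minimal sketch follows, with layer sizes as illustrative assumptions:

```python
# Per-point MLP + symmetric max-pool: an order-invariant scene feature.
import torch
import torch.nn as nn

class PointEncoder(nn.Module):
    def __init__(self, out_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points):                     # points: (B, N, 3)
        return self.mlp(points).max(dim=1).values  # (B, out_dim)

enc = PointEncoder()
cloud = torch.randn(8, 1024, 3)  # e.g. a downsampled LiDAR scan
feat = enc(cloud)                # (8, 64) conditioning feature
```

A feature like this could play the role of `obs_feat` in the diffusion sketch shown under the abstract above.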
arXiv Detail & Related papers (2024-03-06T18:58:49Z)
- 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations [19.914227905704102]
3D robot policies use 3D scene feature representations aggregated from one or more camera views.
We present 3D diffuser actor, a neural policy equipped with a novel 3D denoising transformer.
It sets a new state of the art on RLBench, with an absolute performance gain of 18.1% over the previous best method.
It also learns to control a robot manipulator in the real world from a handful of demonstrations.
arXiv Detail & Related papers (2024-02-16T18:43:02Z)
- Robo360: A 3D Omnispective Multi-Material Robotic Manipulation Dataset [26.845899347446807]
Recent interest in leveraging 3D algorithms has led to advancements in robot perception and physical understanding.
We present Robo360, a dataset that features robotic manipulation with a dense view coverage.
We hope that Robo360 can open new research directions yet to be explored at the intersection of understanding the physical world in 3D and robot control.
arXiv Detail & Related papers (2023-12-09T09:12:03Z)
- Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
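One simple way to realize such augmentation is to co-train by sampling batches from both sources; the 50/50 ratio below is an assumption, not the paper's setting:

```python
# Toy co-training sampler over a narrow robot dataset and broad human videos.
import random

def make_cotraining_sampler(robot_data, human_data, robot_ratio=0.5):
    """Return a sampler that draws from robot demos with probability
    robot_ratio, otherwise from (unlabeled) human video clips."""
    def sample():
        source = robot_data if random.random() < robot_ratio else human_data
        return random.choice(source)
    return sample

sampler = make_cotraining_sampler(
    robot_data=[{"source": "robot", "clip": "demo_000"}],
    human_data=[{"source": "human", "clip": "video_000"}],
)
item = sampler()
```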
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
- DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects [8.195608430584073]
We propose a new benchmark called DexArt, which involves Dexterous manipulation with Articulated objects in a physical simulator.
Our main focus is to evaluate the generalizability of the learned policy on unseen articulated objects.
We use Reinforcement Learning with 3D representation learning to achieve generalization.
arXiv Detail & Related papers (2023-05-09T18:30:58Z)
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
- Robotic Telekinesis: Learning a Robotic Hand Imitator by Watching Humans on Youtube [24.530131506065164]
We build a system that enables any human to control a robot hand and arm, simply by demonstrating motions with their own hand.
The robot observes the human operator via a single RGB camera and imitates their actions in real-time.
We leverage this data to train a system that understands human hands and retargets a human video stream into a robot hand-arm trajectory that is smooth, swift, safe, and semantically similar to the guiding demonstration.
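One common way to make a retargeted command stream smooth and safe is low-pass filtering with a per-step velocity clamp; the sketch below is a generic post-processing step with assumed gains, not the paper's pipeline:

```python
# Exponential smoothing + velocity limiting of retargeted joint targets.
import numpy as np

def smooth_and_limit(raw_targets, alpha=0.3, max_step=0.05):
    """raw_targets: (T, dof) joint targets retargeted from the video stream."""
    out = np.empty_like(raw_targets)
    out[0] = raw_targets[0]
    for t in range(1, len(raw_targets)):
        blended = alpha * raw_targets[t] + (1 - alpha) * out[t - 1]
        step = np.clip(blended - out[t - 1], -max_step, max_step)
        out[t] = out[t - 1] + step  # never exceed max_step per tick
    return out

commands = smooth_and_limit(np.random.randn(100, 7) * 0.1)
```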
arXiv Detail & Related papers (2022-02-21T18:59:59Z)
- Know Thyself: Transferable Visuomotor Control Through Robot-Awareness [22.405839096833937]
Training visuomotor robot controllers from scratch on a new robot typically requires generating large amounts of robot-specific data.
We propose a "robot-aware" solution paradigm that exploits readily available robot "self-knowledge"
Our experiments on tabletop manipulation tasks in simulation and on real robots demonstrate that these plug-in improvements dramatically boost the transferability of visuomotor controllers.
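The core "self-knowledge" trick can be illustrated by factoring the image into robot and world regions using a mask derived from proprioception and known kinematics; the mask source below is a placeholder assumption:

```python
# Split an observation into robot pixels (modeled analytically) and
# world pixels (left to the learned dynamics model). Illustrative only.
import torch

def split_robot_world(image, robot_mask):
    """image: (B, 3, H, W); robot_mask: (B, 1, H, W), 1 on robot pixels."""
    robot_region = image * robot_mask         # known from kinematics
    world_region = image * (1 - robot_mask)   # what the world model must learn
    return robot_region, world_region

img = torch.rand(2, 3, 64, 64)
mask = (torch.rand(2, 1, 64, 64) > 0.9).float()
robot_px, world_px = split_robot_world(img, mask)
```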
arXiv Detail & Related papers (2021-07-19T17:56:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.