Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions
- URL: http://arxiv.org/abs/2410.12773v1
- Date: Wed, 16 Oct 2024 17:48:50 GMT
- Title: Harmon: Whole-Body Motion Generation of Humanoid Robots from Language Descriptions
- Authors: Zhenyu Jiang, Yuqi Xie, Jinhan Li, Ye Yuan, Yifeng Zhu, Yuke Zhu
- Abstract summary: This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions.
We leverage human motion priors from extensive human motion datasets to initialize humanoid motions.
Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions.
- Score: 31.134450087838673
- Abstract: Humanoid robots, with their human-like embodiment, have the potential to integrate seamlessly into human environments. Critical to their coexistence and cooperation with humans is the ability to understand natural language communications and exhibit human-like behaviors. This work focuses on generating diverse whole-body motions for humanoid robots from language descriptions. We leverage human motion priors from extensive human motion datasets to initialize humanoid motions and employ the commonsense reasoning capabilities of Vision Language Models (VLMs) to edit and refine these motions. Our approach demonstrates the capability to produce natural, expressive, and text-aligned humanoid motions, validated through both simulated and real-world experiments. More videos can be found at https://ut-austin-rpl.github.io/Harmon/.
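The abstract's two-stage pipeline (initialize from a human motion prior, then edit with a VLM) can be sketched as below. This is a minimal, runnable toy, not the authors' implementation: every helper is a stand-in, and the frame count, joint count, and one-round "VLM" approval are assumptions made only to illustrate the control flow.

```python
import random

def motion_prior(text):
    """Stand-in for a text-to-motion model trained on large human mocap datasets."""
    return [[0.0, 0.0, 0.0] for _ in range(10)]   # 10 frames x 3 toy "joints"

def render(motion):
    """Stand-in for rendering the humanoid executing the motion."""
    return f"<{len(motion)} rendered frames>"

def vlm_feedback(frames, text, round_idx):
    """Stand-in for a VLM judging text alignment and proposing an edit."""
    aligned = round_idx >= 1                      # pretend it approves after one edit
    edit = {"joint": random.randrange(3), "delta": 0.1}
    return aligned, edit

def apply_edit(motion, edit):
    """Offset one toy joint across all frames, as the 'VLM' suggested."""
    j, d = edit["joint"], edit["delta"]
    return [[v + d if k == j else v for k, v in enumerate(f)] for f in motion]

def generate(text, max_rounds=3):
    motion = motion_prior(text)                   # stage 1: human motion prior
    for r in range(max_rounds):                   # stage 2: VLM editing loop
        aligned, edit = vlm_feedback(render(motion), text, r)
        if aligned:
            break
        motion = apply_edit(motion, edit)
    return motion

print(len(generate("wave both hands above the head")))  # -> 10 frames
```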
Related papers
- RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations [38.1742893736782]
RHINO is a general humanoid-human-object interaction framework.
It provides a unified view of reactive motion, instruction-based manipulation, and safety concerns.
It decouples the interaction process into two levels: 1) a high-level planner inferring human intentions from real-time human behaviors; and 2) a low-level controller achieving reactive motion behaviors and object manipulation skills based on predicted intentions.
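A toy sketch of this two-level decoupling: a high-level planner maps observed human behavior to an intention, and a low-level controller maps that intention to a motor skill. The intention labels and skill names are illustrative assumptions, not from the paper.

```python
def high_level_planner(human_obs: dict) -> str:
    """Infer human intention from real-time observations (stub heuristic)."""
    if human_obs.get("hand_extended") and human_obs.get("holding_object"):
        return "handover"
    if human_obs.get("pointing"):
        return "fetch"
    return "idle"

def low_level_controller(intention: str, robot_state: dict) -> str:
    """Select a reactive skill for the inferred intention (stub lookup)."""
    skills = {
        "handover": "reach_and_grasp",
        "fetch": "navigate_and_pick",
        "idle": "track_human",
    }
    return skills[intention]

obs = {"hand_extended": True, "holding_object": True}
intent = high_level_planner(obs)
print(intent, "->", low_level_controller(intent, robot_state={}))
```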
arXiv Detail & Related papers (2025-02-18T18:56:41Z)
- Learning from Massive Human Videos for Universal Humanoid Pose Control [46.417054298537195]
This paper introduces Humanoid-X, a large-scale dataset of over 20 million humanoid robot poses with corresponding text-based motion descriptions.
We train a large humanoid model, UH-1, which takes text instructions as input and outputs corresponding actions to control a humanoid robot.
Our scalable training approach leads to superior generalization in text-based humanoid control, marking a significant step toward adaptable, real-world-ready humanoid robots.
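A minimal sketch of the text-to-action contract described above: a model maps a text instruction to a sequence of joint targets. The bag-of-words "model", vocabulary, and dimensions are placeholder assumptions, used only to make the input/output shapes concrete; this is not UH-1.

```python
import numpy as np

VOCAB = {"wave": 0, "walk": 1, "forward": 2, "hand": 3}
NUM_JOINTS, HORIZON = 19, 50          # illustrative humanoid DoF / frames

rng = np.random.default_rng(0)
W = rng.normal(size=(len(VOCAB), NUM_JOINTS * HORIZON)) * 0.01

def text_to_actions(instruction: str) -> np.ndarray:
    """Return a (HORIZON, NUM_JOINTS) array of joint-position targets."""
    bow = np.zeros(len(VOCAB))        # bag-of-words embedding of the text
    for tok in instruction.lower().split():
        if tok in VOCAB:
            bow[VOCAB[tok]] = 1.0
    return (bow @ W).reshape(HORIZON, NUM_JOINTS)

actions = text_to_actions("walk forward")
print(actions.shape)                  # (50, 19): one pose target per frame
```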
arXiv Detail & Related papers (2024-12-18T18:59:56Z)
- HumanPlus: Humanoid Shadowing and Imitation from Humans [82.47551890765202]
We introduce a full-stack system for humanoids to learn motion and autonomous skills from human data.
We first train a low-level policy in simulation via reinforcement learning using existing 40-hour human motion datasets.
We then perform supervised behavior cloning to train skill policies using egocentric vision, allowing humanoids to complete different tasks autonomously.
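The second stage above (supervised behavior cloning) reduces to regression from observations to expert actions. A least-squares sketch on synthetic data, with all dimensions assumed for illustration; the real policy is vision-based and learned with gradient methods.

```python
import numpy as np

rng = np.random.default_rng(1)
OBS_DIM, ACT_DIM, N = 32, 19, 500     # illustrative dimensions

# Synthetic "demonstrations": actions are a fixed linear function of obs.
true_W = rng.normal(size=(OBS_DIM, ACT_DIM))
obs = rng.normal(size=(N, OBS_DIM))
act = obs @ true_W + 0.01 * rng.normal(size=(N, ACT_DIM))

# Behavior cloning = supervised regression onto expert actions.
W_bc, *_ = np.linalg.lstsq(obs, act, rcond=None)

def policy(o: np.ndarray) -> np.ndarray:
    return o @ W_bc

print(f"mean cloning error: {np.abs(policy(obs) - act).mean():.4f}")
```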
arXiv Detail & Related papers (2024-06-15T00:41:34Z)
- Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation [34.65637397405485]
We present Human to Humanoid (H2O), a framework that enables real-time whole-body teleoperation of a humanoid robot with only an RGB camera.
We train a robust real-time humanoid motion imitator in simulation on a refined set of retargeted human motions and transfer it to the real humanoid robot in a zero-shot manner.
To the best of our knowledge, this is the first demonstration to achieve learning-based real-time whole-body humanoid teleoperation.
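The teleoperation loop described above can be sketched as pose estimation, retargeting, then tracking by the trained imitator. All three components below are stubs with assumed dimensions; the real system uses a learned pose estimator and an RL-trained imitator.

```python
import numpy as np

rng = np.random.default_rng(2)
NUM_KEYPOINTS, NUM_JOINTS = 17, 19                 # illustrative dimensions
PROJ = rng.normal(size=(NUM_KEYPOINTS * 3, NUM_JOINTS)) * 0.1  # stub retargeting map

def estimate_pose(frame):
    """Stub pose estimator: 3D human keypoints from an RGB frame."""
    return rng.normal(size=(NUM_KEYPOINTS, 3))

def retarget(keypoints):
    """Stub retargeting: project flattened keypoints to robot joint targets."""
    return keypoints.reshape(-1) @ PROJ

def imitator_policy(targets, state):
    """Stub tracking policy: proportional step toward the retargeted pose."""
    return 0.5 * (targets - state)

state = np.zeros(NUM_JOINTS)
for _ in range(3):                                  # a few teleoperation steps
    frame = np.zeros((480, 640, 3))                 # placeholder RGB frame
    state = state + imitator_policy(retarget(estimate_pose(frame)), state)
print(state.shape)                                  # (19,) joint state after tracking
```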
arXiv Detail & Related papers (2024-03-07T12:10:41Z)
- Expressive Whole-Body Control for Humanoid Robots [20.132927075816742]
We learn a whole-body control policy on a human-sized robot to mimic human motions as realistically as possible.
With training in simulation and Sim2Real transfer, our policy can control a humanoid robot to walk in different styles, shake hands with humans, and even dance with a human in the real world.
arXiv Detail & Related papers (2024-02-26T18:09:24Z)
- Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots [119.55240471433302]
Habitat 3.0 is a simulation platform for studying collaborative human-robot tasks in home environments.
It addresses challenges in modeling complex deformable bodies and diversity in appearance and motion.
Human-in-the-loop infrastructure enables real human interaction with simulated robots via mouse/keyboard or a VR interface.
arXiv Detail & Related papers (2023-10-19T17:29:17Z)
- HuBo-VLM: Unified Vision-Language Model designed for HUman roBOt interaction tasks [5.057755436092344]
Human-robot interaction is an exciting task that aims to guide robots in following instructions from humans.
HuBo-VLM is proposed to tackle perception tasks associated with human-robot interaction.
arXiv Detail & Related papers (2023-08-24T03:47:27Z)
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
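The core idea above can be caricatured with a single interpolation parameter: morph the embodiment from human (alpha=0) to robot (alpha=1) in small steps, re-adapting the policy at each step so the skill transfers gradually. The scalar "policy" and toy reward are assumptions for illustration; the paper searches a multi-dimensional evolution path.

```python
import numpy as np

rng = np.random.default_rng(3)

def reward(policy_param: float, alpha: float) -> float:
    """Toy reward: the best policy parameter drifts as the embodiment morphs."""
    return -((policy_param - 2.0 * alpha) ** 2)

policy = 0.0                             # start from the human-demo policy
for alpha in np.linspace(0.0, 1.0, 11):  # micro-evolution steps human -> robot
    for _ in range(50):                  # re-adapt policy at this embodiment
        candidate = policy + 0.1 * rng.normal()
        if reward(candidate, alpha) > reward(policy, alpha):
            policy = candidate
print(f"final policy param: {policy:.2f} (optimum 2.0 at alpha=1)")
```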
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
- Robots with Different Embodiments Can Express and Influence Carefulness in Object Manipulation [104.5440430194206]
This work investigates the perception of object manipulations performed with a communicative intent by two robots.
We designed the robots' movements to communicate carefulness, or its absence, while transporting objects.
arXiv Detail & Related papers (2022-08-03T13:26:52Z)
- Synthesis and Execution of Communicative Robotic Movements with Generative Adversarial Networks [59.098560311521034]
We focus on transferring to two different robotic platforms the same kinematic modulation that humans adopt when manipulating delicate objects.
We choose to modulate the velocity profile adopted by the robots' end-effector, inspired by what humans do when transporting objects with different characteristics.
We exploit a novel Generative Adversarial Network architecture, trained with human kinematics examples, to generalize over them and generate new and meaningful velocity profiles.
arXiv Detail & Related papers (2022-03-29T15:03:05Z)
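A minimal GAN sketch in the spirit of the last entry: learn to generate 1-D end-effector velocity profiles from examples. The "human" data here is synthetic minimum-jerk-like bell curves, and the architecture and hyperparameters are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

T, Z = 50, 8                                   # profile length, latent dimension
t = torch.linspace(0, 1, T)

def human_profiles(n):
    """Synthetic 'human' examples: scaled minimum-jerk-like bell curves."""
    peak = 0.8 + 0.4 * torch.rand(n, 1)
    return peak * (30 * t**2 * (1 - t) ** 2).unsqueeze(0) + 0.01 * torch.randn(n, T)

G = nn.Sequential(nn.Linear(Z, 64), nn.ReLU(), nn.Linear(64, T))  # generator
D = nn.Sequential(nn.Linear(T, 64), nn.ReLU(), nn.Linear(64, 1))  # discriminator
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

for _ in range(500):
    real, z = human_profiles(64), torch.randn(64, Z)
    fake = G(z)
    # Discriminator step: push real profiles toward 1, generated toward 0.
    loss_d = bce(D(real), torch.ones(64, 1)) + bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()
    # Generator step: fool the discriminator.
    loss_g = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

sample = G(torch.randn(1, Z)).detach().squeeze()
print(sample.shape)                            # a new 50-step velocity profile
```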