RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
- URL: http://arxiv.org/abs/2502.13134v1
- Date: Tue, 18 Feb 2025 18:56:41 GMT
- Title: RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
- Authors: Jingxiao Chen, Xinyao Li, Jiahang Cao, Zhengbang Zhu, Wentao Dong, Minghuan Liu, Ying Wen, Yong Yu, Liqing Zhang, Weinan Zhang
- Abstract summary: RHINO is a general humanoid-human-object interaction framework.
It provides a unified view of reactive motion, instruction-based manipulation, and safety concerns.
It decouples the interaction process into two levels: 1) a high-level planner inferring human intentions from real-time human behaviors; and 2) a low-level controller achieving reactive motion behaviors and object manipulation skills based on predicted intentions.
- Score: 38.1742893736782
- Abstract: Humanoid robots have shown success in locomotion and manipulation. Beyond these basic abilities, humanoids must quickly understand human instructions and react to human interaction signals to become valuable assistants in daily life. Unfortunately, most existing works focus only on multi-stage interactions, treat each task separately, and neglect real-time feedback. In this work, we aim to empower humanoid robots with real-time reaction abilities across various tasks, allowing humans to interrupt robots at any time and prompting robots to respond immediately. To support such abilities, we propose a general humanoid-human-object interaction framework, named RHINO, i.e., Real-time Humanoid-human Interaction and Object manipulation. RHINO provides a unified view of reactive motion, instruction-based manipulation, and safety concerns over multiple human signal modalities, such as language, images, and motion. RHINO is a hierarchical learning framework that enables humanoids to learn reaction skills from human-human-object demonstrations and teleoperation data. In particular, it decouples the interaction process into two levels: 1) a high-level planner inferring human intentions from real-time human behaviors; and 2) a low-level controller achieving reactive motion behaviors and object manipulation skills based on the predicted intentions. We evaluate the proposed framework on a real humanoid robot and demonstrate its effectiveness, flexibility, and safety in various scenarios.
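To make the two-level decoupling concrete, here is a minimal Python sketch of that planner/controller loop. This is not the authors' released code: every class, method, and intention label (IntentPlanner, SkillController, "reactive_motion", and so on) is a hypothetical illustration of the architecture the abstract describes.

```python
# Hypothetical sketch of the two-level decomposition described in the
# abstract: a high-level planner infers the human's intention every tick,
# and a low-level controller maps that intention to motor commands.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Observation:
    human_pose: List[float]         # real-time human motion signal
    image: bytes                    # latest camera frame
    language: Optional[str] = None  # optional spoken/typed instruction

class IntentPlanner:
    """High level: infer the human's intention from multimodal signals."""

    def infer(self, obs: Observation) -> str:
        if self._is_unsafe(obs):           # safety checked first, every tick
            return "stop"
        if obs.language:                   # explicit instruction
            return "manipulate:" + obs.language
        return "reactive_motion"           # default: respond to human motion

    def _is_unsafe(self, obs: Observation) -> bool:
        return False  # placeholder for a learned safety classifier

class SkillController:
    """Low level: map the predicted intention to joint commands."""

    def step(self, intention: str, obs: Observation) -> List[float]:
        if intention == "stop":
            return [0.0] * 7                      # hold still
        if intention.startswith("manipulate:"):
            return self._manipulation_skill(obs)  # learned from teleoperation
        return self._reactive_skill(obs)          # learned from human demos

    def _manipulation_skill(self, obs: Observation) -> List[float]:
        return [0.1] * 7   # placeholder for a learned manipulation policy

    def _reactive_skill(self, obs: Observation) -> List[float]:
        return [0.05] * 7  # placeholder for a learned reactive-motion policy

def control_loop(planner, controller, get_obs, send_cmd, ticks=100):
    # Re-inferring intention on every tick is what allows a human to
    # interrupt the robot at any time: a new instruction or an unsafe
    # motion changes the intention within a single control cycle.
    for _ in range(ticks):
        obs = get_obs()
        send_cmd(controller.step(planner.infer(obs), obs))
```

The design point the sketch tries to capture is that the planner is consulted on every control tick, so a new instruction or an unsafe motion immediately changes the predicted intention and the low-level behavior.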
Related papers
- Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration [9.42179962375058]
We propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype.
The model learns behavior primitives from human demonstrations through adversarial imitation, and complex robot structures are decomposed into functional components.
Our framework is validated on five humanoid robots with diverse configurations.
arXiv Detail & Related papers (2024-12-19T18:41:45Z) - Imitation of human motion achieves natural head movements for humanoid robots in an active-speaker detection task [2.8220015774219567]
Head movements are crucial for social human-human interaction.
In this work, we employed a generative AI pipeline to produce human-like head movements for a Nao humanoid robot.
The results show that the Nao robot successfully imitates human head movements in a natural manner while actively tracking the speakers during the conversation.
arXiv Detail & Related papers (2024-07-16T17:08:40Z) - HumanoidBench: Simulated Humanoid Benchmark for Whole-Body Locomotion and Manipulation [50.616995671367704]
We present a high-dimensional, simulated robot learning benchmark, HumanoidBench, featuring a humanoid robot equipped with dexterous hands.
Our findings reveal that state-of-the-art reinforcement learning algorithms struggle with most tasks, whereas a hierarchical learning approach achieves superior performance when supported by robust low-level policies.
arXiv Detail & Related papers (2024-03-15T17:45:44Z) - Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots [119.55240471433302]
Habitat 3.0 is a simulation platform for studying collaborative human-robot tasks in home environments.
It addresses challenges in modeling complex deformable bodies and diversity in appearance and motion.
Human-in-the-loop infrastructure enables real human interaction with simulated robots via mouse/keyboard or a VR interface.
arXiv Detail & Related papers (2023-10-19T17:29:17Z) - HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z) - Robots with Different Embodiments Can Express and Influence Carefulness in Object Manipulation [104.5440430194206]
This work investigates the perception of object manipulations performed with a communicative intent by two robots.
We designed the robots' movements to convey either carefulness or its absence while transporting objects.
arXiv Detail & Related papers (2022-08-03T13:26:52Z) - Human Grasp Classification for Reactive Human-to-Robot Handovers [50.91803283297065]
We propose an approach for human-to-robot handovers in which the robot meets the human halfway.
We collect a human grasp dataset which covers typical ways of holding objects with various hand shapes and poses.
We present a planning and execution approach that takes the object from the human hand according to the detected grasp and hand position.
arXiv Detail & Related papers (2020-03-12T19:58:03Z)
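As a rough illustration of that detect-then-plan handover pipeline, the sketch below classifies the human grasp and plans a compatible robot grasp. All names here are assumptions for illustration, not the paper's actual API.

```python
# Illustrative sketch of a reactive human-to-robot handover: classify how
# the human is holding the object, then plan a robot grasp that approaches
# from a side that avoids the human's fingers. Hypothetical names throughout.
GRASP_CLASSES = ("pinch_top", "pinch_side", "palm_up", "palm_down")

def classify_grasp(hand_image: bytes) -> str:
    """Stand-in for a classifier trained on the human grasp dataset."""
    return "palm_up"  # placeholder prediction

def plan_robot_grasp(grasp_class: str, hand_position: tuple) -> dict:
    # Choose an approach direction compatible with the detected human grasp.
    approach = {"palm_up": "from_above", "pinch_top": "from_side"}
    return {"target": hand_position,
            "approach": approach.get(grasp_class, "from_front")}

def handover_step(hand_image, hand_position, execute) -> None:
    plan = plan_robot_grasp(classify_grasp(hand_image), hand_position)
    execute(plan)  # robot meets the human halfway and takes the object
```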