From Human Hands to Robot Arms: Manipulation Skills Transfer via Trajectory Alignment
- URL: http://arxiv.org/abs/2510.00491v1
- Date: Wed, 01 Oct 2025 04:21:12 GMT
- Title: From Human Hands to Robot Arms: Manipulation Skills Transfer via Trajectory Alignment
- Authors: Han Zhou, Jinjin Cao, Liyuan Ma, Xueji Fang, Guo-jun Qi
- Abstract summary: Learning diverse manipulation skills for real-world robots is bottlenecked by reliance on costly and hard-to-scale teleoperated demonstrations. We introduce Traj2Action, a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation. Our policy first learns to generate a coarse trajectory, which forms a high-level motion plan by leveraging both human and robot data.
- Score: 36.08997778717271
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Learning diverse manipulation skills for real-world robots is severely bottlenecked by the reliance on costly and hard-to-scale teleoperated demonstrations. While human videos offer a scalable alternative, effectively transferring manipulation knowledge is fundamentally hindered by the significant morphological gap between human and robotic embodiments. To address this challenge and facilitate skill transfer from human to robot, we introduce Traj2Action, a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation, and then transfers the manipulation knowledge embedded in this trajectory to the robot's actions. Our policy first learns to generate a coarse trajectory, which forms a high-level motion plan by leveraging both human and robot data. This plan then conditions the synthesis of precise, robot-specific actions (e.g., orientation and gripper state) within a co-denoising framework. Extensive real-world experiments on a Franka robot demonstrate that Traj2Action boosts performance by up to 27% and 22.25% over the $\pi_0$ baseline on short- and long-horizon real-world tasks, and achieves significant gains as human data scales in robot policy learning. Our project website, featuring code and video demonstrations, is available at https://anonymous.4open.science/w/Traj2Action-4A45/.
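To make the pipeline described in the abstract concrete, below is a minimal, hypothetical sketch of a coarse-trajectory-then-action co-denoising loop. It is not the authors' implementation: the module names (CoarseTrajectoryDenoiser, ActionDenoiser, co_denoise), network shapes, and the simplified denoising loop are all illustrative assumptions.

```python
# Hypothetical sketch of trajectory-conditioned action denoising (not the authors' code).
import torch
import torch.nn as nn

class CoarseTrajectoryDenoiser(nn.Module):
    """Predicts a coarse 3D endpoint trajectory from an observation embedding."""
    def __init__(self, obs_dim=64, horizon=16):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * 3 + 1, 256), nn.ReLU(),
            nn.Linear(256, horizon * 3),
        )

    def forward(self, obs, noisy_traj, t):
        x = torch.cat([obs, noisy_traj.flatten(1), t], dim=-1)
        return self.net(x).view(-1, self.horizon, 3)  # denoised xyz waypoints

class ActionDenoiser(nn.Module):
    """Predicts robot-specific actions (orientation + gripper) conditioned on the coarse plan."""
    def __init__(self, obs_dim=64, horizon=16, act_dim=5):  # e.g. 4D orientation + gripper state
        super().__init__()
        self.horizon, self.act_dim = horizon, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim + horizon * (3 + act_dim) + 1, 256), nn.ReLU(),
            nn.Linear(256, horizon * act_dim),
        )

    def forward(self, obs, coarse_traj, noisy_act, t):
        x = torch.cat([obs, coarse_traj.flatten(1), noisy_act.flatten(1), t], dim=-1)
        return self.net(x).view(-1, self.horizon, self.act_dim)

def co_denoise(obs, traj_head, act_head, steps=10):
    """Jointly refine trajectory and actions from noise; the plan conditions the actions.

    This is a deliberately simplified loop, not a full diffusion sampler.
    """
    B, H = obs.shape[0], traj_head.horizon
    traj = torch.randn(B, H, 3)
    act = torch.randn(B, H, act_head.act_dim)
    for k in reversed(range(steps)):
        t = torch.full((B, 1), k / steps)
        traj = traj_head(obs, traj, t)      # high-level motion plan (learnable from human and robot data)
        act = act_head(obs, traj, act, t)   # robot-specific actions conditioned on the plan
    return traj, act

# Toy usage with random observation embeddings.
obs = torch.randn(2, 64)
traj, act = co_denoise(obs, CoarseTrajectoryDenoiser(), ActionDenoiser())
print(traj.shape, act.shape)  # torch.Size([2, 16, 3]) torch.Size([2, 16, 5])
```

The design point mirrored here is that the 3D endpoint trajectory is the only interface between human-derived motion planning and robot-specific action synthesis.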
Related papers
- Dexterity from Smart Lenses: Multi-Fingered Robot Manipulation with In-the-Wild Human Demonstrations [52.29884993824894]
Learning multi-fingered robot policies from humans performing daily tasks in natural environments has long been a grand goal in the robotics community. AINA enables learning multi-fingered policies from data collected by anyone, anywhere, and in any environment using Aria Gen 2 glasses.
arXiv Detail & Related papers (2025-11-20T18:59:02Z)
- VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation [53.63540587160549]
VidBot is a framework enabling zero-shot robotic manipulation using learned 3D affordance from in-the-wild monocular RGB-only human videos. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.
arXiv Detail & Related papers (2025-03-10T10:04:58Z)
- AnyDexGrasp: General Dexterous Grasping for Different Hands with Human-level Learning Efficiency [49.868970174484204]
We introduce an efficient approach for learning dexterous grasping with minimal data. Our method achieves high performance with human-level learning efficiency: only hundreds of grasp attempts on 40 training objects. This method demonstrates promising applications for humanoid robots, prosthetics, and other domains requiring robust, versatile robotic manipulation.
arXiv Detail & Related papers (2025-02-23T03:26:06Z)
- Learning to Transfer Human Hand Skills for Robot Manipulations [12.797862020095856]
We present a method for teaching dexterous manipulation tasks to robots from human hand motion demonstrations. Our approach learns a joint motion manifold that maps human hand movements, robot hand actions, and object movements in 3D, enabling us to infer one motion from others.
arXiv Detail & Related papers (2025-01-07T22:33:47Z)
- Human-Humanoid Robots Cross-Embodiment Behavior-Skill Transfer Using Decomposed Adversarial Learning from Demonstration [9.42179962375058]
We propose a transferable framework that reduces the data bottleneck by using a unified digital human model as a common prototype. The model learns behavior primitives from human demonstrations through adversarial imitation, and complex robot structures are decomposed into functional components. Our framework is validated on five humanoid robots with diverse configurations.
arXiv Detail & Related papers (2024-12-19T18:41:45Z) - One-Shot Imitation under Mismatched Execution [7.060120660671016]
Human demonstrations are a powerful way to program robots to do long-horizon manipulation tasks. Translating these demonstrations into robot-executable actions, however, presents significant challenges due to execution mismatches in movement styles and physical capabilities. We propose RHyME, a novel framework that automatically pairs human and robot trajectories using sequence-level optimal transport cost functions (a minimal illustrative sketch of this pairing idea appears after this list).
arXiv Detail & Related papers (2024-09-10T16:11:57Z)
- Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
- Surfer: Progressive Reasoning with World Models for Robotic Manipulation [51.26109827779267]
We introduce a novel and simple robot manipulation framework, called Surfer.
Built on a world model, Surfer treats robot manipulation as a state transfer of the visual scene and decouples it into two parts: action and scene.
arXiv Detail & Related papers (2023-06-20T07:06:04Z)
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
- Learning Bipedal Robot Locomotion from Human Movement [0.791553652441325]
We present a reinforcement learning based method for teaching a real world bipedal robot to perform movements directly from motion capture data.
Our method seamlessly transitions from training in a simulation environment to executing on a physical robot.
We demonstrate our method on an internally developed humanoid robot with movements ranging from a dynamic walk cycle to complex balancing and waving.
arXiv Detail & Related papers (2021-05-26T00:49:37Z)
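As referenced in the One-Shot Imitation under Mismatched Execution (RHyME) entry above, trajectories from mismatched embodiments can be paired with a sequence-level optimal-transport cost. The sketch below is a hypothetical illustration of that idea using entropic (Sinkhorn) OT over per-frame features; it is not the RHyME implementation, and the function names, feature dimensions, and hyperparameters are assumptions.

```python
# Hypothetical illustration of pairing human and robot trajectories with a
# sequence-level optimal-transport cost (not the RHyME authors' code).
import numpy as np

def sinkhorn_ot_cost(human_seq, robot_seq, eps=0.05, iters=200):
    """Entropic OT cost between two feature sequences of shape (T_h, D) and (T_r, D)."""
    # Pairwise squared-Euclidean cost between frames of the two sequences.
    C = ((human_seq[:, None, :] - robot_seq[None, :, :]) ** 2).sum(-1)
    C = C / (C.max() + 1e-8)                             # normalize for numerical stability
    a = np.full(len(human_seq), 1.0 / len(human_seq))    # uniform mass over human frames
    b = np.full(len(robot_seq), 1.0 / len(robot_seq))    # uniform mass over robot frames
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(iters):                               # Sinkhorn iterations
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]                      # soft frame-to-frame alignment (transport plan)
    return float((P * C).sum()), P

def best_match(human_seq, robot_library):
    """Index of the robot trajectory with the lowest OT cost to the human demonstration."""
    costs = [sinkhorn_ot_cost(human_seq, r)[0] for r in robot_library]
    return int(np.argmin(costs))

# Toy usage: pick, among 3 candidate robot clips, the one closest to a human demo.
rng = np.random.default_rng(0)
human = rng.normal(size=(20, 8))
robots = [rng.normal(size=(T, 8)) for T in (15, 25, 20)]
print(best_match(human, robots))
```

Because the cost is computed at the sequence level, the two trajectories may have different lengths and movement styles; the transport plan P absorbs the temporal mismatch.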