EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
- URL: http://arxiv.org/abs/2509.19626v1
- Date: Tue, 23 Sep 2025 22:34:47 GMT
- Title: EgoBridge: Domain Adaptation for Generalizable Imitation from Egocentric Human Data
- Authors: Ryan Punamiya, Dhruv Patel, Patcharapong Aphiwetsa, Pranav Kuppili, Lawrence Y. Zhu, Simar Kareer, Judy Hoffman, Danfei Xu
- Abstract summary: EgoBridge aims to align the policy latent spaces between human and robot data using domain adaptation. It achieves a significant absolute policy success rate improvement of 44% over human-augmented cross-embodiment baselines.
- Score: 18.635934496561365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Egocentric human experience data presents a vast resource for scaling up end-to-end imitation learning for robotic manipulation. However, significant domain gaps in visual appearance, sensor modalities, and kinematics between human and robot impede knowledge transfer. This paper presents EgoBridge, a unified co-training framework that explicitly aligns the policy latent spaces between human and robot data using domain adaptation. Through a measure of discrepancy on the joint policy latent features and actions based on Optimal Transport (OT), we learn observation representations that not only align between the human and robot domain but also preserve the action-relevant information critical for policy learning. EgoBridge achieves a significant absolute policy success rate improvement of 44% over human-augmented cross-embodiment baselines in three real-world single-arm and bimanual manipulation tasks. EgoBridge also generalizes to new objects, scenes, and tasks seen only in human data, where baselines fail entirely. Videos and additional information can be found at https://ego-bridge.github.io
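To make the OT discrepancy concrete, the sketch below (PyTorch) computes an entropic optimal-transport cost over joint (latent feature, action) samples drawn from human and robot batches. This is a minimal illustration of the idea in the abstract, not the authors' released code; the function names, the Sinkhorn solver, and the hyperparameters are all assumptions.

```python
# Illustrative sketch (not the authors' code): an entropic-OT discrepancy
# between joint (latent, action) samples from the human and robot domains.
import torch

def sinkhorn_ot(cost, reg=0.05, n_iters=50):
    """Entropic OT cost via Sinkhorn iterations on a pairwise cost matrix.
    reg may need scaling to the magnitude of the costs."""
    n, m = cost.shape
    a = cost.new_full((n,), 1.0 / n)   # uniform marginal over human samples
    b = cost.new_full((m,), 1.0 / m)   # uniform marginal over robot samples
    K = torch.exp(-cost / reg)         # Gibbs kernel
    u = torch.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.t() @ u)
        u = a / (K @ v)
    plan = u.unsqueeze(1) * K * v.unsqueeze(0)   # transport plan diag(u) K diag(v)
    return (plan * cost).sum()

def joint_ot_loss(z_human, a_human, z_robot, a_robot, w_act=1.0):
    """Cost couples policy latents with actions, so minimizing the OT
    discrepancy aligns domains while preserving action-relevant structure."""
    feat_cost = torch.cdist(z_human, z_robot, p=2) ** 2
    act_cost = torch.cdist(a_human, a_robot, p=2) ** 2
    return sinkhorn_ot(feat_cost + w_act * act_cost)
```

Coupling latent features with actions in the cost matrix is what encourages the aligned representation to preserve action-relevant information, rather than matching appearance statistics alone.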
Related papers
- EgoActor: Grounding Task Planning into Spatial-aware Egocentric Actions for Humanoid Robots via Visual-Language Models [31.768426199719816]
We propose EgoActing, a task that requires directly grounding high-level instructions into varied, precise, spatially aware humanoid actions. We further instantiate this task by introducing EgoActor, a unified and scalable vision-language model (VLM) that predicts locomotion primitives. We leverage broad supervision over egocentric RGB-only data from real-world demonstrations, spatial-reasoning question answering, and simulated environment demonstrations.
arXiv Detail & Related papers (2026-02-04T13:04:56Z) - Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation [16.701354625940308]
Humanoid Everyday is a large-scale and diverse humanoid manipulation dataset. It aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations. We conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations.
arXiv Detail & Related papers (2025-10-09T20:43:27Z) - OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction [76.44108003274955]
A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning policies. We introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh. By minimizing the Laplacian deformation between the human and robot meshes, OmniRetarget generates kinematically feasible trajectories.
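As a rough illustration of the mechanism the summary names, here is a minimal sketch of a uniform-weight Laplacian deformation energy between corresponding human and robot interaction-mesh vertices. This is a standard formulation stated under assumptions; OmniRetarget's actual edge weights, constraints, and solver may differ.

```python
# Illustrative sketch: a uniform-weight Laplacian deformation energy between
# corresponding human and robot mesh vertices (a standard formulation;
# the paper's exact weighting and optimization may differ).
import numpy as np

def graph_laplacian(n_verts, edges):
    """Uniform graph Laplacian L = D - A built from an edge list."""
    L = np.zeros((n_verts, n_verts))
    for i, j in edges:
        L[i, i] += 1.0
        L[j, j] += 1.0
        L[i, j] -= 1.0
        L[j, i] -= 1.0
    return L

def laplacian_deformation_energy(L, v_human, v_robot):
    """Squared difference of Laplacian (differential) coordinates: small
    values mean the robot mesh preserves the local shape, and hence the
    spatial interaction relationships, of the human mesh."""
    return np.sum((L @ v_robot - L @ v_human) ** 2)
```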
arXiv Detail & Related papers (2025-09-30T17:59:02Z) - Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions [110.43343503158306]
This paper embeds the manual-assisted task into a vision-language-action framework, where the assistant provides services to the instructor following egocentric vision and commands. Under this setting, we present InterVLA, the first large-scale human-object-human interaction dataset, with 11.4 hours and 1.2M frames of multimodal data. We establish novel benchmarks on egocentric human motion estimation, interaction synthesis, and interaction prediction with comprehensive analysis.
arXiv Detail & Related papers (2025-08-06T17:46:23Z) - EgoZero: Robot Learning from Smart Glasses [54.6168258133554]
EgoZero learns robust manipulation policies from human demonstrations captured with Project Aria smart glasses. We deploy EgoZero policies on a Franka Panda robot and demonstrate zero-shot transfer with a 70% success rate over 7 manipulation tasks. Our results suggest that in-the-wild human data can serve as a scalable foundation for real-world robot learning.
arXiv Detail & Related papers (2025-05-26T17:59:17Z) - Humanoid Policy ~ Human Policy [26.01581047414598]
We train a human-humanoid behavior policy, which we term Human Action Transformer (HAT). The state-action space of HAT is unified for both humans and humanoid robots and can be differentiably retargeted to robot actions. We show that human data improves both generalization and robustness of HAT with significantly better data collection efficiency.
arXiv Detail & Related papers (2025-03-17T17:59:09Z) - EgoMimic: Scaling Imitation Learning via Egocentric Video [22.902881956495765]
We present EgoMimic, a full-stack framework that scales manipulation via human embodiment data.
EgoMimic achieves this through: (1) a system to capture human embodiment data using the ergonomic Project Aria glasses, (2) a low-cost bimanual manipulator that minimizes the kinematic gap to human data, and (3) an imitation learning architecture that co-trains on human and robot data.
arXiv Detail & Related papers (2024-10-31T17:59:55Z) - Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation [16.809190349155525]
We propose a novel adaptation paradigm that leverages readily available paired human-robot video data to bridge the domain gap. Our method employs a human-robot contrastive alignment loss to align the semantics of human and robot videos, adapting pre-trained models to the robot domain in a parameter-efficient manner.
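The contrastive alignment loss described above plausibly takes an InfoNCE-like form over paired human and robot clip embeddings. The sketch below shows one standard formulation under that assumption; the function and variable names are hypothetical, not the paper's API.

```python
# Illustrative InfoNCE-style sketch of a human-robot contrastive alignment
# loss over paired video embeddings (names are hypothetical).
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(z_human, z_robot, temperature=0.07):
    """Pull the i-th human clip toward its paired robot clip and push it
    away from the other robot clips in the batch (and symmetrically)."""
    z_h = F.normalize(z_human, dim=-1)
    z_r = F.normalize(z_robot, dim=-1)
    logits = z_h @ z_r.t() / temperature        # cosine similarities
    targets = torch.arange(z_h.shape[0], device=z_h.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))
```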
arXiv Detail & Related papers (2024-06-20T11:57:46Z) - ImitationNet: Unsupervised Human-to-Robot Motion Retargeting via Shared Latent Space [9.806227900768926]
This paper introduces a novel deep-learning approach for human-to-robot motion retargeting.
Our method does not require paired human-to-robot data, which facilitates its translation to new robots.
Our model outperforms existing works on human-to-robot similarity in both efficiency and precision.
arXiv Detail & Related papers (2023-09-11T08:55:04Z) - SACSoN: Scalable Autonomous Control for Social Navigation [62.59274275261392]
We develop methods for training policies for socially unobtrusive navigation.
By minimizing this counterfactual perturbation, the difference between how humans actually behave around the robot and how they would have behaved in its absence, we can induce robots to behave in ways that do not alter the natural behavior of humans in the shared space.
We collect a large dataset where an indoor mobile robot interacts with human bystanders.
arXiv Detail & Related papers (2023-06-02T19:07:52Z) - Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
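One common way to infer a reward from demonstrations, in the spirit of the MEDAL line of work, is to train a classifier that scores how demonstration-like a state is and use its output as the reward. The sketch below illustrates that instantiation under assumption; it is not necessarily MEDAL++'s exact objective.

```python
# Illustrative sketch: a classifier-based reward inferred from demonstration
# states (a common instantiation; the paper's exact objective may differ).
import torch
import torch.nn.functional as F

class RewardClassifier(torch.nn.Module):
    def __init__(self, obs_dim, hidden=256):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(obs_dim, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)        # logit: "is this demo-like?"

def classifier_loss(model, demo_obs, policy_obs):
    """Demonstration states are positives; on-policy states are negatives."""
    pos_logits = model(demo_obs)
    neg_logits = model(policy_obs)
    pos = F.binary_cross_entropy_with_logits(
        pos_logits, torch.ones_like(pos_logits))
    neg = F.binary_cross_entropy_with_logits(
        neg_logits, torch.zeros_like(neg_logits))
    return pos + neg

def inferred_reward(model, obs):
    return torch.sigmoid(model(obs))            # reward in [0, 1]
```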
arXiv Detail & Related papers (2023-03-02T18:51:38Z) - Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
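Given an embedding phi already trained with a time-contrastive objective, the reward described here reduces to a negative distance to the goal's embedding. A minimal sketch under that reading (phi, obs_image, and goal_image are assumed inputs):

```python
# Illustrative sketch: once an embedding phi has been trained with a
# time-contrastive objective, reward is the negative distance between the
# current observation's embedding and the goal's (names are hypothetical).
import torch

def embedding_reward(phi, obs_image, goal_image):
    """Higher reward as the observation's embedding approaches the goal's."""
    with torch.no_grad():
        z_obs = phi(obs_image)
        z_goal = phi(goal_image)
    return -torch.norm(z_obs - z_goal, dim=-1)
```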
arXiv Detail & Related papers (2022-11-16T16:26:48Z)