When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
- URL: http://arxiv.org/abs/2510.09096v1
- Date: Fri, 10 Oct 2025 07:48:12 GMT
- Title: When a Robot is More Capable than a Human: Learning from Constrained Demonstrators
- Authors: Xinhu Li, Ayush Jain, Zhaojing Yang, Yigit Korkmaz, Erdem Bıyık
- Abstract summary: Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. These interfaces often constrain the expert's ability to demonstrate optimal behavior due to indirect control, setup restrictions, and hardware safety. This raises a key question: Can a robot learn a better policy than the one demonstrated by a constrained expert? We address this by allowing the agent to go beyond direct imitation of expert actions and explore shorter and more efficient trajectories.
- Score: 4.015444385806047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning from demonstrations enables experts to teach robots complex tasks using interfaces such as kinesthetic teaching, joystick control, and sim-to-real transfer. However, these interfaces often constrain the expert's ability to demonstrate optimal behavior due to indirect control, setup restrictions, and hardware safety. For example, a joystick can move a robotic arm only in a 2D plane, even though the robot operates in a higher-dimensional space. As a result, the demonstrations collected by constrained experts lead to suboptimal performance of the learned policies. This raises a key question: Can a robot learn a better policy than the one demonstrated by a constrained expert? We address this by allowing the agent to go beyond direct imitation of expert actions and explore shorter and more efficient trajectories. We use the demonstrations to infer a state-only reward signal that measures task progress, and self-label reward for unknown states using temporal interpolation. Our approach outperforms common imitation learning in both sample efficiency and task completion time. On a real WidowX robotic arm, it completes the task in 12 seconds, 10x faster than behavioral cloning, as shown in real-robot videos on https://sites.google.com/view/constrainedexpert .
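The self-labeling idea above can be made concrete. The following is a minimal NumPy sketch of one plausible reading, in which demonstration states receive normalized progress labels and unseen states are scored by interpolating the labels of nearby demonstration states; the nearest-neighbor inverse-distance scheme, function names, and toy states are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def label_demo_progress(demo_states):
    """Label each demonstration state with normalized task progress in [0, 1]."""
    T = len(demo_states)
    return np.arange(T) / (T - 1)

def interpolated_reward(state, demo_states, progress, k=2):
    """Self-label an unseen state by interpolating the progress labels of its
    k nearest demonstration states, weighted by inverse distance."""
    dists = np.linalg.norm(demo_states - state, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + 1e-8)
    return float(np.dot(weights, progress[nearest]) / weights.sum())

# Toy usage: states near the end of the demo earn higher reward, so an RL
# agent is rewarded for reaching late-trajectory states by any route.
demo = np.array([[0.0, 0.0], [0.5, 0.1], [1.0, 0.2], [1.0, 1.0]])
progress = label_demo_progress(demo)
print(interpolated_reward(np.array([0.9, 0.8]), demo, progress))
```

Under such a progress reward, an agent that reaches late-trajectory states by a shorter route than the constrained expert earns reward faster, which is what lets it outperform the demonstrations rather than merely imitate them.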
Related papers
- From Human Hands to Robot Arms: Manipulation Skills Transfer via Trajectory Alignment [36.08997778717271]
Learning diverse manipulation skills for real-world robots is bottlenecked by reliance on costly and hard-to-scale teleoperated demonstrations. We introduce Traj2Action, a novel framework that bridges this embodiment gap by using the 3D trajectory of the operational endpoint as a unified intermediate representation. Our policy first learns to generate a coarse trajectory, which forms a high-level motion plan by leveraging both human and robot data.
arXiv Detail & Related papers (2025-10-01T04:21:12Z)
- Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration [21.94699075066712]
We propose a novel real-to-sim-to-real framework for training dexterous manipulation policies using only one RGB-D video of a human demonstrating a task. Human2Sim2Robot outperforms object-aware replay by over 55% and imitation learning by over 68% on grasping, non-prehensile manipulation, and multi-step tasks.
arXiv Detail & Related papers (2025-04-17T03:15:20Z)
- VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation [53.63540587160549]
VidBot is a framework enabling zero-shot robotic manipulation using learned 3D affordances from in-the-wild monocular RGB-only human videos. VidBot paves the way for leveraging everyday human videos to make robot learning more scalable.
arXiv Detail & Related papers (2025-03-10T10:04:58Z)
- SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation [58.14969377419633]
We propose SPIRE, a system that first decomposes tasks into smaller learning subproblems and second combines imitation and reinforcement learning to maximize their strengths.
We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance.
arXiv Detail & Related papers (2024-10-23T17:42:07Z)
- Giving Robots a Hand: Learning Generalizable Manipulation with Eye-in-Hand Human Video Demonstrations [66.47064743686953]
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and generalization in vision-based robotic manipulation.
Videos of humans performing tasks, on the other hand, are much cheaper to collect since they eliminate the need for expertise in robotic teleoperation.
In this work, we augment narrow robotic imitation datasets with broad unlabeled human video demonstrations to greatly enhance the generalization of eye-in-hand visuomotor policies.
arXiv Detail & Related papers (2023-07-12T07:04:53Z)
- AR2-D2: Training a Robot Without a Robot [53.10633639596096]
We introduce AR2-D2, a system for collecting demonstrations that does not require people with specialized training.
AR2-D2 is a framework in the form of an iOS app that people can use to record a video of themselves manipulating any object.
We show that data collected via our system enables the training of behavior cloning agents in manipulating real objects.
arXiv Detail & Related papers (2023-06-23T23:54:26Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot practices autonomously by learning both to do and to undo the task, while simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective; a minimal sketch follows this entry.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
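The reward construction in the entry above admits a short sketch. Here `phi` is a placeholder for the time-contrastively trained encoder, and its normalization body exists only so the example runs; neither is the paper's model.

```python
import numpy as np

def phi(obs):
    """Stand-in for an encoder trained with a time-contrastive objective
    (temporally close frames map to nearby embeddings); the normalization
    body is only here so the example runs."""
    return obs / (np.linalg.norm(obs) + 1e-8)

def embedding_reward(obs, goal_obs):
    """Reward = negative distance to the goal in the learned embedding space."""
    return -float(np.linalg.norm(phi(obs) - phi(goal_obs)))

# Usage: score how close the current observation is to the goal observation.
print(embedding_reward(np.array([0.2, 0.9]), np.array([0.0, 1.0])))
```

Because such an embedding places temporally close frames near each other, distance to the goal embedding acts as a smooth proxy for remaining task time.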
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis is an attempt to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback (e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons) and describe how a robot can use these forms of human feedback to infer a reward function; the pairwise case is sketched after this entry.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
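A common concrete instance of learning from pairwise comparisons is the Bradley-Terry choice model, sketched below for a linear reward; the feature matrices, learning rate, and toy data are our own illustrative assumptions rather than anything from the thesis.

```python
import numpy as np

def fit_reward_weights(feat_a, feat_b, prefs, lr=0.1, steps=500):
    """Fit linear reward weights from pairwise comparisons under a
    Bradley-Terry choice model: P(A preferred) = sigmoid(r_A - r_B).
    prefs[i] = 1.0 if trajectory A was preferred in pair i, else 0.0."""
    w = np.zeros(feat_a.shape[1])
    for _ in range(steps):
        p_a = 1.0 / (1.0 + np.exp(-(feat_a - feat_b) @ w))
        # Gradient of the negative log-likelihood of the observed choices.
        grad = ((p_a - prefs)[:, None] * (feat_a - feat_b)).mean(axis=0)
        w -= lr * grad
    return w

# Toy usage: the human always prefers trajectories with a higher first
# feature, and the recovered weights reflect that preference.
feat_a = np.array([[1.0, 0.2], [0.9, 0.5]])
feat_b = np.array([[0.1, 0.3], [0.2, 0.4]])
print(fit_reward_weights(feat_a, feat_b, np.array([1.0, 1.0])))
```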
- Training Robots without Robots: Deep Imitation Learning for Master-to-Robot Policy Transfer [4.318590074766604]
Deep imitation learning is promising for robot manipulation because it only requires demonstration samples.
Existing demonstration methods have deficiencies; for example, bilateral teleoperation requires a complex control scheme and is expensive.
This research proposes a new master-to-robot (M2R) policy transfer system that does not require robots for teaching force feedback-based manipulation tasks.
arXiv Detail & Related papers (2022-02-19T10:55:10Z)
- Transformers for One-Shot Visual Imitation [28.69615089950047]
Humans are able to seamlessly visually imitate others by inferring their intentions and using past experience to achieve the same end goal.
Prior research in robot imitation learning has created agents which can acquire diverse skills from expert human operators.
This paper investigates techniques that allow robots to partially bridge these domain gaps using their past experience.
arXiv Detail & Related papers (2020-11-11T18:41:07Z)