Learning Reward Functions for Robotic Manipulation by Observing Humans
- URL: http://arxiv.org/abs/2211.09019v1
- Date: Wed, 16 Nov 2022 16:26:48 GMT
- Title: Learning Reward Functions for Robotic Manipulation by Observing Humans
- Authors: Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce and
Cordelia Schmid
- Abstract summary: We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
- Score: 92.30657414416527
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Observing a human demonstrator manipulate objects provides a rich, scalable
and inexpensive source of data for learning robotic policies. However,
transferring skills from human videos to a robotic manipulator poses several
challenges, not least a difference in action and observation spaces. In this
work, we use unlabeled videos of humans solving a wide range of manipulation
tasks to learn a task-agnostic reward function for robotic manipulation
policies. Thanks to the diversity of this training data, the learned reward
function sufficiently generalizes to image observations from a previously
unseen robot embodiment and environment to provide a meaningful prior for
directed exploration in reinforcement learning. The learned rewards are based
on distances to a goal in an embedding space learned using a time-contrastive
objective. By conditioning the function on a goal image, we are able to reuse
one model across a variety of tasks. Unlike prior work on leveraging human
videos to teach robots, our method, Human Offline Learned Distances (HOLD),
requires neither a priori data from the robot environment, nor a set of
task-specific human demonstrations, nor a predefined notion of correspondence
across morphologies, yet it is able to accelerate training of several
manipulation tasks on a simulated robot arm compared to using only a sparse
reward obtained from task completion.
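To make the reward construction concrete, the following sketch shows one way the ingredients described above can fit together: an image encoder trained with a triplet-style time-contrastive objective, and a goal-image-conditioned reward defined as the negative distance to the goal in embedding space. It is a minimal illustration, not the paper's implementation; the encoder architecture, the exact contrastive variant, and all names (SmallEncoder, time_contrastive_loss, hold_style_reward) are assumptions.

```python
# Illustrative sketch (not the paper's code): a goal-conditioned distance reward
# on top of an image embedding trained with a time-contrastive (triplet) objective.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SmallEncoder(nn.Module):
    """Maps an RGB frame to a compact embedding; a stand-in for the paper's encoder."""

    def __init__(self, embed_dim: int = 32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(self.conv(x))


def time_contrastive_loss(encoder, anchor, positive, negative, margin=1.0):
    """Pull together frames that are close in time (anchor/positive) and push away
    temporally distant frames (negative); one common single-view variant."""
    za, zp, zn = encoder(anchor), encoder(positive), encoder(negative)
    return F.triplet_margin_loss(za, zp, zn, margin=margin)


def hold_style_reward(encoder, observation, goal_image):
    """Reward = negative distance between current observation and goal image embeddings."""
    with torch.no_grad():
        z_obs, z_goal = encoder(observation), encoder(goal_image)
        return -torch.norm(z_obs - z_goal, dim=-1)


if __name__ == "__main__":
    enc = SmallEncoder()
    # Dummy 64x64 RGB frames: anchor, temporally-near positive, temporally-far negative.
    anchor, pos, neg = (torch.rand(1, 3, 64, 64) for _ in range(3))
    print("contrastive loss:", time_contrastive_loss(enc, anchor, pos, neg).item())
    obs, goal = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
    print("reward:", hold_style_reward(enc, obs, goal).item())
```

In use, such a dense distance-to-goal reward would complement the sparse task-completion signal mentioned in the abstract, giving the policy a shaped signal for directed exploration.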
Related papers
- Towards Generalizable Zero-Shot Manipulation via Translating Human
Interaction Plans [58.27029676638521]
We show how passive human videos can serve as a rich source of data for learning generalist robots.
We learn a human plan predictor that, given a current image of a scene and a goal image, predicts the future hand and object configurations.
We show that our learned system can perform over 16 manipulation skills that generalize to 40 objects.
arXiv Detail & Related papers (2023-12-01T18:54:12Z)
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned Policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We learn our policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
- Affordances from Human Videos as a Versatile Representation for Robotics [31.248842798600606]
We train a visual affordance model that estimates where and how in the scene a human is likely to interact.
The structure of these behavioral affordances directly enables the robot to perform many complex tasks.
We show the efficacy of our approach, which we call VRB, across 4 real-world environments, over 10 different tasks, and 2 robotic platforms operating in the wild.
arXiv Detail & Related papers (2023-04-17T17:59:34Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Zero-Shot Robot Manipulation from Passive Human Videos [59.193076151832145]
We develop a framework for extracting agent-agnostic action representations from human videos.
Our framework is based on predicting plausible human hand trajectories.
We deploy the trained model zero-shot for physical robot manipulation tasks.
arXiv Detail & Related papers (2023-02-03T21:39:52Z)
- Learning Preferences for Interactive Autonomy [1.90365714903665]
This thesis studies how to learn reward functions from human users by using other, more reliable data modalities.
We first propose various forms of comparative feedback, e.g., pairwise comparisons, best-of-many choices, rankings, and scaled comparisons, and describe how a robot can use these forms of human feedback to infer a reward function.
arXiv Detail & Related papers (2022-10-19T21:34:51Z)
- Learning Generalizable Robotic Reward Functions from "In-The-Wild" Human Videos [59.58105314783289]
Domain-agnostic Video Discriminator (DVD) learns multitask reward functions by training a discriminator to classify whether two videos are performing the same task.
DVD can generalize by virtue of learning from a small amount of robot data with a broad dataset of human videos.
DVD can be combined with visual model predictive control to solve robotic manipulation tasks on a real WidowX200 robot in an unseen environment from a single human demo.
arXiv Detail & Related papers (2021-03-31T05:25:05Z)
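The DVD entry above frames reward learning as a classification problem: a discriminator scores whether two clips show the same task, and that score can reward robot behavior that matches a human demonstration. The sketch below is a rough, assumed rendering of that idea in PyTorch; VideoEncoder, SameTaskDiscriminator, and dvd_style_reward are illustrative names, not the published model.

```python
# Hedged sketch of a same-task video discriminator used as a reward signal.
# Architecture and names are illustrative assumptions, not the DVD authors' code.
import torch
import torch.nn as nn


class VideoEncoder(nn.Module):
    """Pools per-frame CNN features into a single clip embedding."""

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.frame_net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, embed_dim),
        )

    def forward(self, video: torch.Tensor) -> torch.Tensor:
        # video: (batch, time, 3, H, W) -> average frame embeddings over time
        b, t = video.shape[:2]
        feats = self.frame_net(video.flatten(0, 1)).view(b, t, -1)
        return feats.mean(dim=1)


class SameTaskDiscriminator(nn.Module):
    """Scores whether two clips show the same task (probability in [0, 1])."""

    def __init__(self, embed_dim: int = 64):
        super().__init__()
        self.encoder = VideoEncoder(embed_dim)
        self.head = nn.Sequential(nn.Linear(2 * embed_dim, 64), nn.ReLU(),
                                  nn.Linear(64, 1))

    def forward(self, clip_a: torch.Tensor, clip_b: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.encoder(clip_a), self.encoder(clip_b)], dim=-1)
        return torch.sigmoid(self.head(z)).squeeze(-1)


def dvd_style_reward(disc, robot_clip, human_demo_clip):
    """Reward a robot rollout by how confidently it matches a human demo of the task."""
    with torch.no_grad():
        return disc(robot_clip, human_demo_clip)


if __name__ == "__main__":
    disc = SameTaskDiscriminator()
    robot = torch.rand(1, 8, 3, 64, 64)   # 8-frame robot rollout
    human = torch.rand(1, 8, 3, 64, 64)   # 8-frame human demonstration
    print(dvd_style_reward(disc, robot, human).item())
```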