Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
- URL: http://arxiv.org/abs/2304.04602v2
- Date: Wed, 13 Sep 2023 06:19:35 GMT
- Title: Learning a Universal Human Prior for Dexterous Manipulation from Human Preference
- Authors: Zihan Ding, Yuanpei Chen, Allen Z. Ren, Shixiang Shane Gu, Qianxu Wang, Hao Dong, Chi Jin
- Abstract summary: We propose a framework that learns a universal human prior using direct human preference feedback over videos.
A task-agnostic reward model is trained by iteratively generating diverse policies and collecting human preferences over the trajectories.
Our method empirically demonstrates more human-like behaviors on robot hands across diverse tasks, including unseen ones.
- Score: 35.54663426598218
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating human-like behavior on robots is a great challenge especially in
dexterous manipulation tasks with robotic hands. Scripting policies from
scratch is intractable due to the high-dimensional control space, and training
policies with reinforcement learning (RL) and manual reward engineering can
also be hard and lead to unnatural motions. Leveraging the recent progress on
RL from Human Feedback, we propose a framework that learns a universal human
prior using direct human preference feedback over videos, for efficiently
tuning the RL policies on 20 dual-hand robot manipulation tasks in simulation,
without a single human demonstration. A task-agnostic reward model is trained
through iteratively generating diverse policies and collecting human preference
over the trajectories; it is then applied for regularizing the behavior of
policies in the fine-tuning stage. Our method empirically demonstrates more
human-like behaviors on robot hands in diverse tasks including even unseen
tasks, indicating its generalization capability.
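The abstract describes a standard RLHF recipe: fit a reward model to pairwise human preferences over trajectory videos, then use its score as a regularizer on top of the task reward. Below is a minimal sketch of that recipe; the module names, shapes, and the trade-off weight are assumptions for illustration, not the authors' released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TrajectoryRewardModel(nn.Module):
    """Scores a trajectory of observations; higher = more human-like (sketch)."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, traj: torch.Tensor) -> torch.Tensor:
        # traj: (T, obs_dim) -> per-step scores summed into one clip score
        return self.net(traj).sum()

def preference_loss(model, traj_a, traj_b, label: float):
    """Bradley-Terry objective; label = 1.0 if the human preferred traj_a."""
    logit = model(traj_a) - model(traj_b)
    return F.binary_cross_entropy_with_logits(logit.view(1),
                                              torch.tensor([label]))

# Fine-tuning then optimizes a regularized per-step objective, e.g.
#   r_total = r_task + beta * r_human   (beta: hypothetical trade-off weight)
```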
Related papers
- Learning Video-Conditioned Policies for Unseen Manipulation Tasks [83.2240629060453]
Video-conditioned policy learning maps human demonstrations of previously unseen tasks to robot manipulation skills.
We train the policy to generate appropriate actions given current scene observations and a video of the target task.
We validate our approach on a set of challenging multi-task robot manipulation environments and outperform the state of the art.
arXiv Detail & Related papers (2023-05-10T16:25:42Z)
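The entry above conditions action generation on a demo video of the target task. A minimal sketch of one plausible architecture follows; the GRU pooling, the pre-encoded frames, and all dimensions are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class VideoConditionedPolicy(nn.Module):
    """Sketch: act from the current observation plus a video of the target task."""
    def __init__(self, obs_dim: int, frame_feat_dim: int, act_dim: int,
                 hidden: int = 256):
        super().__init__()
        # Assume frames are already encoded by a (frozen) visual backbone.
        self.video_pool = nn.GRU(frame_feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(
            nn.Linear(obs_dim + hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs: torch.Tensor, video_feats: torch.Tensor):
        # video_feats: (B, T, frame_feat_dim); last GRU state summarizes the demo
        _, h = self.video_pool(video_feats)
        return self.head(torch.cat([obs, h[-1]], dim=-1))
```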
- Learning Human-to-Robot Handovers from Point Clouds [63.18127198174958]
We propose the first framework to learn control policies for vision-based human-to-robot handovers.
We show significant performance gains over baselines on a simulation benchmark, sim-to-sim transfer and sim-to-real transfer.
arXiv Detail & Related papers (2023-03-30T17:58:36Z)
- Zero-Shot Robot Manipulation from Passive Human Videos [59.193076151832145]
We develop a framework for extracting agent-agnostic action representations from human videos.
Our framework is based on predicting plausible human hand trajectories.
We deploy the trained model zero-shot for physical robot manipulation tasks.
arXiv Detail & Related papers (2023-02-03T21:39:52Z)
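The entry above predicts plausible human hand trajectories and deploys them zero-shot on a robot. A minimal sketch of that idea, where the encoder context, the waypoint parameterization, and the retargeting step are all assumptions for illustration:

```python
import torch
import torch.nn as nn

class HandTrajectoryPredictor(nn.Module):
    """Sketch: predict K future 3D hand waypoints from an encoded video context."""
    def __init__(self, ctx_dim: int, horizon: int = 8, hidden: int = 256):
        super().__init__()
        self.horizon = horizon
        self.net = nn.Sequential(
            nn.Linear(ctx_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * 3),  # (x, y, z) per waypoint
        )

    def forward(self, ctx: torch.Tensor) -> torch.Tensor:
        return self.net(ctx).view(-1, self.horizon, 3)

# Zero-shot deployment idea: treat the predicted hand waypoints as
# end-effector targets and track them with an inverse-kinematics controller.
```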
- HERD: Continuous Human-to-Robot Evolution for Learning from Human Demonstration [57.045140028275036]
We show that manipulation skills can be transferred from a human to a robot through the use of micro-evolutionary reinforcement learning.
We propose an algorithm for multi-dimensional evolution path searching that allows joint optimization of both the robot evolution path and the policy.
arXiv Detail & Related papers (2022-12-08T15:56:13Z)
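The entry above jointly optimizes an evolution path from a human-like morphology to the robot together with the policy. A rough sketch of such a micro-evolutionary loop; the greedy per-dimension search and the `train_policy` callable are hypothetical simplifications, not HERD's actual algorithm.

```python
import numpy as np

def evolve_and_train(train_policy, human_morph, robot_morph,
                     steps: int = 20, eta: float = 0.05):
    """Move the simulated morphology from human-like toward the target robot
    in micro-steps while the policy keeps training. `train_policy(morph)` is
    a hypothetical callable returning (policy, average_return)."""
    alpha = np.zeros_like(human_morph)  # per-dimension evolution progress
    policy = None
    for _ in range(steps):
        results = []
        # Try a micro-step along each morphology dimension of the path
        for e in np.eye(len(alpha)):
            a = np.clip(alpha + eta * e, 0.0, 1.0)
            morph = (1.0 - a) * human_morph + a * robot_morph
            pol, ret = train_policy(morph)
            results.append((ret, a, pol))
        # Greedily keep the step the current policy adapts to best
        _, alpha, policy = max(results, key=lambda r: r[0])
    return policy, alpha
```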
- Learning Reward Functions for Robotic Manipulation by Observing Humans [92.30657414416527]
We use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies.
The learned rewards are based on distances to a goal in an embedding space learned using a time-contrastive objective.
arXiv Detail & Related papers (2022-11-16T16:26:48Z)
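The entry above defines rewards as distances to a goal in an embedding learned with a time-contrastive objective. A minimal sketch of both pieces; the triplet formulation, offsets, and margin are assumptions standing in for the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def time_contrastive_loss(phi, frames, pos_offset: int = 1, margin: float = 0.2):
    """Triplet-style time-contrastive objective (sketch): frames close in time
    embed nearby, temporally distant frames embed far apart.
    frames: (T, C, H, W); phi: image encoder returning (N, D) embeddings."""
    T = frames.shape[0]
    i = torch.randint(0, T - pos_offset, (1,)).item()
    j = (i + T // 2) % T  # a temporally distant frame as the negative
    anchor = phi(frames[i:i + 1])
    positive = phi(frames[i + pos_offset:i + pos_offset + 1])
    negative = phi(frames[j:j + 1])
    return F.triplet_margin_loss(anchor, positive, negative, margin=margin)

def embedding_reward(phi, obs_image, goal_image) -> float:
    """Reward = negative distance to the goal image in the learned embedding."""
    with torch.no_grad():
        return -torch.norm(phi(obs_image) - phi(goal_image)).item()
```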
- Co-GAIL: Learning Diverse Strategies for Human-Robot Collaboration [51.268988527778276]
We present a method for learning a human-robot collaboration policy from human-human collaboration demonstrations.
Our method co-optimizes a human policy and a robot policy in an interactive learning process.
arXiv Detail & Related papers (2021-08-13T03:14:43Z)
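The entry above co-optimizes a human policy and a robot policy adversarially against human-human demonstrations. A minimal GAIL-style sketch of the shared discriminator; the joint-transition parameterization is an assumption, not Co-GAIL's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CollabDiscriminator(nn.Module):
    """Sketch: GAIL-style discriminator over joint human-robot transitions."""
    def __init__(self, state_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2 * act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, human_act, robot_act):
        return self.net(torch.cat([state, human_act, robot_act], dim=-1))

def gail_losses(disc, demo_batch, policy_batch):
    """Discriminator pushes demos toward 1 and co-policy rollouts toward 0;
    both policies are rewarded for fooling it."""
    d_demo = disc(*demo_batch)
    d_pol = disc(*policy_batch)
    disc_loss = (F.binary_cross_entropy_with_logits(d_demo, torch.ones_like(d_demo))
                 + F.binary_cross_entropy_with_logits(d_pol, torch.zeros_like(d_pol)))
    reward = -torch.log(1 - torch.sigmoid(d_pol) + 1e-8)
    return disc_loss, reward
```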
- On the Emergence of Whole-body Strategies from Humanoid Robot Push-recovery Learning [32.070068456106895]
We apply model-free Deep Reinforcement Learning for training a general and robust humanoid push-recovery policy in a simulation environment.
Our method targets high-dimensional whole-body humanoid control and is validated on the iCub humanoid.
arXiv Detail & Related papers (2021-04-29T17:49:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.