Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
- URL: http://arxiv.org/abs/2108.05382v1
- Date: Wed, 11 Aug 2021 18:04:08 GMT
- Title: Skill Preferences: Learning to Extract and Execute Robotic Skills from Human Feedback
- Authors: Xiaofei Wang, Kimin Lee, Kourosh Hakhamaneshi, Pieter Abbeel, Michael Laskin
- Abstract summary: We present Skill Preferences, an algorithm that learns a model over human preferences and uses it to extract human-aligned skills from offline data.
We show that SkiP enables a simulated kitchen robot to solve complex multi-step manipulation tasks.
- Score: 82.96694147237113
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A promising approach to solving challenging long-horizon tasks has been to
extract behavior priors (skills) by fitting generative models to large offline
datasets of demonstrations. However, such generative models inherit the biases
of the underlying data and result in poor and unusable skills when trained on
imperfect demonstration data. To better align skill extraction with human
intent, we present Skill Preferences (SkiP), an algorithm that learns a model
over human preferences and uses it to extract human-aligned skills from offline
data. After extracting human-preferred skills, SkiP also utilizes human
feedback to solve downstream tasks with RL. We show that SkiP enables a
simulated kitchen robot to solve complex multi-step manipulation tasks and
substantially outperforms prior leading RL algorithms with human preferences as
well as leading skill extraction algorithms without human preferences.
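The abstract does not specify how the preference model is parameterized or how it enters skill extraction. As an illustration only, the sketch below uses the Bradley-Terry formulation common in preference-based RL (the probability that a labeler prefers segment a over segment b is the logistic function of the difference in their predicted returns), fits a linear reward model to pairwise human labels, and then turns the learned returns into per-trajectory weights that could down-weight undesirable offline data when fitting a skill model. The linear reward, the softmax weighting, and all function names are hypothetical and are not claimed to be SkiP's actual method.

```python
import math
from typing import List, Sequence, Tuple

# A segment is a short sequence of state feature vectors (hypothetical representation).
Segment = List[List[float]]


def segment_return(w: Sequence[float], seg: Segment) -> float:
    """Sum of a linear per-step reward r(x) = w . x over one segment."""
    return sum(sum(wi * xi for wi, xi in zip(w, x)) for x in seg)


def pref_prob(w: Sequence[float], seg_a: Segment, seg_b: Segment) -> float:
    """Bradley-Terry probability that a human prefers seg_a over seg_b."""
    d = segment_return(w, seg_a) - segment_return(w, seg_b)
    # Numerically stable logistic sigmoid of the return difference.
    if d >= 0:
        return 1.0 / (1.0 + math.exp(-d))
    e = math.exp(d)
    return e / (1.0 + e)


def fit_preference_model(
    pairs: List[Tuple[Segment, Segment]],
    labels: List[float],
    dim: int,
    lr: float = 0.1,
    epochs: int = 200,
) -> List[float]:
    """Full-batch gradient descent on the preference cross-entropy loss.

    labels[i] is 1.0 if the human preferred pairs[i][0], else 0.0.
    """
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for (a, b), y in zip(pairs, labels):
            p = pref_prob(w, a, b)
            # d(loss)/dw = (p - y) * (summed features of a - summed features of b)
            for k in range(dim):
                fa = sum(x[k] for x in a)
                fb = sum(x[k] for x in b)
                grad[k] += (p - y) * (fa - fb)
        w = [wk - lr * gk / len(pairs) for wk, gk in zip(w, grad)]
    return w


def preference_weights(
    w: Sequence[float], trajectories: List[Segment], temperature: float = 1.0
) -> List[float]:
    """Hypothetical weighting: softmax of learned returns over offline trajectories,
    which could scale each trajectory's loss when fitting a skill model."""
    scores = [segment_return(w, t) / temperature for t in trajectories]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


if __name__ == "__main__":
    good: Segment = [[1.0], [1.0]]  # toy segment the labeler prefers
    bad: Segment = [[0.0], [0.0]]
    w = fit_preference_model([(good, bad), (bad, good)], [1.0, 0.0], dim=1)
    print(pref_prob(w, good, bad) > 0.5)  # True
    print(preference_weights(w, [good, bad]))
```

Soft weights keep every offline trajectory in the skill-extraction objective while letting preferred behavior dominate; hard-filtering trajectories whose predicted preference falls below a threshold would be an equally plausible alternative under the same assumptions.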
Related papers
- EXTRACT: Efficient Policy Learning by Extracting Transferable Robot Skills from Offline Data [22.471559284344462]
Most reinforcement learning (RL) methods focus on learning optimal policies over low-level action spaces.
While these methods can perform well in their training environments, they lack the flexibility to transfer to new tasks.
We demonstrate through experiments in sparse, image-based robot manipulation environments that EXTRACT can learn new tasks more quickly than prior works.
(2024-06-25T17:50:03Z)
- Learning Human Action Recognition Representations Without Real Humans [66.61527869763819]
We present a benchmark that leverages real-world videos with humans removed and synthetic data containing virtual humans to pre-train a model.
We then evaluate the transferability of the representation learned on this data to a diverse set of downstream action recognition benchmarks.
Our approach outperforms previous baselines by up to 5%.
(2023-11-10T18:38:14Z)
- On-Robot Bayesian Reinforcement Learning for POMDPs [16.667924736270415]
This paper advances Bayesian reinforcement learning for robotics by proposing a specialized framework for physical systems.
We capture domain knowledge about the physical system in a factored representation, show that the posterior factorizes in a similar way, and then formalize the model in a Bayesian framework.
We then introduce a sample-based online solution method, based on Monte-Carlo tree search and particle filtering, specialized to solve the resulting model.
(2023-07-22T01:16:29Z)
- Towards A Unified Agent with Foundation Models [18.558328028366816]
We investigate how to embed and leverage the capabilities of foundation models, such as language and vision-language models, in Reinforcement Learning (RL) agents.
We design a framework that uses language as the core reasoning tool, exploring how this enables an agent to tackle a series of fundamental RL challenges.
We demonstrate substantial performance improvements over baselines in exploration efficiency and ability to reuse data from offline datasets.
(2023-07-18T22:37:30Z)
- Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration [0.5524804393257919]
We show that using optimal behavior as a prior for human models makes these models vastly more data-efficient.
We also show that using these improved human models often leads to better human-AI collaboration performance.
(2022-11-03T06:10:22Z)
- What Matters in Learning from Offline Human Demonstrations for Robot Manipulation [64.43440450794495]
We conduct an extensive study of six offline learning algorithms for robot manipulation.
Our study analyzes the most critical challenges when learning from offline human data.
We highlight opportunities for learning from human datasets.
(2021-08-06T20:48:30Z)
- Hierarchical Few-Shot Imitation with Skill Transition Models [66.81252581083199]
Few-shot Imitation with Skill Transition Models (FIST) is an algorithm that extracts skills from offline data and utilizes them to generalize to unseen tasks.
We show that FIST is capable of generalizing to new tasks and substantially outperforms prior baselines in navigation experiments.
(2021-07-19T15:56:01Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy iteration algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
(2020-10-16T18:48:49Z)
- AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
(2020-06-16T17:54:41Z)
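The AWAC entry above describes combining prior demonstration data with online experience. In the AWAC paper, the actor update maximizes the log-likelihood of dataset actions weighted by exp(A(s, a) / λ), where A is the advantage from a learned critic. The following is a minimal stdlib-only sketch of that weighting step, assuming advantages and policy log-probabilities are already computed; the function names, the weight cap of 20, and λ = 1 are illustrative choices, not values taken from the paper.

```python
import math
from typing import List, Sequence


def awac_weights(
    advantages: Sequence[float], lam: float = 1.0, max_weight: float = 20.0
) -> List[float]:
    """Per-sample actor weights exp(A(s, a) / lam), capped at max_weight.

    Capping the exponent before exponentiating avoids overflow; the cap itself
    is an assumed implementation detail.
    """
    cap = math.log(max_weight)
    return [math.exp(min(a / lam, cap)) for a in advantages]


def awac_actor_loss(
    log_probs: Sequence[float], advantages: Sequence[float], lam: float = 1.0
) -> float:
    """Negative advantage-weighted log-likelihood of dataset actions, batch-averaged.

    log_probs[i] is log pi(a_i | s_i) under the current policy; advantages[i]
    is A(s_i, a_i) from a learned critic (assumed given here).
    """
    weights = awac_weights(advantages, lam)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / len(log_probs)


if __name__ == "__main__":
    # Two equally likely dataset actions: the one with positive advantage gets
    # a larger weight, so minimizing the loss pulls the policy harder toward it.
    print(awac_weights([1.0, -1.0]))
    print(awac_actor_loss([math.log(0.5), math.log(0.5)], [1.0, -1.0]))
```

Because the weights only rescale a supervised log-likelihood, the update stays close to the data when advantages are small and imitates high-advantage actions more strongly as λ shrinks.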