SQUIRL: Robust and Efficient Learning from Video Demonstration of
Long-Horizon Robotic Manipulation Tasks
- URL: http://arxiv.org/abs/2003.04956v1
- Date: Tue, 10 Mar 2020 20:26:26 GMT
- Title: SQUIRL: Robust and Efficient Learning from Video Demonstration of
Long-Horizon Robotic Manipulation Tasks
- Authors: Bohan Wu, Feng Xu, Zhanpeng He, Abhi Gupta, and Peter K. Allen
- Abstract summary: Deep reinforcement learning (RL) can be used to learn complex manipulation tasks.
However, RL requires the robot to collect a large amount of real-world experience.
SQUIRL performs a new but related long-horizon task robustly given only a single video demonstration.
- Score: 8.756012472587601
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent advances in deep reinforcement learning (RL) have demonstrated its
potential to learn complex robotic manipulation tasks. However, RL still
requires the robot to collect a large amount of real-world experience. To
address this problem, recent works have proposed learning from expert
demonstrations (LfD), particularly via inverse reinforcement learning (IRL),
given its ability to achieve robust performance with only a small number of
expert demonstrations. Nevertheless, deploying IRL on real robots is still
challenging due to the large number of robot experiences it requires. This
paper aims to address this scalability challenge with a robust,
sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new
but related long-horizon task robustly given only a single video demonstration.
First, this algorithm bootstraps the learning of a task encoder and a
task-conditioned policy using behavioral cloning (BC). It then collects
real-robot experiences and bypasses reward learning by directly recovering a
Q-function from the combined robot and expert trajectories. Next, this
algorithm uses the Q-function to re-evaluate all cumulative experiences
collected by the robot to improve the policy quickly. In the end, the policy
performs more robustly (90%+ success) than BC on new tasks while requiring no
trial and error at test time. Finally, our real-robot and simulated
experiments demonstrate our algorithm's generality across different state
spaces, action spaces, and vision-based manipulation tasks, e.g.,
pick-pour-place and pick-carry-drop.
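
The abstract describes a three-stage procedure: bootstrap a task encoder and task-conditioned policy with behavioral cloning, recover a Q-function directly from the combined robot and expert trajectories (bypassing explicit reward learning), and use that Q-function to re-evaluate all cumulative robot experience to improve the policy. The Python sketch below is a minimal rendering of that loop under our own assumptions; every class and function name (TaskEncoder, TaskConditionedPolicy, QFunction, rollout, train) and all stubbed updates are illustrative, not the authors' released implementation.

# Minimal sketch of the three-stage loop described in the abstract.
# All names and stubbed updates below are illustrative assumptions,
# not the authors' released implementation.

import random


class TaskEncoder:
    """Maps a video demonstration to a task embedding (stubbed)."""

    def encode(self, demo):
        # A real encoder would embed the demonstration frames; here we
        # return a constant placeholder vector.
        return [0.0]


class TaskConditionedPolicy:
    """Policy pi(a | s, task_embedding) (stubbed)."""

    def act(self, state, task_embedding):
        return 0.0  # placeholder action

    def behavioral_cloning_update(self, expert_demos, encoder):
        pass  # stage 1: supervised learning on expert (state, action) pairs

    def improve_with_q(self, experiences, q_function, encoder):
        pass  # stage 3: e.g. soft policy improvement against the Q-function


class QFunction:
    """Q(s, a | task_embedding), recovered directly from trajectories."""

    def update(self, robot_experiences, expert_demos, encoder):
        pass  # stage 2: bypasses explicit reward learning


def rollout(policy, encoder, demo, horizon=50):
    """Collect one (dummy) robot trajectory with the current policy."""
    z = encoder.encode(demo)
    state, trajectory = 0.0, []
    for _ in range(horizon):
        action = policy.act(state, z)
        next_state = state + random.gauss(0.0, 0.01)  # dummy dynamics
        trajectory.append((state, action, next_state))
        state = next_state
    return trajectory


def train(expert_demos, num_iterations=10):
    encoder, policy, q_function = TaskEncoder(), TaskConditionedPolicy(), QFunction()

    # Stage 1: bootstrap encoder and policy with behavioral cloning.
    policy.behavioral_cloning_update(expert_demos, encoder)

    # Stages 2-3: alternate real-robot data collection, Q-function
    # recovery from combined robot + expert data, and policy improvement
    # over ALL experience collected so far.
    replay_buffer = []
    for _ in range(num_iterations):
        for demo in expert_demos:
            replay_buffer.append(rollout(policy, encoder, demo))
        q_function.update(replay_buffer, expert_demos, encoder)
        policy.improve_with_q(replay_buffer, q_function, encoder)
    return encoder, policy


if __name__ == "__main__":
    dummy_demos = [[(0.0, 0.0, 0.0)]]  # one toy "video demonstration"
    train(dummy_demos)

At test time, per the abstract, the learned encoder and policy are applied to a new but related task from a single video demonstration, without further trial and error on the robot.
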
Related papers
- SPIRE: Synergistic Planning, Imitation, and Reinforcement Learning for Long-Horizon Manipulation [58.14969377419633]
We propose SPIRE, a system that first decomposes tasks into smaller learning subproblems and second combines imitation and reinforcement learning to maximize their strengths.
We find that SPIRE outperforms prior approaches that integrate imitation learning, reinforcement learning, and planning by 35% to 50% in average task performance.
arXiv Detail & Related papers (2024-10-23T17:42:07Z)
- Offline Imitation Learning Through Graph Search and Retrieval [57.57306578140857]
Imitation learning is a powerful approach for robots to acquire manipulation skills.
We propose GSR, a simple yet effective algorithm that learns from suboptimal demonstrations through Graph Search and Retrieval.
GSR can achieve a 10% to 30% higher success rate and over 30% higher proficiency compared to baselines.
arXiv Detail & Related papers (2024-07-22T06:12:21Z)
- Self-Improving Robots: End-to-End Autonomous Visuomotor Reinforcement Learning [54.636562516974884]
In imitation and reinforcement learning, the cost of human supervision limits the amount of data that robots can be trained on.
In this work, we propose MEDAL++, a novel design for self-improving robotic systems.
The robot autonomously practices the task by learning to both do and undo the task, simultaneously inferring the reward function from the demonstrations.
arXiv Detail & Related papers (2023-03-02T18:51:38Z)
- Accelerating Robot Learning of Contact-Rich Manipulations: A Curriculum Learning Study [4.045850174820418]
This paper presents a study on accelerating robot learning of contact-rich manipulation tasks based on Curriculum Learning combined with Domain Randomization (DR).
We tackle complex industrial assembly tasks with position-controlled robots, such as insertion tasks.
Results also show that even when training only in simulation with toy tasks, our method can learn policies that can be transferred to the real-world robot.
arXiv Detail & Related papers (2022-04-27T11:08:39Z)
- Accelerating Robotic Reinforcement Learning via Parameterized Action Primitives [92.0321404272942]
Reinforcement learning can be used to build general-purpose robotic systems.
However, training RL agents to solve robotics tasks still remains challenging.
In this work, we manually specify a library of robot action primitives (RAPS), parameterized with arguments that are learned by an RL policy.
We find that our simple change to the action interface substantially improves both the learning efficiency and task performance (a minimal sketch of such an interface appears after this list).
arXiv Detail & Related papers (2021-10-28T17:59:30Z)
- Lifelong Robotic Reinforcement Learning by Retaining Experiences [61.79346922421323]
Many multi-task reinforcement learning efforts assume the robot can collect data from all tasks at all times.
In this work, we study a sequential multi-task RL problem motivated by the practical constraints of physical robotic systems.
We derive an approach that effectively leverages the data and policies learned for previous tasks to cumulatively grow the robot's skill-set.
arXiv Detail & Related papers (2021-09-19T18:00:51Z)
- CRIL: Continual Robot Imitation Learning via Generative and Prediction Model [8.896427780114703]
We study how to realize continual imitation learning, which empowers robots to continually learn new tasks one by one.
We propose a novel trajectory generation model that employs both a generative adversarial network and a dynamics prediction model.
Our experiments on both simulation and real world manipulation tasks demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2021-06-17T12:15:57Z)
- A Framework for Efficient Robotic Manipulation [79.10407063260473]
We show that, given only 10 demonstrations, a single robotic arm can learn sparse-reward manipulation policies from pixels.
arXiv Detail & Related papers (2020-12-14T22:18:39Z)
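
For the parameterized action primitives entry above (RAPS), the key idea is that the RL policy outputs a primitive choice plus its continuous arguments rather than low-level motor commands. The sketch below is a hypothetical illustration of such an action interface; the primitive library, the Primitive dataclass, and the apply_action helper are assumptions for illustration, not the RAPS codebase.

# Hypothetical illustration of a parameterized action-primitive interface,
# in the spirit of the RAPS entry above. All names are illustrative.

from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Primitive:
    name: str
    num_args: int  # dimensionality of the primitive's argument vector
    execute: Callable[[List[float]], None]  # low-level controller stub


def lift(args: List[float]) -> None:
    print(f"lift by {args[0]:.2f} m")


def push(args: List[float]) -> None:
    print(f"push with displacement ({args[0]:.2f}, {args[1]:.2f})")


LIBRARY: Dict[int, Primitive] = {
    0: Primitive("lift", 1, lift),
    1: Primitive("push", 2, push),
}


def apply_action(action: Tuple[int, List[float]]) -> None:
    """The policy outputs (primitive index, argument vector); the
    environment executes the chosen primitive with those arguments."""
    index, args = action
    primitive = LIBRARY[index]
    primitive.execute(args[: primitive.num_args])


if __name__ == "__main__":
    # Example: the RL policy selected primitive 1 ("push") with two arguments.
    apply_action((1, [0.05, -0.02]))
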