AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- URL: http://arxiv.org/abs/2006.09359v6
- Date: Sat, 24 Apr 2021 22:39:30 GMT
- Title: AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
- Authors: Ashvin Nair, Abhishek Gupta, Murtaza Dalal, Sergey Levine
- Abstract summary: We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
- Score: 84.94748183816547
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning (RL) provides an appealing formalism for learning
control policies from experience. However, the classic active formulation of RL
necessitates a lengthy active exploration process for each behavior, making it
difficult to apply in real-world settings such as robotic control. If we can
instead allow RL algorithms to effectively use previously collected data to aid
the online learning process, such applications could be made substantially more
practical: the prior data would provide a starting point that mitigates
challenges due to exploration and sample complexity, while the online training
enables the agent to perfect the desired skill. Such prior data could either
constitute expert demonstrations or sub-optimal prior data that illustrates
potentially useful transitions. While a number of prior methods have either
used optimal demonstrations to bootstrap RL, or have used sub-optimal data to
train purely offline, it remains exceptionally difficult to train a policy with
offline data and actually continue to improve it further with online RL. In
this paper we analyze why this problem is so challenging, and propose an
algorithm that combines sample efficient dynamic programming with maximum
likelihood policy updates, providing a simple and effective framework that is
able to leverage large amounts of offline data and then quickly perform online
fine-tuning of RL policies. We show that our method, advantage weighted actor
critic (AWAC), enables rapid learning of skills with a combination of prior
demonstration data and online experience. We demonstrate these benefits on
simulated and real-world robotics domains, including dexterous manipulation
with a real multi-fingered hand, drawer opening with a robotic arm, and
rotating a valve. Our results show that incorporating prior data can reduce the
time required to learn a range of robotic skills to practical time-scales.
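The abstract describes AWAC's core recipe: a critic trained with sample-efficient dynamic programming, paired with an advantage-weighted maximum-likelihood policy update. Below is a minimal sketch of that actor step, not the authors' reference implementation; the `policy`, `q_net`, and `value_estimate` modules, the temperature `beta`, and the weight clipping are assumed placeholders.

```python
import torch


def awac_actor_update(policy, q_net, value_estimate, batch, optimizer, beta=1.0):
    """One advantage-weighted maximum-likelihood policy step (illustrative sketch).

    `policy`, `q_net`, and `value_estimate` are assumed placeholders:
    policy(s) returns a torch.distributions object over actions,
    q_net(s, a) returns Q(s, a), and value_estimate(s) approximates V(s)
    (e.g. Q evaluated at actions sampled from the current policy).
    The critic itself is trained with ordinary TD / Bellman backups (not shown).
    """
    states, actions = batch["states"], batch["actions"]

    with torch.no_grad():
        # Advantage of the dataset/replay action over the current policy's value.
        advantage = q_net(states, actions) - value_estimate(states)
        # Exponentiated, temperature-scaled advantage weights.
        weights = torch.exp(advantage / beta)
        # Clipping is an implementation assumption for numerical stability.
        weights = torch.clamp(weights, max=100.0)

    # Maximum-likelihood term: log-probability of the stored actions,
    # reweighted by the advantage weights (no backprop through sampled actions).
    log_prob = policy(states).log_prob(actions)
    if log_prob.dim() > 1:  # sum over action dimensions if needed
        log_prob = log_prob.sum(dim=-1)
    loss = -(weights * log_prob).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the policy loss only reweights log-likelihoods of stored actions, the same update applies unchanged during offline pretraining and online fine-tuning.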
Related papers
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Finetuning Offline World Models in the Real World [13.46766121896684]
Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult.
Offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction.
In this work, we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model.
arXiv Detail & Related papers (2023-10-24T17:46:12Z)
- Benchmarking Offline Reinforcement Learning on Real-Robot Hardware [35.29390454207064]
Dexterous manipulation in particular remains an open problem in its general form.
We propose a benchmark including a large collection of data for offline learning from a dexterous manipulation platform on two tasks.
We evaluate prominent open-sourced offline reinforcement learning algorithms on the datasets and provide a reproducible experimental setup for offline reinforcement learning on real systems.
arXiv Detail & Related papers (2023-07-28T17:29:49Z)
- Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $\mathbf{2.5\times}$ improvement over existing approaches; a minimal buffer-mixing sketch illustrating this recipe follows the related-papers list.
arXiv Detail & Related papers (2023-02-06T17:30:22Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Efficient Robotic Manipulation Through Offline-to-Online Reinforcement Learning and Goal-Aware State Information [5.604859261995801]
We propose a unified offline-to-online RL framework that resolves the transition performance drop issue.
We introduce goal-aware state information to the RL agent, which can greatly reduce task complexity and accelerate policy learning.
Our framework achieves great training efficiency and performance compared with the state-of-the-art methods in multiple robotic manipulation tasks.
arXiv Detail & Related papers (2021-10-21T05:34:25Z)
- A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflows for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z)
- Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z)
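As referenced in the "Efficient Online Reinforcement Learning with Offline Data" entry above, one recurring recipe for leveraging offline data with an existing off-policy learner is to train on batches that mix offline and online transitions. The sketch below only illustrates that buffer-mixing idea under stated assumptions (list-like buffers, a 50/50 split); the paper's full set of recommendations includes further design choices not shown here.

```python
import random


def sample_mixed_batch(offline_buffer, online_buffer, batch_size=256, offline_fraction=0.5):
    """Draw a training batch that mixes offline and online transitions.

    `offline_buffer` and `online_buffer` are assumed to be list-like containers
    of transition tuples; the 50/50 split and sampling with replacement are
    illustrative choices, not the paper's exact recipe.
    """
    n_offline = int(batch_size * offline_fraction)
    n_online = batch_size - n_offline

    batch = random.choices(offline_buffer, k=n_offline)
    # Fall back to offline data while the online buffer is still empty.
    if online_buffer:
        batch += random.choices(online_buffer, k=n_online)
    else:
        batch += random.choices(offline_buffer, k=n_online)

    random.shuffle(batch)
    return batch
```

Sampling with replacement keeps the batch size fixed even while the online buffer is still small early in training.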