Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
- URL: http://arxiv.org/abs/2303.07693v1
- Date: Tue, 14 Mar 2023 08:13:21 GMT
- Title: Adaptive Policy Learning for Offline-to-Online Reinforcement Learning
- Authors: Han Zheng, Xufang Luo, Pengfei Wei, Xuan Song, Dongsheng Li, Jing
Jiang
- Abstract summary: We consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online.
We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
- Score: 27.80266207283246
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Conventional reinforcement learning (RL) needs an environment to collect
fresh data, which is impractical when online interactions are costly. Offline
RL provides an alternative solution by directly learning from the previously
collected dataset. However, it will yield unsatisfactory performance if the
quality of the offline datasets is poor. In this paper, we consider an
offline-to-online setting where the agent is first learned from the offline
dataset and then trained online, and propose a framework called Adaptive Policy
Learning for effectively taking advantage of offline and online data.
Specifically, we explicitly consider the difference between the online and
offline data and apply an adaptive update scheme accordingly, that is, a
pessimistic update strategy for the offline dataset and an optimistic/greedy
update scheme for the online dataset. Such a simple and effective method
provides a way to mix the offline and online RL and achieve the best of both
worlds. We further provide two detailed algorithms for implementing the
framework through embedding value or policy-based RL algorithms into it.
Finally, we conduct extensive experiments on popular continuous control tasks,
and results show that our algorithm can learn the expert policy with high
sample efficiency even when the quality of offline dataset is poor, e.g.,
random dataset.
Related papers
- Small Dataset, Big Gains: Enhancing Reinforcement Learning by Offline
Pre-Training with Model Based Augmentation [59.899714450049494]
offline pre-training can produce sub-optimal policies and lead to degraded online reinforcement learning performance.
We propose a model-based data augmentation strategy to maximize the benefits of offline reinforcement learning pre-training and reduce the scale of data needed to be effective.
arXiv Detail & Related papers (2023-12-15T14:49:41Z) - Beyond Uniform Sampling: Offline Reinforcement Learning with Imbalanced
Datasets [53.8218145723718]
offline policy learning is aimed at learning decision-making policies using existing datasets of trajectories without collecting additional data.
We argue that when a dataset is dominated by suboptimal trajectories, state-of-the-art offline RL algorithms do not substantially improve over the average return of trajectories in the dataset.
We present a realization of the sampling strategy and an algorithm that can be used as a plug-and-play module in standard offline RL algorithms.
arXiv Detail & Related papers (2023-10-06T17:58:14Z) - ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z) - Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid
Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z) - Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs
and Practical Solutions [30.050083797177706]
offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment.
Online finetuning of such offline models can further improve performance.
We show that it is possible to use standard online off-policy algorithms for faster improvement.
arXiv Detail & Related papers (2023-03-30T14:08:31Z) - Bridging Imitation and Online Reinforcement Learning: An Optimistic Tale [27.02990488317357]
Given an offline demonstration dataset from an imperfect expert, what is the best way to leverage it to bootstrap online learning performance in MDPs?
We first propose an Informed Posterior Sampling-based RL (iPSRL) algorithm that uses the offline dataset, and information about the expert's behavioral policy used to generate the offline dataset.
Since this algorithm is computationally impractical, we then propose the iRLSVI algorithm that can be seen as a combination of the RLSVI algorithm for online RL, and imitation learning.
arXiv Detail & Related papers (2023-03-20T18:16:25Z) - Efficient Online Reinforcement Learning with Offline Data [78.92501185886569]
We show that we can simply apply existing off-policy methods to leverage offline data when learning online.
We extensively ablate these design choices, demonstrating the key factors that most affect performance.
We see that correct application of these simple recommendations can provide a $mathbf2.5times$ improvement over existing approaches.
arXiv Detail & Related papers (2023-02-06T17:30:22Z) - Benchmarks and Algorithms for Offline Preference-Based Reward Learning [41.676208473752425]
We propose an approach that uses an offline dataset to craft preference queries via pool-based active learning.
Our proposed approach does not require actual physical rollouts or an accurate simulator for either the reward learning or policy optimization steps.
arXiv Detail & Related papers (2023-01-03T23:52:16Z) - Adaptive Behavior Cloning Regularization for Stable Offline-to-Online
Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z) - MOORe: Model-based Offline-to-Online Reinforcement Learning [26.10368749930102]
We propose a model-based Offline-to-Online Reinforcement learning (MOORe) algorithm.
Experiment results show that our algorithm smoothly transfers from offline to online stages while enabling sample-efficient online adaption.
arXiv Detail & Related papers (2022-01-25T03:14:57Z) - Representation Matters: Offline Pretraining for Sequential Decision
Making [27.74988221252854]
In this paper, we consider a slightly different approach to incorporating offline data into sequential decision-making.
We find that the use of pretraining with unsupervised learning objectives can dramatically improve the performance of policy learning algorithms.
arXiv Detail & Related papers (2021-02-11T02:38:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.