A Simple Unified Uncertainty-Guided Framework for Offline-to-Online
Reinforcement Learning
- URL: http://arxiv.org/abs/2306.07541v2
- Date: Wed, 21 Feb 2024 03:07:23 GMT
- Title: A Simple Unified Uncertainty-Guided Framework for Offline-to-Online
Reinforcement Learning
- Authors: Siyuan Guo, Yanchao Sun, Jifeng Hu, Sili Huang, Hechang Chen, Haiyin
Piao, Lichao Sun, Yi Chang
- Abstract summary: Offline-to-online reinforcement learning can be challenging due to constrained exploratory behavior and state-action distribution shift.
We propose a Simple Unified uNcertainty-Guided (SUNG) framework, which unifies the solution to both challenges with the tool of uncertainty.
SUNG achieves state-of-the-art online finetuning performance when combined with different offline RL methods.
- Score: 25.123237633748193
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) provides a promising solution to learning
an agent fully relying on a data-driven paradigm. However, constrained by the
limited quality of the offline dataset, its performance is often sub-optimal.
Therefore, it is desired to further finetune the agent via extra online
interactions before deployment. Unfortunately, offline-to-online RL can be
challenging due to two main challenges: constrained exploratory behavior and
state-action distribution shift. To this end, we propose a Simple Unified
uNcertainty-Guided (SUNG) framework, which naturally unifies the solution to
both challenges with the tool of uncertainty. Specifically, SUNG quantifies
uncertainty via a VAE-based state-action visitation density estimator. To
facilitate efficient exploration, SUNG presents a practical optimistic
exploration strategy to select informative actions with both high value and
high uncertainty. Moreover, SUNG develops an adaptive exploitation method by
applying conservative offline RL objectives to high-uncertainty samples and
standard online RL objectives to low-uncertainty samples to smoothly bridge
offline and online stages. SUNG achieves state-of-the-art online finetuning
performance when combined with different offline RL methods, across various
environments and datasets in the D4RL benchmark.
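The abstract's two uncertainty-guided pieces can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: the VAE-based visitation density estimator is abstracted into a placeholder `uncertainty` callable, and all network names and hyperparameters (`actor`, `q_net`, `top_k`, `unc_threshold`, `alpha`) are hypothetical. It shows optimistic exploration (prefer candidate actions with both high Q-value and high uncertainty) and adaptive exploitation (apply a conservative, value-suppressing penalty only to high-uncertainty samples).

```python
# Minimal sketch of uncertainty-guided finetuning in the spirit of SUNG.
# Hypothetical networks and hyperparameters; the paper's VAE-based density
# estimator is reduced to a generic `uncertainty` callable.
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, N_CANDIDATES = 17, 6, 20

actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(),
                      nn.Linear(64, ACTION_DIM), nn.Tanh())
q_net = nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
                      nn.Linear(64, 1))


def uncertainty(states, actions):
    """Stand-in for the VAE-based state-action visitation density estimator:
    higher values ~ lower estimated visitation density in the offline data."""
    return torch.rand(states.shape[0], 1)  # placeholder score in [0, 1)


def optimistic_action(state, noise_std=0.3, top_k=5):
    """Optimistic exploration: sample candidate actions around the policy,
    keep the top-k by Q-value, then pick the most uncertain among them."""
    with torch.no_grad():
        base = actor(state)                                              # (1, A)
        cands = (base + noise_std * torch.randn(N_CANDIDATES, ACTION_DIM)).clamp(-1, 1)
        states = state.expand(N_CANDIDATES, -1)
        q_vals = q_net(torch.cat([states, cands], dim=-1)).squeeze(-1)
        top_idx = q_vals.topk(top_k).indices                             # high value ...
        unc = uncertainty(states[top_idx], cands[top_idx]).squeeze(-1)
        return cands[top_idx[unc.argmax()]]                              # ... and high uncertainty


def adaptive_critic_loss(states, actions, td_targets, policy_actions,
                         unc_threshold=0.5, alpha=1.0):
    """Adaptive exploitation: standard TD loss everywhere, plus a conservative
    (value-suppressing) penalty only on high-uncertainty samples."""
    q_pred = q_net(torch.cat([states, actions], dim=-1))
    td_loss = ((q_pred - td_targets) ** 2).mean()
    high_unc = (uncertainty(states, actions) > unc_threshold).float()
    conservative = (high_unc * q_net(torch.cat([states, policy_actions], dim=-1))).mean()
    return td_loss + alpha * conservative


if __name__ == "__main__":
    s = torch.randn(1, STATE_DIM)
    print("exploration action:", optimistic_action(s))
    batch = torch.randn(8, STATE_DIM)
    acts = torch.rand(8, ACTION_DIM) * 2 - 1
    targets = torch.randn(8, 1)
    print("critic loss:", adaptive_critic_loss(batch, acts, targets, actor(batch)).item())
```

In practice, the weight on the conservative term would be annealed as trusted online data accumulates; the fixed `alpha` here is only for illustration.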
Related papers
- Offline-to-Online Multi-Agent Reinforcement Learning with Offline Value Function Memory and Sequential Exploration [40.346958259814514]
We propose a novel O2O MARL framework called Offline Value Function Memory with Sequential Exploration (OVMSE).
First, we introduce the Offline Value Function Memory (OVM) mechanism to compute target Q-values, preserving knowledge gained during offline training.
Second, we propose a decentralized Sequential Exploration (SE) strategy tailored for O2O MARL, which effectively utilizes the pre-trained offline policy for exploration.
arXiv Detail & Related papers (2024-10-25T10:24:19Z)
- D5RL: Diverse Datasets for Data-Driven Deep Reinforcement Learning [99.33607114541861]
We propose a new benchmark for offline RL that focuses on realistic simulations of robotic manipulation and locomotion environments.
Our proposed benchmark covers state-based and image-based domains, and supports both offline RL and online fine-tuning evaluation.
arXiv Detail & Related papers (2024-08-15T22:27:00Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Train Once, Get a Family: State-Adaptive Balances for Offline-to-Online Reinforcement Learning [71.02384943570372]
Family Offline-to-Online RL (FamO2O) is a framework that empowers existing algorithms to determine state-adaptive improvement-constraint balances.
FamO2O offers a statistically significant improvement over various existing methods, achieving state-of-the-art performance on the D4RL benchmark.
arXiv Detail & Related papers (2023-10-27T08:30:54Z)
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness [11.903893267037061]
Offline-to-online (O2O) RL provides a paradigm for improving an offline-trained agent within limited online interactions.
Most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation.
We propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness.
arXiv Detail & Related papers (2023-09-29T04:42:50Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance (a generic ensemble-target sketch appears after this list).
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- DARA: Dynamics-Aware Reward Augmentation in Offline Reinforcement Learning [17.664027379555183]
Offline reinforcement learning algorithms promise to be applicable in settings where a fixed dataset is available and no new experience can be acquired.
This paper formulates offline dynamics adaptation, using (source) offline data collected under different dynamics to relax the requirement for extensive (target) offline data.
With only modest amounts of target offline data, our method consistently outperforms prior offline RL methods in both simulated and real-world tasks.
arXiv Detail & Related papers (2022-03-13T14:30:55Z)
- Pessimistic Model Selection for Offline Deep Reinforcement Learning [56.282483586473816]
Deep Reinforcement Learning (DRL) has demonstrated great potential in solving sequential decision-making problems in many applications.
One main barrier is the over-fitting issue that leads to poor generalizability of the policy learned by DRL.
We propose a pessimistic model selection (PMS) approach for offline DRL with a theoretical guarantee.
arXiv Detail & Related papers (2021-11-29T06:29:49Z)
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice because it avoids costly online data collection and is well suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z)
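As a companion to the ENOTO entry above, the following is a minimal sketch of a generic Q-ensemble TD target. It is not the paper's implementation: the number of critics, network sizes, and the min/mean aggregation choice are all illustrative assumptions.

```python
# Minimal sketch of an ensemble-based TD target (in the spirit of Q-ensemble
# methods such as ENOTO); hypothetical shapes and names, not the paper's code.
import torch
import torch.nn as nn

N_CRITICS, STATE_DIM, ACTION_DIM, GAMMA = 5, 17, 6, 0.99

critics = nn.ModuleList([
    nn.Sequential(nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(), nn.Linear(64, 1))
    for _ in range(N_CRITICS)
])


def ensemble_target(rewards, next_states, next_actions, dones, pessimism="min"):
    """TD target aggregated over an ensemble of critics: `min` gives
    conservative (offline-style) targets, `mean` relaxes the pessimism
    as trusted online data accumulates."""
    with torch.no_grad():
        sa = torch.cat([next_states, next_actions], dim=-1)
        qs = torch.stack([c(sa) for c in critics], dim=0)      # (N, B, 1)
        q_next = qs.min(dim=0).values if pessimism == "min" else qs.mean(dim=0)
        return rewards + GAMMA * (1.0 - dones) * q_next


if __name__ == "__main__":
    B = 8
    tgt = ensemble_target(torch.randn(B, 1), torch.randn(B, STATE_DIM),
                          torch.rand(B, ACTION_DIM) * 2 - 1, torch.zeros(B, 1))
    print(tgt.shape)  # torch.Size([8, 1])
```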