ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
- URL: http://arxiv.org/abs/2306.06871v4
- Date: Sun, 21 Jul 2024 14:49:35 GMT
- Title: ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
- Authors: Kai Zhao, Jianye Hao, Yi Ma, Jinyi Liu, Yan Zheng, Zhaopeng Meng,
- Abstract summary: We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance.
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
- Score: 52.34951901588738
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline reinforcement learning (RL) is a learning paradigm where an agent learns from a fixed dataset of experience. However, learning solely from a static dataset can limit the performance due to the lack of exploration. To overcome it, offline-to-online RL combines offline pre-training with online fine-tuning, which enables the agent to further refine its policy by interacting with the environment in real-time. Despite its benefits, existing offline-to-online RL methods suffer from performance degradation and slow improvement during the online phase. To tackle these challenges, we propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Moreover, to expedite online performance enhancement, we appropriately loosen the pessimism of Q-value estimation and incorporate ensemble-based exploration mechanisms into our framework. Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods during online fine-tuning on a range of locomotion and navigation tasks, significantly outperforming existing offline-to-online RL methods.
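To make the core mechanism concrete, the following is a minimal PyTorch-style sketch of a Q-ensemble whose Bellman target is the ensemble minimum during offline pre-training and a loosened aggregate (the ensemble mean) during online fine-tuning. The class and function names, network sizes, and the choice of the mean as the loosened aggregate are illustrative assumptions, not the authors' released implementation.

```python
# Hedged sketch (not the authors' code): a Q-ensemble whose TD target is the
# ensemble minimum offline (pessimistic) and the ensemble mean online (loosened).
import torch
import torch.nn as nn


class QEnsemble(nn.Module):
    """N independently initialized Q-networks with a shared architecture."""

    def __init__(self, obs_dim, act_dim, n_nets=10, hidden=256):
        super().__init__()
        self.nets = nn.ModuleList(
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_nets)
        )

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        # Stack member predictions: shape (n_nets, batch, 1)
        return torch.stack([net(x) for net in self.nets], dim=0)


def td_target(target_ensemble, reward, next_obs, next_act, done,
              gamma=0.99, online=False):
    """Shared TD target for every ensemble member."""
    with torch.no_grad():
        next_q = target_ensemble(next_obs, next_act)    # (N, B, 1)
        if online:
            agg = next_q.mean(dim=0)        # online: less pessimistic aggregate
        else:
            agg = next_q.min(dim=0).values  # offline: conservative aggregate
        return reward + gamma * (1.0 - done) * agg
```

Because every member regresses toward the same aggregated target, switching the aggregation rule at the offline-to-online boundary changes only the target computation, not the networks themselves, which is one way to read the abstract's claim of bridging the two phases without degrading performance.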
Related papers
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning [8.251711947874238]
Offline RL provides a promising solution by producing a pre-trained policy that can be refined through online interactions.
Existing approaches perform offline and online learning in the same task, without considering the task generalization problem in offline-to-online adaptation.
Our work builds upon the investigation of successor representations for task generalization in online RL and extends the framework to incorporate offline-to-online learning.
arXiv Detail & Related papers (2024-05-12T08:52:52Z)
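For context on the entry above: successor features decouple dynamics from rewards by learning a feature map phi(s, a) with r approximately phi(s, a)^T w, so that the successor features psi(s, a) = E[sum_t gamma^t phi(s_t, a_t)] satisfy a Bellman equation and Q(s, a) = psi(s, a)^T w for any task vector w. The sketch below shows the resulting TD update and the linear task readout; an ensemble of such psi-networks is what "ensemble successor representations" refers to. Function names and shapes are assumptions, not the paper's implementation.

```python
# Hedged sketch of a successor-feature TD update; an ensemble of psi-networks
# (one loss per member) gives the "ensemble successor representations".
import torch
import torch.nn.functional as F


def successor_feature_loss(psi_net, psi_target, phi, obs, act, next_obs, next_act,
                           done, gamma=0.99):
    """One TD step for successor features: psi(s,a) ~ phi(s,a) + gamma * psi(s',a')."""
    with torch.no_grad():
        target = phi(obs, act) + gamma * (1.0 - done) * psi_target(next_obs, next_act)
    return F.mse_loss(psi_net(obs, act), target)


def q_from_successor_features(psi_net, obs, act, task_w):
    """Task Q-value is a linear readout of the successor features: Q = psi^T w."""
    return psi_net(obs, act) @ task_w   # (B, d) @ (d,) -> (B,)
```

Generalizing to a new task then amounts to fitting a new weight vector w rather than relearning Q from scratch.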
- Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness [11.241036026084222]
Offline-to-online (O2O) RL provides a paradigm for improving an offline-trained agent within limited online interactions.
Most offline RL algorithms suffer from performance drops and fail to achieve stable policy improvement in O2O adaptation.
We propose the Robust Offline-to-Online (RO2O) algorithm, designed to enhance offline policies through uncertainty and smoothness.
arXiv Detail & Related papers (2023-09-29T04:42:50Z)
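The RO2O entry above names two ingredients, uncertainty and smoothness; below is a hedged sketch of plausible forms of each. The penalty coefficient, the perturbation scale, and the additive ensemble-standard-deviation penalty are assumptions for illustration, not the paper's exact losses.

```python
# Hedged sketch: (1) an uncertainty-penalized target from a Q-ensemble and
# (2) a smoothness loss under small observation perturbations.
import torch


def uncertainty_penalized_target(q_ensemble, reward, next_obs, next_act, done,
                                 gamma=0.99, beta=1.0):
    with torch.no_grad():
        next_q = q_ensemble(next_obs, next_act)          # (N, B, 1)
        mean, std = next_q.mean(dim=0), next_q.std(dim=0)
        # Penalize the target by the ensemble's disagreement.
        return reward + gamma * (1.0 - done) * (mean - beta * std)


def smoothness_loss(policy, obs, eps=1e-3):
    """Penalize policy changes under small random observation perturbations."""
    noise = eps * torch.randn_like(obs)
    return ((policy(obs + noise) - policy(obs)) ** 2).mean()
```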
- Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
- Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions [30.050083797177706]
Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment.
Online finetuning of such offline models can further improve performance.
We show that it is possible to use standard online off-policy algorithms for faster improvement.
arXiv Detail & Related papers (2023-03-30T14:08:31Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
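A hedged sketch of the adaptive behavior-cloning regularization summarized above: the actor loss mixes a Q-maximization term with a behavior-cloning term, and the BC weight is nudged up when recent performance degrades and down when it improves. The specific multiplicative update rule and bounds are simplifying assumptions, not the paper's schedule.

```python
# Hedged sketch of adaptively weighted behavior cloning during online fine-tuning.
import torch


def actor_loss(policy, q_net, obs, data_act, bc_weight):
    pi_act = policy(obs)
    rl_term = -q_net(obs, pi_act).mean()          # maximize Q under the policy
    bc_term = ((pi_act - data_act) ** 2).mean()   # stay near dataset actions
    return rl_term + bc_weight * bc_term


def update_bc_weight(bc_weight, recent_return, prev_return,
                     up=1.05, down=0.95, lo=0.0, hi=1.0):
    """Tighten regularization when performance drops, relax it when it improves."""
    factor = up if recent_return < prev_return else down
    return float(min(hi, max(lo, bc_weight * factor)))
```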
- MOORe: Model-based Offline-to-Online Reinforcement Learning [26.10368749930102]
We propose a model-based Offline-to-Online Reinforcement Learning (MOORe) algorithm.
Experimental results show that our algorithm transfers smoothly from the offline to the online stage while enabling sample-efficient online adaptation.
arXiv Detail & Related papers (2022-01-25T03:14:57Z)
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples.
arXiv Detail & Related papers (2021-07-01T16:26:54Z)
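A hedged sketch of the balanced-replay idea summarized above: each training batch mixes online and offline transitions with a tilt toward online data. The paper itself learns a density ratio to prioritize near-on-policy offline samples; the fixed mixing fraction here is a deliberate simplification.

```python
# Hedged sketch of balanced replay: sample each batch from a mixture of the
# offline dataset and the online buffer, tilted toward online data.
import random


def sample_balanced_batch(offline_buffer, online_buffer, batch_size,
                          online_fraction=0.75):
    n_online = min(int(batch_size * online_fraction), len(online_buffer))
    n_offline = batch_size - n_online
    batch = random.sample(online_buffer, n_online)
    batch += random.sample(offline_buffer, min(n_offline, len(offline_buffer)))
    return batch
```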
- OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning [107.6943868812716]
In many practical applications, the situation is reversed: an agent may have access to large amounts of undirected offline experience data, while access to the online environment is severely limited.
Our main insight is that, when presented with offline data composed of a variety of behaviors, an effective way to leverage this data is to extract a continuous space of recurring and temporally extended primitive behaviors.
In addition to benefiting offline policy optimization, we show that performing offline primitive learning in this way can also be leveraged for improving few-shot imitation learning.
arXiv Detail & Related papers (2020-10-26T14:31:08Z)
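To illustrate what extracting "a continuous space of recurring and temporally extended primitive behaviors" can look like in code, below is a hedged sketch of an autoencoder over fixed-length sub-trajectories: the latent z indexes a primitive, and a decoder pi(a | s, z) reproduces the behavior. The architecture and the plain reconstruction objective are simplifications of OPAL's latent-variable model, not its exact formulation.

```python
# Hedged sketch of offline primitive discovery: an autoencoder over length-c
# sub-trajectories yields a continuous latent z; a primitive decoder maps
# (state, z) back to actions.
import torch
import torch.nn as nn


class PrimitiveAutoencoder(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=8, hidden=256, c=10):
        super().__init__()
        self.c = c
        # Encoder: flatten a (s, a) sub-trajectory of length c into a latent z.
        self.encoder = nn.Sequential(
            nn.Linear(c * (obs_dim + act_dim), hidden), nn.ReLU(),
            nn.Linear(hidden, latent_dim),
        )
        # Primitive decoder: predict each action from its state and z.
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs_seq, act_seq):
        # obs_seq: (B, c, obs_dim), act_seq: (B, c, act_dim)
        B = obs_seq.shape[0]
        z = self.encoder(torch.cat([obs_seq, act_seq], dim=-1).reshape(B, -1))
        z_rep = z.unsqueeze(1).expand(-1, self.c, -1)        # (B, c, latent_dim)
        pred_act = self.decoder(torch.cat([obs_seq, z_rep], dim=-1))
        return ((pred_act - act_seq) ** 2).mean()            # reconstruction loss
```

A higher-level policy can then act in the latent primitive space z, which is what accelerates both offline policy optimization and few-shot imitation in the entry above.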