Related papers: An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines

URL: http://arxiv.org/abs/2512.00383v1
Date: Sat, 29 Nov 2025 08:17:03 GMT
Title: An Empirical Study on the Effectiveness of Incorporating Offline RL As Online RL Subroutines
Authors: Jianhai Su, Jinzhu Luo, Qi Zhang
Abstract summary: We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation.
Score: 8.277534985461477
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We take the novel perspective of incorporating offline RL algorithms as subroutines of tabula rasa online RL. This is feasible because an online learning agent can repurpose its historical interactions as an offline dataset. We formalize this idea into a framework that accommodates several variants of offline RL incorporation, such as final policy recommendation and online fine-tuning. We further introduce convenient techniques to improve its effectiveness in enhancing online learning efficiency. Our extensive and systematic empirical analyses show that 1) the effectiveness of the proposed framework depends strongly on the nature of the task, 2) our proposed techniques greatly enhance its effectiveness, and 3) existing online fine-tuning methods are overall ineffective, calling for more research therein.

Related papers

Reinforcement Learning with Action Chunking [56.66655947239018]
We present Q-chunking, a recipe for improving reinforcement learning algorithms for long-horizon, sparse-reward tasks. Our recipe is designed for the offline-to-online RL setting, where the goal is to leverage an offline prior dataset to maximize the sample-efficiency of online learning. Our experimental results demonstrate that Q-chunking exhibits strong offline performance and online sample efficiency, outperforming prior best offline-to-online methods on a range of long-horizon, sparse-reward manipulation tasks.
arXiv Detail & Related papers (2025-07-10T17:48:03Z)
Bridging Offline and Online Reinforcement Learning for LLMs [71.48552761763158]
We investigate the effectiveness of reinforcement learning methods for finetuning large language models when transitioning from offline to semi-online. Our experiments cover training on verifiable math as well as non-verifiable instruction following with a set of benchmark evaluations for both.
arXiv Detail & Related papers (2025-06-26T17:25:49Z)
MOORL: A Framework for Integrating Offline-Online Reinforcement Learning [6.7265073544042995]
We propose Meta Offline-Online Reinforcement Learning (MOORL), a hybrid framework that unifies offline and online learning. Our theoretical analysis demonstrates that the hybrid approach enhances exploration by effectively combining the complementary strengths of offline and online data. With minimal computational overhead, MOORL achieves strong performance, underscoring its potential for practical applications in real-world scenarios.
arXiv Detail & Related papers (2025-06-11T10:12:50Z)
Active Advantage-Aligned Online Reinforcement Learning with Offline Data [56.98480620108727]
We introduce A3RL, which incorporates a novel confidence-aware Active Advantage Aligned sampling strategy. We demonstrate that our method outperforms competing online RL techniques that leverage offline data.
arXiv Detail & Related papers (2025-02-11T20:31:59Z)
Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL [36.65926744075032]
Offline-to-online (O2O) reinforcement learning improves performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. We propose to handle these two mismatches simultaneously, which aims to achieve general O2O learning from any offline method to any online method.
arXiv Detail & Related papers (2024-12-25T09:52:22Z)
Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe. In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop. We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL. By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance. Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid Reinforcement Learning [66.43003402281659]
A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset. We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL. The proposed algorithm does not require any reward information during data collection.
arXiv Detail & Related papers (2023-05-17T15:17:23Z)
Adaptive Policy Learning for Offline-to-Online Reinforcement Learning [27.80266207283246]
We consider an offline-to-online setting where the agent is first trained on the offline dataset and then trained online. We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
arXiv Detail & Related papers (2023-03-14T08:13:21Z)
MOORe: Model-based Offline-to-Online Reinforcement Learning [26.10368749930102]
We propose a model-based Offline-to-Online Reinforcement learning (MOORe) algorithm. Experiment results show that our algorithm smoothly transfers from offline to online stages while enabling sample-efficient online adaption.
arXiv Detail & Related papers (2022-01-25T03:14:57Z)

This list is automatically generated from the titles and abstracts of the papers on this site.