Online Pre-Training for Offline-to-Online Reinforcement Learning
- URL: http://arxiv.org/abs/2507.08387v1
- Date: Fri, 11 Jul 2025 08:00:12 GMT
- Title: Online Pre-Training for Offline-to-Online Reinforcement Learning
- Authors: Yongjae Shin, Jeonghye Kim, Whiyoung Jung, Sunghoon Hong, Deunsol Yoon, Youngsoo Jang, Geonhyeong Kim, Jongseong Chae, Youngchul Sung, Kanghoon Lee, Woohyung Lim
- Abstract summary: We propose Online Pre-Training for Offline-to-Online RL (OPT) to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning.
- Score: 21.146400629843015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline-to-online reinforcement learning (RL) aims to integrate the complementary strengths of offline and online RL by pre-training an agent offline and subsequently fine-tuning it through online interactions. However, recent studies reveal that offline pre-trained agents often underperform during online fine-tuning due to inaccurate value estimation caused by distribution shift, with random initialization proving more effective in certain cases. In this work, we propose a novel method, Online Pre-Training for Offline-to-Online RL (OPT), explicitly designed to address the issue of inaccurate value estimation in offline pre-trained agents. OPT introduces a new learning phase, Online Pre-Training, which allows the training of a new value function tailored specifically for effective online fine-tuning. Implementation of OPT on TD3 and SPOT demonstrates an average 30% improvement in performance across a wide range of D4RL environments, including MuJoCo, Antmaze, and Adroit.
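To make the three-phase structure described in the abstract concrete, here is a minimal sketch under our own assumptions (PyTorch, a fixed offline pre-trained `policy`, a generic `sample_batch` replay callable, illustrative step counts and hyperparameters). It shows only the general idea of training a fresh value function in an intermediate online pre-training phase; it is not the authors' OPT implementation on TD3 or SPOT.

```python
# Hypothetical sketch of an OPT-style offline-to-online pipeline (not the paper's code).
# Phase 1 (offline pre-training) is assumed to have produced `policy`; here a *new*
# critic is trained from scratch during a short Online Pre-Training phase before
# regular online fine-tuning resumes.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F


class QNetwork(nn.Module):
    """Minimal state-action value network."""

    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


def td_step(q, q_target, policy, batch, optimizer, gamma=0.99, tau=0.005):
    """One TD(0) update of q toward r + gamma * (1 - done) * Q_target(s', pi(s'))."""
    obs, act, rew, next_obs, done = batch  # batch of 1-D rew/done tensors assumed
    with torch.no_grad():
        target = rew + gamma * (1.0 - done) * q_target(next_obs, policy(next_obs))
    loss = F.mse_loss(q(obs, act), target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # Polyak-average the target network.
    for p, p_t in zip(q.parameters(), q_target.parameters()):
        p_t.data.mul_(1.0 - tau).add_(tau * p.data)
    return loss.item()


def online_pretrain_new_critic(policy, sample_batch, obs_dim, act_dim, steps=25_000):
    """Online Pre-Training: fit a fresh value function on batches of offline plus newly
    collected online data while the offline pre-trained policy stays fixed, so that the
    subsequent fine-tuning phase starts from value estimates calibrated to online data."""
    q_new = QNetwork(obs_dim, act_dim)
    q_new_target = copy.deepcopy(q_new)
    optimizer = torch.optim.Adam(q_new.parameters(), lr=3e-4)
    for _ in range(steps):
        td_step(q_new, q_new_target, policy, sample_batch(), optimizer)
    return q_new  # hand this critic (plus the policy) to the usual fine-tuning loop
```

The design point mirrored in this sketch is that the critic used for fine-tuning is newly trained rather than carried over from offline pre-training, which is where the abstract locates the inaccurate value estimates.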
Related papers
- Efficient Online RL Fine Tuning with Offline Pre-trained Policy Only [22.94253602450729]
Existing online reinforcement learning (RL) fine-tuning methods require continued training with offline pretrained Q-functions for stability and performance. We propose a method for efficient online RL fine-tuning using solely the offline pre-trained policy. We introduce PORL (Policy-Only Reinforcement Learning Fine-Tuning), which rapidly initializes the Q-function from scratch during the online phase to avoid detrimental pessimism.
arXiv Detail & Related papers (2025-05-22T16:14:08Z)
- Optimistic Critic Reconstruction and Constrained Fine-Tuning for General Offline-to-Online RL [36.65926744075032]
Offline-to-online (O2O) reinforcement learning improves performance rapidly with limited online interactions. Recent studies often design fine-tuning strategies for a specific offline RL method and cannot perform general O2O learning from any offline method. We propose to handle these two mismatches simultaneously, aiming to achieve general O2O learning from any offline method to any online method.
arXiv Detail & Related papers (2024-12-25T09:52:22Z)
- Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data [64.74333980417235]
We show that retaining offline data is unnecessary as long as we use a properly designed online RL approach for fine-tuning offline RL. We show that Warm-start RL (WSRL) fine-tunes without retaining any offline data, learns faster, and attains higher performance than existing algorithms.
arXiv Detail & Related papers (2024-12-10T18:57:12Z)
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- ENOTO: Improving Offline-to-Online Reinforcement Learning with Q-Ensembles [52.34951901588738]
We propose a novel framework called ENsemble-based Offline-To-Online (ENOTO) RL.
By increasing the number of Q-networks, we seamlessly bridge offline pre-training and online fine-tuning without degrading performance (a toy ensemble sketch appears after this list).
Experimental results demonstrate that ENOTO can substantially improve the training stability, learning efficiency, and final performance of existing offline RL methods.
arXiv Detail & Related papers (2023-06-12T05:10:10Z)
- Finetuning from Offline Reinforcement Learning: Challenges, Trade-offs and Practical Solutions [30.050083797177706]
Offline reinforcement learning (RL) allows for the training of competent agents from offline datasets without any interaction with the environment.
Online finetuning of such offline models can further improve performance.
We show that it is possible to use standard online off-policy algorithms for faster improvement.
arXiv Detail & Related papers (2023-03-30T14:08:31Z)
- Adaptive Behavior Cloning Regularization for Stable Offline-to-Online Reinforcement Learning [80.25648265273155]
Offline reinforcement learning, by learning from a fixed dataset, makes it possible to learn agent behaviors without interacting with the environment.
During online fine-tuning, the performance of the pre-trained agent may collapse quickly due to the sudden distribution shift from offline to online data.
We propose to adaptively weigh the behavior cloning loss during online fine-tuning based on the agent's performance and training stability.
Experiments show that the proposed method yields state-of-the-art offline-to-online reinforcement learning performance on the popular D4RL benchmark.
arXiv Detail & Related papers (2022-10-25T09:08:26Z)
- Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking [58.14267480293575]
We propose a simple yet effective online learning approach for few-shot online adaptation without requiring offline training.
It provides a built-in memory retention mechanism that lets the model retain knowledge about objects it has seen before.
We evaluate our approach based on two networks in the online learning families for tracking, i.e., multi-layer perceptrons in RT-MDNet and convolutional neural networks in DiMP.
arXiv Detail & Related papers (2021-12-28T06:51:18Z)
- Offline-to-Online Reinforcement Learning via Balanced Replay and Pessimistic Q-Ensemble [135.6115462399788]
Deep offline reinforcement learning has made it possible to train strong robotic agents from offline datasets.
State-action distribution shift may lead to severe bootstrap error during fine-tuning.
We propose a balanced replay scheme that prioritizes samples encountered online while also encouraging the use of near-on-policy samples (a minimal sampling sketch also follows this list).
arXiv Detail & Related papers (2021-07-01T16:26:54Z)
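For the ENOTO entry above, here is a toy Q-ensemble sketch under our own assumptions: the ensemble size, network shapes, and the specific pessimistic/optimistic aggregation rules (`min` offline, `mean + k * std` online) are illustrative choices, not the paper's exact design.

```python
# Toy Q-ensemble in the spirit of ENOTO (not the paper's implementation): several
# independently initialized critics whose aggregation can be pessimistic (min) for
# offline training or more optimistic (mean, or mean + k * std) once online.
import torch
import torch.nn as nn


class EnsembleQ(nn.Module):
    def __init__(self, obs_dim: int, act_dim: int, n_critics: int = 10, hidden: int = 256):
        super().__init__()
        self.critics = nn.ModuleList([
            nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, 1),
            )
            for _ in range(n_critics)
        ])

    def all_values(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        """Return per-critic values, shape (n_critics, batch)."""
        x = torch.cat([obs, act], dim=-1)
        return torch.stack([q(x).squeeze(-1) for q in self.critics], dim=0)

    def value(self, obs, act, mode: str = "pessimistic", k: float = 0.5) -> torch.Tensor:
        qs = self.all_values(obs, act)
        if mode == "pessimistic":      # offline: guard against overestimation
            return qs.min(dim=0).values
        if mode == "optimistic":       # online: encourage exploration (assumed rule)
            return qs.mean(dim=0) + k * qs.std(dim=0)
        return qs.mean(dim=0)
```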
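For the balanced-replay entry, here is a toy two-buffer sampler under our own assumptions: a fixed online/offline mixing ratio stands in for the paper's density-ratio prioritization, and all names and defaults are illustrative.

```python
# Toy sampler in the spirit of balanced replay (not the paper's implementation):
# most of each batch comes from recent online transitions, the remainder from the
# offline dataset. The actual method further prioritizes near-on-policy offline
# samples with a learned density ratio, which is omitted here.
import random
from collections import deque


class BalancedReplay:
    def __init__(self, offline_transitions, online_capacity=1_000_000, online_fraction=0.75):
        self.offline = list(offline_transitions)      # fixed dataset from offline pre-training
        self.online = deque(maxlen=online_capacity)   # filled during online fine-tuning
        self.online_fraction = online_fraction        # assumed fixed mixing ratio

    def add(self, transition):
        """Store a transition collected during online interaction."""
        self.online.append(transition)

    def sample(self, batch_size):
        """Sample a batch biased toward online data; fall back to offline data early on."""
        n_online = min(int(batch_size * self.online_fraction), len(self.online))
        n_offline = batch_size - n_online
        batch = random.sample(list(self.online), n_online)
        batch += random.sample(self.offline, min(n_offline, len(self.offline)))
        random.shuffle(batch)
        return batch
```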