Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid
Reinforcement Learning
- URL: http://arxiv.org/abs/2305.10282v1
- Date: Wed, 17 May 2023 15:17:23 GMT
- Title: Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid
Reinforcement Learning
- Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen
- Abstract summary: A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
- Score: 66.43003402281659
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper studies tabular reinforcement learning (RL) in the hybrid setting,
which assumes access to both an offline dataset and online interactions with
the unknown environment. A central question boils down to how to efficiently
utilize online data collection to strengthen and complement the offline dataset
and enable effective policy fine-tuning. Leveraging recent advances in
reward-agnostic exploration and model-based offline RL, we design a three-stage
hybrid RL algorithm that beats the best of both worlds -- pure offline RL and
pure online RL -- in terms of sample complexities. The proposed algorithm does
not require any reward information during data collection. Our theory is
developed based on a new notion called single-policy partial concentrability,
which captures the trade-off between distribution mismatch and miscoverage and
guides the interplay between offline and online data.
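To make the three-stage structure described above concrete, below is a minimal, self-contained Python sketch of a hybrid tabular pipeline: start from an offline transition dataset, collect additional online transitions without observing any rewards (reward-agnostic), then fit an empirical model from the pooled data and plan once a reward function is revealed. This is only an illustrative toy under assumed conventions, not the paper's algorithm: the uniform exploration policy, the toy problem sizes, and all function names are assumptions, and the paper's actual online stage is guided by single-policy partial concentrability rather than uniform exploration.

```python
import numpy as np

# Minimal sketch of the hybrid pipeline described in the abstract, NOT the
# authors' algorithm: (1) start from an offline transition dataset,
# (2) collect online transitions without observing rewards (reward-agnostic),
# (3) fit an empirical model and plan once the reward is revealed.
# The uniform exploration policy, toy sizes, and function names are assumptions.

S, A, H = 5, 3, 10          # states, actions, horizon (toy sizes)
rng = np.random.default_rng(0)

# Ground-truth dynamics, used only to simulate data collection in this toy.
P_true = rng.dirichlet(np.ones(S), size=(S, A))   # P_true[s, a] is a distribution over next states

def sample_transitions(policy_probs, n):
    """Collect (s, a, s') triples; rewards are never observed here."""
    data = []
    for _ in range(n):
        s = rng.integers(S)
        a = rng.choice(A, p=policy_probs[s])
        s_next = rng.choice(S, p=P_true[s, a])
        data.append((s, a, s_next))
    return data

# Stage 1 (assumed): an offline dataset from a behavior policy with uneven
# coverage -- it rarely plays action 0.
behavior = np.full((S, A), 1.0 / A)
behavior[:, 0] = 0.05
behavior /= behavior.sum(axis=1, keepdims=True)
offline_data = sample_transitions(behavior, n=2000)

# Stage 2 (assumed): reward-agnostic online exploration; here simply uniform
# over actions, to patch coverage gaps left by the offline data.
explore = np.full((S, A), 1.0 / A)
online_data = sample_transitions(explore, n=1000)

# Stage 3: fit an empirical model from the combined data, then plan with
# finite-horizon value iteration once a reward function r(s, a) is revealed.
counts = np.zeros((S, A, S))
for s, a, s_next in offline_data + online_data:
    counts[s, a, s_next] += 1
P_hat = (counts + 1e-8) / (counts.sum(axis=2, keepdims=True) + 1e-8 * S)

r = rng.random((S, A))                  # reward revealed only at planning time
V = np.zeros(S)
for _ in range(H):                      # backward induction on the fitted model
    Q = r + P_hat @ V                   # Q[s, a] = r[s, a] + sum_s' P_hat[s, a, s'] * V[s']
    V = Q.max(axis=1)
pi_hat = Q.argmax(axis=1)
print("greedy policy from the fitted model:", pi_hat)
```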
Related papers
- Active Advantage-Aligned Online Reinforcement Learning with Offline Data [56.98480620108727]
A3 RL is a novel method that actively selects data from combined online and offline sources to optimize policy improvement.
We provide a theoretical guarantee that validates the effectiveness of our active sampling strategy.
arXiv Detail & Related papers (2025-02-11T20:31:59Z)
- Efficient Online Reinforcement Learning Fine-Tuning Need Not Retain Offline Data [64.74333980417235]
We show that retaining offline data is unnecessary as long as we use a properly-designed online RL approach for fine-tuning offline RL.
We show that Warm-start RL (WSRL) can fine-tune without retaining any offline data, learning faster and attaining higher performance than existing algorithms.
arXiv Detail & Related papers (2024-12-10T18:57:12Z)
- Preference Elicitation for Offline Reinforcement Learning [59.136381500967744]
We propose Sim-OPRL, an offline preference-based reinforcement learning algorithm.
Our algorithm employs a pessimistic approach for out-of-distribution data, and an optimistic approach for acquiring informative preferences about the optimal policy.
arXiv Detail & Related papers (2024-06-26T15:59:13Z)
- Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees [23.838354396418868]
We propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data.
Our approach integrates a procedure of off-policy training on the offline data into an on-policy NPG framework.
arXiv Detail & Related papers (2023-11-14T18:45:56Z)
- Bridging Distributionally Robust Learning and Offline RL: An Approach to Mitigate Distribution Shift and Partial Data Coverage [32.578787778183546]
Offline reinforcement learning (RL) algorithms learn optimal policies using historical (offline) data.
One of the main challenges in offline RL is the distribution shift.
We propose two offline RL algorithms using the distributionally robust learning (DRL) framework.
arXiv Detail & Related papers (2023-10-27T19:19:30Z)
- Adaptive Policy Learning for Offline-to-Online Reinforcement Learning [27.80266207283246]
We consider an offline-to-online setting where the agent is first learned from the offline dataset and then trained online.
We propose a framework called Adaptive Policy Learning for effectively taking advantage of offline and online data.
arXiv Detail & Related papers (2023-03-14T08:13:21Z)
- Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient [42.47810044648846]
We consider a hybrid reinforcement learning setting (Hybrid RL) in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction.
We adapt the classical Q-learning/iteration algorithm to the hybrid setting, yielding an algorithm we call Hybrid Q-Learning or Hy-Q.
We show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks.
arXiv Detail & Related papers (2022-10-13T04:19:05Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)