Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid
Reinforcement Learning
- URL: http://arxiv.org/abs/2305.10282v1
- Date: Wed, 17 May 2023 15:17:23 GMT
- Title: Reward-agnostic Fine-tuning: Provable Statistical Benefits of Hybrid
Reinforcement Learning
- Authors: Gen Li, Wenhao Zhan, Jason D. Lee, Yuejie Chi, Yuxin Chen
- Abstract summary: A central question boils down to how to efficiently utilize online data collection to strengthen and complement the offline dataset.
We design a three-stage hybrid RL algorithm that beats the best of both worlds -- pure offline RL and pure online RL.
The proposed algorithm does not require any reward information during data collection.
- Score: 66.43003402281659
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: This paper studies tabular reinforcement learning (RL) in the hybrid setting,
which assumes access to both an offline dataset and online interactions with
the unknown environment. A central question boils down to how to efficiently
utilize online data collection to strengthen and complement the offline dataset
and enable effective policy fine-tuning. Leveraging recent advances in
reward-agnostic exploration and model-based offline RL, we design a three-stage
hybrid RL algorithm that beats the best of both worlds -- pure offline RL and
pure online RL -- in terms of sample complexities. The proposed algorithm does
not require any reward information during data collection. Our theory is
developed based on a new notion called single-policy partial concentrability,
which captures the trade-off between distribution mismatch and miscoverage and
guides the interplay between offline and online data.
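
For reference, a common way to quantify the distribution mismatch mentioned above is the single-policy concentrability coefficient from the offline RL literature; the paper's "partial" variant relaxes it by excluding a hard-to-cover region, which is what lets online exploration pick up the slack. Below is a hedged sketch of the standard (non-partial) quantity, where d_h^pi(s,a) is the occupancy measure of policy pi at step h and rho is the offline data distribution; the paper's exact definition differs in its details:

```latex
% Standard single-policy concentrability: worst-case ratio between the
% comparator policy's occupancy measure and the offline data distribution.
% The paper's partial variant (roughly) evaluates this ratio only outside
% a "miscovered" set that the online phase is then tasked with covering.
C^{\star} \;=\; \max_{h,\,s,\,a}\;
  \frac{d_h^{\pi^{\star}}(s,a)}{d_h^{\rho}(s,a)}
```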
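
To make the three-stage template concrete, here is a minimal, illustrative Python sketch on a toy tabular MDP. All names here (counts_from, the uniform exploration policy, the sample-count threshold) are hypothetical stand-ins, not the paper's algorithm: stage 2 in the paper uses a principled reward-agnostic exploration scheme, and stage 3 a pessimistic model-based offline method. The one structural point the sketch does reflect is that rewards enter only in stage 3, after all data collection.

```python
# Toy sketch of a three-stage hybrid RL pipeline (hypothetical names, not the
# paper's algorithm): 1) inspect offline coverage, 2) reward-agnostic online
# exploration, 3) model-based planning on the combined data.
import numpy as np

S, A, H = 5, 2, 3                                     # toy tabular MDP sizes
rng = np.random.default_rng(0)
P_true = rng.dirichlet(np.ones(S), size=(H, S, A))    # unknown transition model

def counts_from(dataset):
    """Empirical transition counts N[h, s, a, s'] from (h, s, a, s') tuples."""
    N = np.zeros((H, S, A, S))
    for h, s, a, s2 in dataset:
        N[h, s, a, s2] += 1
    return N

# --- Stage 1: assess offline coverage (no rewards needed) --------------------
offline = [(h, s, a, rng.choice(S, p=P_true[h, s, a]))
           for h in range(H) for s in range(S) for a in range(A)
           for _ in range(rng.integers(0, 8))]        # deliberately uneven data
N_off = counts_from(offline)
undercovered = N_off.sum(axis=-1) < 3                 # sparsely sampled (h,s,a)
print(f"stage 1: {int(undercovered.sum())}/{H*S*A} (h,s,a) pairs under-covered")

# --- Stage 2: reward-agnostic online exploration -----------------------------
# A real scheme would target the under-covered region; uniform random actions
# are a crude stand-in that still collects reward-free transitions.
online = []
for _ in range(200):
    s = int(rng.integers(S))
    for h in range(H):
        a = int(rng.integers(A))
        s2 = int(rng.choice(S, p=P_true[h, s, a]))
        online.append((h, s, a, s2))
        s = s2

# --- Stage 3: model-based planning on the combined data ----------------------
# The reward is revealed only now; fit an empirical model and run backward
# value iteration (the paper instead uses a pessimistic offline method).
N = counts_from(offline + online)
P_hat = N / np.maximum(N.sum(axis=-1, keepdims=True), 1)  # unvisited rows -> 0
r = rng.random((H, S, A))                             # reward revealed post hoc
V = np.zeros(S)
for h in reversed(range(H)):
    Q = r[h] + P_hat[h] @ V                           # Q[s, a] backup
    V = Q.max(axis=1)
print("stage 3: average optimal value over start states:", V.mean())
```

Note how the reward array is drawn only after both datasets are fixed; this mirrors the abstract's claim that the algorithm needs no reward information during data collection.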