What can online reinforcement learning with function approximation
benefit from general coverage conditions?
- URL: http://arxiv.org/abs/2304.12886v2
- Date: Wed, 31 May 2023 15:48:23 GMT
- Title: What can online reinforcement learning with function approximation
benefit from general coverage conditions?
- Authors: Fanghui Liu, Luca Viano, Volkan Cevher
- Abstract summary: In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees.
In this work, we pursue this new direction by investigating more general coverage conditions.
We identify further concepts, including the $L^p$ variant of concentrability, density ratio realizability, and a trade-off on the partial/rest coverage condition.
- Score: 53.90873926758026
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In online reinforcement learning (RL), instead of employing standard structural assumptions on Markov decision processes (MDPs), using a certain coverage condition (originally from offline RL) is enough to ensure sample-efficient guarantees (Xie et al. 2023). In this work, we pursue this new direction by investigating more general coverage conditions and studying their potential and utility for efficient online RL. We identify further concepts, including the $L^p$ variant of concentrability, density ratio realizability, and a trade-off on the partial/rest coverage condition, that also benefit sample-efficient online RL and yield improved regret bounds. Furthermore, if exploratory offline data are used, then under our coverage conditions both statistically and computationally efficient guarantees can be achieved for online RL. Moreover, even when the MDP structure is given, e.g., a linear MDP, we show that good coverage conditions remain beneficial, yielding regret bounds faster than $\widetilde{O}(\sqrt{T})$ and even of logarithmic order. These results justify the use of general coverage conditions in efficient online RL.
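For concreteness, here is a minimal sketch of how an $L^p$ variant of concentrability relaxes the classical ($L^\infty$) coefficient, stated for a single policy $\pi$ with occupancy measure $d^\pi$ and a data distribution $\mu$; the paper's formal definition may differ in its choice of norm, policy class, and per-step formulation:
$$ C_\infty(\pi) = \Big\| \frac{d^\pi}{\mu} \Big\|_\infty, \qquad C_p(\pi) = \Big\| \frac{d^\pi}{\mu} \Big\|_{L^p(\mu)} = \Big( \mathbb{E}_{(s,a)\sim\mu} \Big[ \Big( \frac{d^\pi(s,a)}{\mu(s,a)} \Big)^{p} \Big] \Big)^{1/p}. $$
Since $\mu$ is a probability distribution, $C_p(\pi) \le C_\infty(\pi)$ for every $p \ge 1$, so an $L^p$-type coverage requirement is never stronger than classical concentrability, which is what makes it a more general condition.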
Related papers
- The Importance of Online Data: Understanding Preference Fine-tuning via Coverage [25.782644676250115]
We study the similarities and differences between online and offline techniques for preference fine-tuning.
We prove that a global coverage condition is both necessary and sufficient for offline contrastive methods to converge to the optimal policy.
We derive a hybrid preference optimization algorithm that uses offline data for contrastive-based preference optimization and online data for KL regularization.
arXiv Detail & Related papers (2024-06-03T15:51:04Z)
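As a rough illustration of the global coverage condition referenced in the entry above (a sketch under standard definitions; the paper's formal statement for contrastive preference optimization may differ), let $\mu$ be the offline data distribution and $d^\pi$ the occupancy measure of policy $\pi$:
$$ \text{global coverage: } \sup_{\pi \in \Pi} \Big\| \frac{d^{\pi}}{\mu} \Big\|_\infty \le C_{\mathrm{glob}}, \qquad \text{local coverage: } \Big\| \frac{d^{\pi^\star}}{\mu} \Big\|_\infty \le C_{\mathrm{loc}}. $$
That is, global coverage requires the offline data to cover every policy in the class, whereas local (single-policy) coverage only requires covering the optimal comparator policy $\pi^\star$.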
- Harnessing Density Ratios for Online Reinforcement Learning [35.268369362811676]
We show that density ratio-based algorithms have online counterparts.
The proposed algorithm, GLOW, uses density ratio realizability and value function realizability to perform sample-efficient online exploration.
arXiv Detail & Related papers (2024-01-18T02:21:06Z)
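A rough sketch of the two realizability assumptions named above, suppressing per-step indexing and the exact set of policies quantified over (which the paper makes precise): given a weight class $\mathcal{W}$ and a value class $\mathcal{F}$, the requirements are, for the relevant policies $\pi$ and data distribution $\mu$,
$$ \frac{d^{\pi}}{\mu} \in \mathcal{W} \quad \text{(density ratio realizability)}, \qquad Q^{\pi} \in \mathcal{F} \quad \text{(value function realizability)}. $$
The learner thus only needs function classes rich enough to represent these objects, rather than explicit structural knowledge of the MDP.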
- The Provable Benefits of Unsupervised Data Sharing for Offline Reinforcement Learning [25.647624787936028]
We propose a novel Provable Data Sharing (PDS) algorithm to utilize reward-free data for offline reinforcement learning.
PDS significantly improves the performance of offline RL algorithms with reward-free data.
arXiv Detail & Related papers (2023-02-27T03:35:02Z)
- The Role of Coverage in Online Reinforcement Learning [72.01066664756986]
We show that the mere existence of a data distribution with good coverage can enable sample-efficient online RL.
Existing complexity measures for online RL, including Bellman rank and Bellman-Eluder dimension, fail to optimally capture coverability.
We propose a new complexity measure, the sequential extrapolation coefficient, to provide a unification.
arXiv Detail & Related papers (2022-10-09T03:50:05Z)
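For reference, a minimal sketch of the coverability coefficient used in this line of work (the coverage condition cited as Xie et al. 2023 in the abstract above), suppressing the per-step indexing:
$$ C_{\mathrm{cov}} = \inf_{\mu \in \Delta(\mathcal{S} \times \mathcal{A})} \ \sup_{\pi \in \Pi} \ \Big\| \frac{d^{\pi}}{\mu} \Big\|_\infty. $$
Only the existence of such a well-covering distribution $\mu$ is assumed; the learner never observes data drawn from $\mu$, which is what distinguishes this condition from offline concentrability.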
- RORL: Robust Offline Reinforcement Learning via Conservative Smoothing [72.8062448549897]
Offline reinforcement learning can exploit the massive amount of available offline data for complex decision-making tasks.
Current offline RL algorithms are generally designed to be conservative in value estimation and action selection.
We propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique.
arXiv Detail & Related papers (2022-06-06T18:07:41Z)
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice because online data collection is poorly suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z)
- Sample Efficient Reinforcement Learning In Continuous State Spaces: A Perspective Beyond Linearity [50.38337893712897]
We introduce the Effective Planning Window (EPW) condition, a structural condition on MDPs that makes no linearity assumptions.
We demonstrate that the EPW condition permits sample efficient RL, by providing an algorithm which provably solves MDPs satisfying this condition.
We additionally show the necessity of conditions like EPW, by demonstrating that simple MDPs with slight nonlinearities cannot be solved sample efficiently.
arXiv Detail & Related papers (2021-06-15T00:06:59Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)