The Role of Coverage in Online Reinforcement Learning
- URL: http://arxiv.org/abs/2210.04157v1
- Date: Sun, 9 Oct 2022 03:50:05 GMT
- Title: The Role of Coverage in Online Reinforcement Learning
- Authors: Tengyang Xie, Dylan J. Foster, Yu Bai, Nan Jiang, Sham M. Kakade
- Abstract summary: We show that the mere existence of a data distribution with good coverage can enable sample-efficient online RL.
Existing complexity measures for online RL, including Bellman rank and Bellman-Eluder dimension, fail to optimally capture coverability.
We propose a new complexity measure, the sequential extrapolation coefficient, to provide a unification.
- Score: 72.01066664756986
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coverage conditions -- which assert that the data logging distribution
adequately covers the state space -- play a fundamental role in determining the
sample complexity of offline reinforcement learning. While such conditions
might seem irrelevant to online reinforcement learning at first glance, we
establish a new connection by showing -- somewhat surprisingly -- that the mere
existence of a data distribution with good coverage can enable sample-efficient
online RL. Concretely, we show that coverability -- that is, existence of a
data distribution that satisfies a ubiquitous coverage condition called
concentrability -- can be viewed as a structural property of the underlying
MDP, and can be exploited by standard algorithms for sample-efficient
exploration, even when the agent does not know said distribution. We complement
this result by proving that several weaker notions of coverage, despite being
sufficient for offline RL, are insufficient for online RL. We also show that
existing complexity measures for online RL, including Bellman rank and
Bellman-Eluder dimension, fail to optimally capture coverability, and propose a
new complexity measure, the sequential extrapolation coefficient, to provide a
unification.
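To make these coverage notions concrete, they are commonly written as follows (notation follows standard usage in this literature; the paper's exact definitions may differ in minor details). For a finite-horizon MDP with occupancy measures $d_h^{\pi}$, a data distribution $\mu = (\mu_1, \ldots, \mu_H)$ satisfies concentrability with constant $C$ if
$$ \sup_{\pi} \max_{h \in [H]} \left\| \frac{d_h^{\pi}}{\mu_h} \right\|_{\infty} \le C, $$
and the coverability coefficient of the MDP is the best such constant achievable by any distribution,
$$ C_{\mathrm{cov}} := \inf_{\mu_1, \ldots, \mu_H \in \Delta(\mathcal{S} \times \mathcal{A})} \; \sup_{\pi} \max_{h \in [H]} \left\| \frac{d_h^{\pi}}{\mu_h} \right\|_{\infty}. $$
Coverability only asks that such a distribution exists; the online learner never needs to know it or sample from it.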
Related papers
- Offline Reinforcement Learning with Additional Covering Distributions [0.0]
We study learning optimal policies from a logged dataset, i.e., offline RL, with function approximation.
We show that sample-efficient offline RL for general MDPs is possible with only a partial coverage dataset and weak realizable function classes.
arXiv Detail & Related papers (2023-05-22T03:31:03Z)
- What can online reinforcement learning with function approximation benefit from general coverage conditions? [53.90873926758026]
In online reinforcement learning (RL), a suitable coverage condition can serve in place of standard structural assumptions on Markov decision processes (MDPs).
In this work, we pursue this new direction by investigating broader and more general coverage conditions.
We identify additional concepts, including an $L_p$ variant of concentrability, density-ratio realizability, and a trade-off between partial and rest coverage conditions.
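As a rough illustration of the first of these conditions (a plausible form under standard conventions, not necessarily this paper's exact definition), the $L_p$ variant replaces the worst-case density ratio with an $L_p(\mu)$ moment:
$$ C_{p}(\pi; \mu) := \left( \mathbb{E}_{(s,a) \sim \mu} \left[ \left( \frac{d^{\pi}(s,a)}{\mu(s,a)} \right)^{p} \right] \right)^{1/p}, $$
which, under suitable support conditions, recovers the usual $L_\infty$ concentrability as $p \to \infty$.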
arXiv Detail & Related papers (2023-04-25T14:57:59Z)
- Distributionally Robust Model-Based Offline Reinforcement Learning with Near-Optimal Sample Complexity [39.886149789339335]
Offline reinforcement learning aims to learn decision-making policies from historical data without active exploration.
Due to uncertainties and variabilities of the environment, it is critical to learn a robust policy that performs well even when the deployed environment deviates from the nominal one used to collect the history dataset.
We consider a distributionally robust formulation of offline RL, focusing on robust Markov decision processes with an uncertainty set specified by the Kullback-Leibler divergence in both finite-horizon and infinite-horizon settings.
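For concreteness, a Kullback-Leibler uncertainty set of the kind referenced above is typically defined per state-action pair around the nominal transition kernel $P$ (a sketch under standard conventions; the paper's exact formulation may differ):
$$ \mathcal{U}^{\sigma}(P_{s,a}) := \left\{ \widetilde{P}_{s,a} \in \Delta(\mathcal{S}) : \mathrm{KL}\big( \widetilde{P}_{s,a} \,\|\, P_{s,a} \big) \le \sigma \right\}, $$
where $\sigma > 0$ is the uncertainty radius; the robust policy optimizes the worst-case value over transitions drawn from these sets.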
arXiv Detail & Related papers (2022-08-11T11:55:31Z)
- Offline Reinforcement Learning: Fundamental Barriers for Value Function Approximation [74.3002974673248]
We consider the offline reinforcement learning problem, where the aim is to learn a decision making policy from logged data.
Offline RL is becoming increasingly relevant in practice because online data collection is poorly suited to safety-critical domains.
Our results show that sample-efficient offline reinforcement learning requires either restrictive coverage conditions or representation conditions that go beyond supervised learning.
arXiv Detail & Related papers (2021-11-21T23:22:37Z)
- Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting [60.98700344526674]
Low-complexity models such as linear function representation play a pivotal role in enabling sample-efficient reinforcement learning.
In this paper, we investigate a new sampling protocol, which draws samples in an online/exploratory fashion but allows one to backtrack and revisit previous states in a controlled and infrequent manner.
We develop an algorithm tailored to this setting, achieving a sample complexity that scales polynomially with the feature dimension, the horizon, and the inverse sub-optimality gap, but not with the size of the state/action space.
arXiv Detail & Related papers (2021-05-17T17:22:07Z)
- Instabilities of Offline RL with Pre-Trained Neural Representation [127.89397629569808]
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn) policies in scenarios where the data are collected from a distribution that substantially differs from that of the target policy to be evaluated.
Recent theoretical advances have shown that such sample-efficient offline RL is indeed possible provided certain strong representational conditions hold.
This work studies these issues from an empirical perspective to gauge how stable offline RL methods are.
arXiv Detail & Related papers (2021-03-08T18:06:44Z)
- What are the Statistical Limits of Offline RL with Linear Function Approximation? [70.33301077240763]
Offline reinforcement learning seeks to utilize offline (observational) data to guide the learning of sequential decision-making strategies.
This work focuses on the basic question of what are necessary representational and distributional conditions that permit provable sample-efficient offline reinforcement learning.
arXiv Detail & Related papers (2020-10-22T17:32:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.