When is Agnostic Reinforcement Learning Statistically Tractable?
- URL: http://arxiv.org/abs/2310.06113v1
- Date: Mon, 9 Oct 2023 19:40:54 GMT
- Title: When is Agnostic Reinforcement Learning Statistically Tractable?
- Authors: Zeyu Jia, Gene Li, Alexander Rakhlin, Ayush Sekhari, Nathan Srebro
- Abstract summary: A new complexity measure, called the \emph{spanning capacity}, depends solely on the set $\Pi$ and is independent of the MDP dynamics.
We show there exists a policy class $\Pi$ with a bounded spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for learnability between generative access and online access models.
- Score: 76.1408672715773
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of agnostic PAC reinforcement learning (RL): given a
policy class $\Pi$, how many rounds of interaction with an unknown MDP (with a
potentially large state and action space) are required to learn an
$\epsilon$-suboptimal policy with respect to $\Pi$? Towards that end, we
introduce a new complexity measure, called the \emph{spanning capacity}, that
depends solely on the set $\Pi$ and is independent of the MDP dynamics. With a
generative model, we show that for any policy class $\Pi$, bounded spanning
capacity characterizes PAC learnability. However, for online RL, the situation
is more subtle. We show there exists a policy class $\Pi$ with a bounded
spanning capacity that requires a superpolynomial number of samples to learn.
This reveals a surprising separation for agnostic learnability between
generative access and online access models (as well as between
deterministic/stochastic MDPs under online access). On the positive side, we
identify an additional \emph{sunflower} structure, which in conjunction with
bounded spanning capacity enables statistically efficient online RL via a new
algorithm called POPLER, which takes inspiration from classical importance
sampling methods as well as techniques for reachable-state identification and
policy evaluation in reward-free exploration.
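For intuition, the classical importance-sampling estimator that the abstract says POPLER draws inspiration from can be sketched as follows. This is a generic off-policy value estimate, not the POPLER algorithm itself, and the data layout and function names are illustrative assumptions.

```python
import numpy as np

def importance_sampling_value(trajectories, target_policy, behavior_policy):
    """Trajectory-level importance-sampling estimate of a target policy's value.

    `trajectories` is a list of episodes, each a list of (state, action, reward)
    tuples collected by `behavior_policy`. Both policies map a state to a dict
    of action -> probability. The estimate is unbiased, but its variance can
    grow exponentially with the horizon, so naive evaluation of every policy in
    a large class Pi is not by itself sample-efficient.
    """
    estimates = []
    for episode in trajectories:
        weight, episode_return = 1.0, 0.0
        for state, action, reward in episode:
            # Re-weight by the likelihood ratio of the observed action under
            # the target policy versus the behavior (data-collecting) policy.
            weight *= target_policy(state)[action] / behavior_policy(state)[action]
            episode_return += reward
        estimates.append(weight * episode_return)
    return float(np.mean(estimates))
```

Roughly speaking, evaluating every $\pi \in \Pi$ with such estimates and returning the best is the naive agnostic baseline; the paper's question is when this kind of learning can be done with polynomially many samples.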
Related papers
- Online Policy Learning and Inference by Matrix Completion [12.527541242185404]
We formulate the problem as a matrix completion bandit (MCB).
We explore the $\epsilon$-greedy bandit (a minimal sketch follows this entry) and online gradient descent.
A faster decaying exploration yields smaller regret but learns the optimal policy less accurately.
arXiv Detail & Related papers (2024-04-26T13:19:27Z)
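The $\epsilon$-greedy exploration mentioned in the entry above is a standard bandit routine. A minimal, generic sketch with a decaying exploration rate is given below; it is not the paper's matrix-completion algorithm, and the decay schedule and names are assumptions for illustration.

```python
import numpy as np

def epsilon_greedy_bandit(pull_arm, num_arms, horizon, decay=0.5, rng=None):
    """Minimal epsilon-greedy bandit with a decaying exploration probability.

    `pull_arm(arm)` returns a stochastic reward. A larger `decay` makes the
    exploration rate shrink faster: less regret from exploring, but the value
    estimates of rarely pulled arms remain less accurate.
    """
    rng = rng or np.random.default_rng()
    counts = np.zeros(num_arms)
    means = np.zeros(num_arms)
    for t in range(1, horizon + 1):
        eps = min(1.0, t ** (-decay))          # decaying exploration probability
        if rng.random() < eps:
            arm = int(rng.integers(num_arms))  # explore: uniform random arm
        else:
            arm = int(np.argmax(means))        # exploit: current best estimate
        reward = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # running-mean update
    return means
```

A faster-decaying $\epsilon$ shifts from exploration to exploitation earlier, which matches the regret/accuracy trade-off noted in the summary.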
- Low-Rank MDPs with Continuous Action Spaces [42.695778474071254]
We study the problem of extending such methods to settings with continuous actions.
We show that, without any modifications to the algorithm, we obtain a similar PAC bound when actions are allowed to be continuous.
arXiv Detail & Related papers (2023-11-06T22:05:08Z)
- Offline RL With Realistic Datasets: Heteroskedasticity and Support Constraints [82.43359506154117]
We show that typical offline reinforcement learning methods fail to learn from data with non-uniform variability.
Our method is simple, theoretically motivated, and improves performance across a wide range of offline RL problems in Atari games, navigation, and pixel-based manipulation.
arXiv Detail & Related papers (2022-11-02T11:36:06Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks.
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Near Instance-Optimal PAC Reinforcement Learning for Deterministic MDPs [24.256960622176305]
We propose the first (nearly) matching upper and lower bounds on the sample complexity of PAC RL in episodic Markov decision processes.
Our bounds feature a new notion of sub-optimality gap for state-action pairs that we call the deterministic return gap.
Their design and analyses employ novel ideas, including graph-theoretical concepts such as minimum flows and maximum cuts.
arXiv Detail & Related papers (2022-03-17T11:19:41Z)
- Reward-Free RL is No Harder Than Reward-Aware RL in Linear Markov Decision Processes [61.11090361892306]
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration.
We show that this separation does not exist in the setting of linear MDPs.
We develop a computationally efficient algorithm for reward-free RL in a $d$-dimensional linear MDP.
arXiv Detail & Related papers (2022-01-26T22:09:59Z)
- Settling the Horizon-Dependence of Sample Complexity in Reinforcement Learning [82.31436758872715]
We develop an algorithm that achieves the same PAC guarantee while using only $O(1)$ episodes of environment interactions.
We establish a connection between value functions in discounted and finite-horizon Markov decision processes.
arXiv Detail & Related papers (2021-11-01T00:21:24Z)
- Pessimistic Model-based Offline RL: PAC Bounds and Posterior Sampling under Partial Coverage [33.766012922307084]
We study model-based offline Reinforcement Learning with general function approximation.
We present an algorithm named Constrained Pessimistic Policy Optimization (CPPO), which leverages a general function class and uses a constraint to encode pessimism.
arXiv Detail & Related papers (2021-07-13T16:30:01Z)
- Online Sub-Sampling for Reinforcement Learning with General Function Approximation [111.01990889581243]
In this paper, we establish an efficient online sub-sampling framework that measures the information gain of data points collected by an RL algorithm.
For a value-based method with a complexity-bounded function class, we show that the policy only needs to be updated $\propto \operatorname{polylog}(K)$ times.
In contrast to existing approaches that update the policy at least $\Omega(K)$ times, our approach drastically reduces the number of optimization calls in solving for a policy.
arXiv Detail & Related papers (2021-06-14T07:36:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.