Computational-Statistical Gaps in Reinforcement Learning
- URL: http://arxiv.org/abs/2202.05444v1
- Date: Fri, 11 Feb 2022 04:48:35 GMT
- Title: Computational-Statistical Gaps in Reinforcement Learning
- Authors: Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan
- Abstract summary: We show a reduction from Unique-Sat, where we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions, and low-dimensional linear optimal value functions.
This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation.
- Score: 23.517741855454044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reinforcement learning with function approximation has recently achieved
tremendous results in applications with large state spaces. This empirical
success has motivated a growing body of theoretical work proposing necessary
and sufficient conditions under which efficient reinforcement learning is
possible. From this line of work, a remarkably simple minimal sufficient
condition has emerged for sample efficient reinforcement learning: MDPs with
optimal value function $V^*$ and $Q^*$ linear in some known low-dimensional
features. In this setting, recent works have designed sample efficient
algorithms which require a number of samples polynomial in the feature
dimension and independent of the size of state space. They however leave
finding computationally efficient algorithms as future work and this is
considered a major open problem in the community.
In this work, we make progress on this open problem by presenting the first
computational lower bound for RL with linear function approximation: unless
NP=RP, no randomized polynomial time algorithm exists for deterministic
transition MDPs with a constant number of actions and linear optimal value
functions. To prove this, we show a reduction from Unique-Sat, where we convert
a CNF formula into an MDP with deterministic transitions, constant number of
actions and low dimensional linear optimal value functions. This result also
exhibits the first computational-statistical gap in reinforcement learning with
linear function approximation, as the underlying statistical problem is
information-theoretically solvable with a polynomial number of queries, but no
computationally efficient algorithm exists unless NP=RP. Finally, we also prove
a quasi-polynomial time lower bound under the Randomized Exponential Time
Hypothesis.
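To make the objects in the abstract concrete, the following is a minimal, illustrative sketch and not the paper's reduction: it builds a deterministic MDP from a tiny CNF formula (states are partial assignments, two actions fix the next variable, reward 1 at satisfying leaves), computes $Q^*$ by backward induction, and checks by least squares whether $Q^*$ is linear in a small hand-picked feature map. The formula, the feature map, and the helper names (`satisfies`, `q_star`, `features`) are assumptions made for illustration; they do not reproduce the features or the hardness argument of the paper.

```python
# Illustrative sketch only: a deterministic MDP built from a toy CNF formula,
# with a constant number of actions and a candidate low-dimensional linear
# representation of Q*. Not the construction used in the paper.
import itertools
import numpy as np

# Toy CNF over 3 variables: (x0 OR x1) AND (NOT x1 OR x2).
# Each clause is a list of (variable index, required sign) literals.
CLAUSES = [[(0, True), (1, True)], [(1, False), (2, True)]]
N_VARS = 3

def satisfies(assignment):
    """True iff a full assignment (tuple of bools) satisfies every clause."""
    return all(any(assignment[i] == sign for i, sign in clause) for clause in CLAUSES)

def q_star(prefix):
    """Optimal value of a partial assignment. Transitions are deterministic:
    each of the two actions appends False/True; reward only at full assignments."""
    if len(prefix) == N_VARS:
        return float(satisfies(prefix))
    return max(q_star(prefix + (a,)) for a in (False, True))

def features(prefix, action):
    """Toy 4-dimensional feature map (an assumption, not the paper's features):
    clauses already satisfied, clauses already violated, depth, and a bias."""
    nxt = prefix + (action,)
    sat = sum(any(i < len(nxt) and nxt[i] == s for i, s in c) for c in CLAUSES)
    vio = sum(all(i < len(nxt) and nxt[i] != s for i, s in c) for c in CLAUSES)
    return np.array([sat, vio, len(nxt), 1.0])

# Collect (phi(s, a), Q*(s, a)) pairs over all non-terminal states and actions.
rows, targets = [], []
for depth in range(N_VARS):
    for prefix in itertools.product((False, True), repeat=depth):
        for a in (False, True):
            rows.append(features(prefix, a))
            targets.append(q_star(prefix + (a,)))

Phi, q = np.array(rows), np.array(targets)
theta, *_ = np.linalg.lstsq(Phi, q, rcond=None)
residual = np.max(np.abs(Phi @ theta - q))
print(f"max |<theta, phi(s,a)> - Q*(s,a)| = {residual:.3f}")  # ~0: Q* is linear here
```

For this toy instance the fit happens to be exact ($Q^*$ equals one minus the number of violated clauses), so the MDP has deterministic transitions, two actions, and a 4-dimensional linear optimal value function, exactly the type of instance the abstract describes; no hardness conclusion follows from such a small example.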
Related papers
- Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions [10.225358400539719]
We present an efficient reinforcement learning algorithm for Markov Decision Processes (MDPs) whose value functions are linearly realizable in a given feature map.
Specifically, we introduce a new algorithm that efficiently finds a near-optimal policy in this setting.
arXiv Detail & Related papers (2024-09-07T14:38:05Z) - The Power of Resets in Online Reinforcement Learning [73.64852266145387]
We explore the power of simulators through online reinforcement learning with local simulator access (or, local planning).
We show that MDPs with low coverability can be learned in a sample-efficient fashion with only $Q^*$-realizability.
We show that the notorious Exogenous Block MDP problem is tractable under local simulator access.
arXiv Detail & Related papers (2024-04-23T18:09:53Z) - Efficient Model-Free Exploration in Low-Rank MDPs [76.87340323826945]
Low-Rank Markov Decision Processes offer a simple, yet expressive framework for RL with function approximation.
Existing algorithms are either (1) computationally intractable, or (2) reliant upon restrictive statistical assumptions.
We propose the first provably sample-efficient algorithm for exploration in Low-Rank MDPs.
arXiv Detail & Related papers (2023-07-08T15:41:48Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision
Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z) - Estimating a potential without the agony of the partition function [5.994412766684842]
Estimating a Gibbs density function given a sample is an important problem in computational statistics and statistical learning.
We propose an alternative approach based on Maximum A-Posteriori (MAP) estimators.
arXiv Detail & Related papers (2022-08-19T16:27:02Z) - On efficient algorithms for computing near-best polynomial
approximations to high-dimensional, Hilbert-valued functions from limited
samples [1.0650780147044159]
Sparse approximation has become indispensable for approximating smooth, high-dimensional functions from limited samples.
This paper introduces algorithms and theoretical guarantees that assert exponential or algebraic rates of convergence, along with robustness to sampling, algorithmic and physical discretization errors.
arXiv Detail & Related papers (2022-03-25T20:56:07Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the learning problem.
Our regret bounds are independent of the size of the state space.
arXiv Detail & Related papers (2020-11-09T18:32:22Z) - Reinforcement Learning with General Value Function Approximation:
Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $\widetilde{O}(\mathrm{poly}(dH)\sqrt{T})$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)