Exponential Hardness of Reinforcement Learning with Linear Function
Approximation
- URL: http://arxiv.org/abs/2302.12940v1
- Date: Sat, 25 Feb 2023 00:19:49 GMT
- Title: Exponential Hardness of Reinforcement Learning with Linear Function
Approximation
- Authors: Daniel Kane, Sihan Liu, Shachar Lovett, Gaurav Mahajan, Csaba
Szepesvári, Gellért Weisz
- Abstract summary: We show a computational lower bound, which is exponential in feature dimension and horizon, for linear reinforcement learning under the Randomized Exponential Time Hypothesis.
We also show a lower bound optimized for horizon dependence that almost matches the best known upper bound of $\exp(\sqrt{H})$.
- Score: 20.066210529358177
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A fundamental question in reinforcement learning theory is: suppose the
optimal value functions are linear in given features, can we learn them
efficiently? This problem's counterpart in supervised learning, linear
regression, can be solved both statistically and computationally efficiently.
Therefore, it was quite surprising when a recent work
\cite{kane2022computational} showed a computational-statistical gap for linear
reinforcement learning: even though there are polynomial sample-complexity
algorithms, unless NP = RP, there are no polynomial time algorithms for this
setting.
In this work, we build on their result to show a computational lower bound,
which is exponential in feature dimension and horizon, for linear reinforcement
learning under the Randomized Exponential Time Hypothesis. To prove this we
build a round-based game where in each round the learner is searching for an
unknown vector in a unit hypercube. The rewards in this game are chosen such
that if the learner achieves large reward, then the learner's actions can be
used to simulate solving a variant of 3-SAT, where (a) each variable shows up
in a bounded number of clauses, and (b) if an instance has no solutions then it
also has no solutions that satisfy more than a (1-$\epsilon$)-fraction of clauses. We
use standard reductions to show this 3-SAT variant is approximately as hard as
3-SAT. Finally, we also show a lower bound optimized for horizon dependence
that almost matches the best known upper bound of $\exp(\sqrt{H})$.
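To make the flavor of the construction concrete, below is a minimal, hypothetical Python sketch of a round-based search game of the kind described above: in each round the learner proposes a point in the unit hypercube, the point is rounded to a Boolean assignment, and the reward is the fraction of clauses of a toy 3-SAT instance that the assignment satisfies. The class names, the toy instance, the rounding rule, and the reward scheme are all invented for illustration; this is not the paper's actual reduction.

```python
import numpy as np

# Toy 3-SAT instance (invented for illustration): each clause is a triple of
# signed literals, e.g. (+1, -3, +4) means (x1 OR NOT x3 OR x4). In the
# paper's variant each variable appears in a bounded number of clauses.
CLAUSES = [(+1, -2, +3), (-1, +2, +4), (+2, -3, -4), (+1, +3, -4)]
NUM_VARS = 4

def clause_satisfied(clause, assignment):
    # assignment[i] is the Boolean value of variable i+1
    return any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause)

def fraction_satisfied(assignment):
    return sum(clause_satisfied(c, assignment) for c in CLAUSES) / len(CLAUSES)

class RoundBasedHypercubeGame:
    """Toy round-based game: each round the learner proposes a point in
    [0, 1]^d, the point is rounded to a Boolean assignment, and the reward
    is the fraction of clauses that assignment satisfies."""

    def __init__(self, horizon):
        self.horizon = horizon

    def play(self, learner):
        total = 0.0
        for _ in range(self.horizon):
            point = learner.propose(dim=NUM_VARS)   # point in the unit hypercube
            assignment = [x >= 0.5 for x in point]  # round coordinates to Booleans
            reward = fraction_satisfied(assignment)
            learner.observe(reward)
            total += reward
        return total / self.horizon

class RandomLearner:
    """Baseline that guesses uniformly at random each round."""
    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
    def propose(self, dim):
        return self.rng.random(dim)
    def observe(self, reward):
        pass  # a real learner would update its search here

if __name__ == "__main__":
    game = RoundBasedHypercubeGame(horizon=100)
    print("average reward:", game.play(RandomLearner()))
```

The point of the toy is the direction of the implication: any learner that reliably earns average reward close to 1 in such a game is, in effect, producing near-satisfying assignments, which is the intuition behind tying large reward in the hard instances to solving the bounded-occurrence, gapped 3-SAT variant.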
Related papers
- Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \le \sqrt{84/83}-1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimizer of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - Exploring and Learning in Sparse Linear MDPs without Computationally
Intractable Oracles [39.10180309328293]
In this paper we revisit linear MDPs from the perspective of feature selection.
Our main result is the first polynomial-time algorithm for this problem.
We show that they do exist and can be computed efficiently via convex programming.
arXiv Detail & Related papers (2023-09-18T03:35:48Z) - Efficiently Learning One-Hidden-Layer ReLU Networks via Schur
Polynomials [50.90125395570797]
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss.
Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon>0$ is the target accuracy.
arXiv Detail & Related papers (2023-07-24T14:37:22Z) - Computational-Statistical Gaps in Reinforcement Learning [23.517741855454044]
We show a reduction from Unique-Sat, where we convert a CNF formula into an MDP with deterministic transitions, a constant number of actions, and low-dimensional linear optimal value functions.
This result also exhibits the first computational-statistical gap in reinforcement learning with linear function approximation.
arXiv Detail & Related papers (2022-02-11T04:48:35Z) - Optimal Gradient-based Algorithms for Non-concave Bandit Optimization [76.57464214864756]
This work considers a large family of bandit problems where the unknown underlying reward function is non-concave.
Our algorithms are based on a unified zeroth-order optimization paradigm that applies in great generality.
We show that the standard optimistic algorithms are sub-optimal by dimension factors.
arXiv Detail & Related papers (2021-07-09T16:04:24Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
arXiv Detail & Related papers (2021-06-15T02:23:07Z) - Learning from Survey Propagation: a Neural Network for MAX-E-$3$-SAT [0.0]
This paper presents a new algorithm for computing approximate solutions in $\Theta(N)$ for the Maximum Exact 3-Satisfiability (MAX-E-$3$-SAT) problem.
arXiv Detail & Related papers (2020-12-10T07:59:54Z) - Online Model Selection for Reinforcement Learning with Function
Approximation [50.008542459050155]
We present a meta-algorithm that adapts to the optimal complexity with $\tilde{O}(L^{5/6} T^{2/3})$ regret.
We also show that the meta-algorithm automatically admits significantly improved instance-dependent regret bounds.
arXiv Detail & Related papers (2020-11-19T10:00:54Z) - On Function Approximation in Reinforcement Learning: Optimism in the
Face of Large State Spaces [208.67848059021915]
We study the exploration-exploitation tradeoff at the core of reinforcement learning.
In particular, we prove that the complexity of the function class $\mathcal{F}$ characterizes the complexity of the function.
Our regret bounds are independent of the number of episodes.
arXiv Detail & Related papers (2020-11-09T18:32:22Z)