TensorPlan and the Few Actions Lower Bound for Planning in MDPs under
Linear Realizability of Optimal Value Functions
- URL: http://arxiv.org/abs/2110.02195v1
- Date: Tue, 5 Oct 2021 17:42:46 GMT
- Title: TensorPlan and the Few Actions Lower Bound for Planning in MDPs under
Linear Realizability of Optimal Value Functions
- Authors: Gell\'ert Weisz, Csaba Szepesv\'ari, Andr\'as Gy\"orgy
- Abstract summary: We consider the minimax query complexity of online planning with a generative model in fixed-horizon decision processes.
We show that an exponentially large lower bound holds when $A=\Omega(\min(d^{1/4},H^{1/2}))$.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the minimax query complexity of online planning with a generative
model in fixed-horizon Markov decision processes (MDPs) with linear function
approximation. Following recent works, we consider broad classes of problems
where either (i) the optimal value function $v^\star$ or (ii) the optimal
action-value function $q^\star$ lie in the linear span of some features; or
(iii) both $v^\star$ and $q^\star$ lie in the linear span when restricted to
the states reachable from the starting state. Recently, Weisz et al. (2021b)
showed that under (ii) the minimax query complexity of any planning algorithm
is at least exponential in the horizon $H$ or in the feature dimension $d$ when
the size $A$ of the action set can be chosen to be exponential in $\min(d,H)$.
On the other hand, for the setting (i), Weisz et al. (2021a) introduced
TensorPlan, a planner whose query cost is polynomial in all relevant quantities
when the number of actions is fixed. Among other things, these two works left
open the question whether polynomial query complexity is possible when $A$ is
subexponential in $\min(d,H)$. In this paper we answer this question in the
negative: we show that an exponentially large lower bound holds when
$A=\Omega(\min(d^{1/4},H^{1/2}))$, under either (i), (ii) or (iii). In
particular, this implies a perhaps surprising exponential separation of query
complexity compared to the work of Du et al. (2021) who prove a polynomial
upper bound when (iii) holds for all states. Furthermore, we show that the
upper bound of TensorPlan can be extended to hold under (iii) and, for MDPs
with deterministic transitions and stochastic rewards, also under (ii).
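For concreteness, the three settings above can be written out as follows; this is a minimal sketch under assumed notation (per-stage feature maps $\varphi_h: \mathcal{S} \to \mathbb{R}^d$ and $\psi_h: \mathcal{S} \times \mathcal{A} \to \mathbb{R}^d$ and weight vectors $\theta_h \in \mathbb{R}^d$, none of which are spelled out in the abstract):
- (i) $v^\star$ realizability: $v^\star_h(s) = \langle \varphi_h(s), \theta_h \rangle$ for all stages $h$ and states $s$;
- (ii) $q^\star$ realizability: $q^\star_h(s, a) = \langle \psi_h(s, a), \theta_h \rangle$ for all stages $h$, states $s$, and actions $a$;
- (iii) both of the above, required only for the states reachable from the starting state.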
Related papers
- Demystifying Linear MDPs and Novel Dynamics Aggregation Framework [8.087699764574788]
In linear MDPs, the feature dimension $d$ is lower bounded by $S/U$ in order to aptly represent transition probabilities.
We propose a novel structural aggregation framework based on dynamics, named "dynamics aggregation".
Our proposed algorithm exhibits statistical efficiency, achieving a regret of $\tilde{O}(d_{\psi}^{3/2} H^{3/2} \sqrt{T})$, where $d_{\psi}$ represents the feature dimension of the aggregated subMDPs.
arXiv Detail & Related papers (2024-10-31T16:21:41Z) - Provably Efficient Reinforcement Learning via Surprise Bound [66.15308700413814]
We propose a provably efficient reinforcement learning algorithm (both computationally and statistically) with general value function approximations.
Our algorithm achieves reasonable regret bounds when applied to both the linear setting and the sparse high-dimensional linear setting.
arXiv Detail & Related papers (2023-02-22T20:21:25Z) - Provably Breaking the Quadratic Error Compounding Barrier in Imitation
Learning, Optimally [58.463668865380946]
We study the statistical limits of Imitation Learning in episodic Markov Decision Processes (MDPs) with a state space $\mathcal{S}$.
We establish an upper bound of $O(|\mathcal{S}|H^{3/2}/N)$ on the suboptimality using the Mimic-MD algorithm of Rajaraman et al. (2020).
We show that the minimax suboptimality grows as $\Omega(H^{3/2}/N)$ when $|\mathcal{S}|\geq 3$, while the unknown-transition setting suffers from a larger sharp rate.
arXiv Detail & Related papers (2021-02-25T15:50:19Z) - Nearly Optimal Regret for Learning Adversarial MDPs with Linear Function
Approximation [92.3161051419884]
We study the reinforcement learning for finite-horizon episodic Markov decision processes with adversarial reward and full information feedback.
We show that it can achieve $\tilde{O}(dH\sqrt{T})$ regret, where $H$ is the length of the episode.
We also prove a matching lower bound of $\tilde{\Omega}(dH\sqrt{T})$ up to logarithmic factors.
arXiv Detail & Related papers (2021-02-17T18:54:08Z) - On Query-efficient Planning in MDPs under Linear Realizability of the
Optimal State-value Function [14.205660708980988]
We consider the problem of local planning in fixed-horizon Markov Decision Processes (MDPs) with a generative model.
A recent lower bound established that the related problem, in which the action-value function of the optimal policy is linearly realizable, requires an exponential number of queries.
In this work, we establish that poly$(H, d)$ learning is possible (with state value function realizability) whenever the action set is small.
arXiv Detail & Related papers (2021-02-03T13:23:15Z) - Finding Global Minima via Kernel Approximations [90.42048080064849]
We consider the global minimization of smooth functions based solely on function evaluations.
In this paper, we consider an approach that jointly models the function to approximate and finds a global minimum.
arXiv Detail & Related papers (2020-12-22T12:59:30Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Logarithmic Regret for Reinforcement Learning with Linear Function
Approximation [99.59319332864129]
We show that logarithmic regret is attainable under two recently proposed linear MDP assumptions.
To the best of our knowledge, these are the first logarithmic regret bounds for RL with linear function approximation.
arXiv Detail & Related papers (2020-11-23T17:25:00Z) - Exponential Lower Bounds for Planning in MDPs With Linearly-Realizable
Optimal Action-Value Functions [3.2980012891975754]
We consider the problem of local planning in fixed-horizon and discounted Markov Decision Processes.
We show that any sound planner must query at least $\min(\exp(\Omega(d)), \Omega(2^H))$ samples in the fixed-horizon setting and $\exp(\Omega(d))$ samples in the discounted setting.
arXiv Detail & Related papers (2020-10-03T15:19:26Z)