Efficient Planning in Large MDPs with Weak Linear Function Approximation
- URL: http://arxiv.org/abs/2007.06184v1
- Date: Mon, 13 Jul 2020 04:40:41 GMT
- Title: Efficient Planning in Large MDPs with Weak Linear Function Approximation
- Authors: Roshan Shariff and Csaba Szepesvári
- Abstract summary: Large-scale Markov decision processes (MDPs) require planning algorithms whose runtime is independent of the number of states of the MDP.
We consider the planning problem in MDPs using linear value function approximation with only weak requirements.
- Score: 4.56877715768796
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Large-scale Markov decision processes (MDPs) require planning algorithms with
runtime independent of the number of states of the MDP. We consider the
planning problem in MDPs using linear value function approximation with only
weak requirements: low approximation error for the optimal value function, and
a small set of "core" states whose features span those of other states. In
particular, we make no assumptions about the representability of policies or
value functions of non-optimal policies. Our algorithm produces almost-optimal
actions for any state using a generative oracle (simulator) for the MDP, while
its computation time scales polynomially with the number of features, core
states, and actions, and with the effective horizon.
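The setup described above (linear features, a small core set, and a generative simulator) can be illustrated compactly. Below is a minimal Python sketch of approximate value iteration restricted to core states; it illustrates the general technique rather than the paper's actual algorithm, and the names `simulate`, `phi`, and `core_states` are hypothetical.

```python
import numpy as np

def plan_with_core_states(simulate, phi, core_states, actions,
                          gamma=0.99, n_samples=100, n_iters=200):
    """Approximate value iteration over a small set of 'core' states.

    A minimal sketch (not the paper's algorithm): values are maintained
    only at the core states, and a least-squares fit through the feature
    map extends them linearly to sampled next states.

    simulate(s, a) -> (next_state, reward): generative oracle (hypothetical).
    phi(s) -> np.ndarray of d features (hypothetical).
    """
    Phi = np.stack([phi(s) for s in core_states])   # |C| x d feature matrix
    v = np.zeros(len(core_states))                  # values at core states
    for _ in range(n_iters):
        # Fit a linear value function to the current core-state values.
        theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
        v_new = np.empty_like(v)
        for i, s in enumerate(core_states):
            q = []
            for a in actions:
                # Monte-Carlo estimate of the Bellman backup via the simulator.
                samples = [simulate(s, a) for _ in range(n_samples)]
                q.append(np.mean([r + gamma * phi(s2) @ theta
                                  for s2, r in samples]))
            v_new[i] = max(q)
        v = v_new
    theta, *_ = np.linalg.lstsq(Phi, v, rcond=None)
    return theta  # linear value estimate: V(s) ~= phi(s) @ theta
```

Note that the runtime depends only on the number of core states, actions, features, samples, and iterations, never on the size of the full state space, mirroring the complexity claim in the abstract.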
Related papers
- Sample and Oracle Efficient Reinforcement Learning for MDPs with Linearly-Realizable Value Functions [10.225358400539719]
We present an efficient reinforcement learning algorithm for Markov decision processes (MDPs) whose optimal action-value function is linearly realizable in a feature map.
Specifically, we introduce a new algorithm that efficiently finds a near-optimal policy in this setting.
arXiv Detail & Related papers (2024-09-07T14:38:05Z)
- Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z)
- Non-stationary Reinforcement Learning under General Function Approximation [60.430936031067006]
We first propose a new complexity metric called dynamic Bellman Eluder (DBE) dimension for non-stationary MDPs.
Based on the proposed complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA.
We show that SW-OPEA is provably efficient as long as the variation budget is not significantly large.
arXiv Detail & Related papers (2023-06-01T16:19:37Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Markov Decision Processes [80.89852729380425]
We propose the first computationally efficient algorithm that achieves the nearly minimax optimal regret $\tilde{O}(d\sqrt{H^3 K})$.
Our work provides a complete answer to optimal RL with linear MDPs, and the developed algorithm and theoretical tools may be of independent interest.
arXiv Detail & Related papers (2022-12-12T18:58:59Z)
- Efficient Global Planning in Large MDPs via Stochastic Primal-Dual Optimization [12.411844611718958]
We show that our method outputs a near-optimal policy after a number of queries to the generative model.
Our method is computationally efficient and comes with the major advantage that it outputs a single softmax policy that is compactly represented by a low-dimensional parameter vector.
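A compactly parameterized softmax policy of the kind mentioned above can be sketched in a few lines. This is a generic illustration under assumed state-action features `phi(s, a)` and a learned parameter vector `theta` (hypothetical names, not the paper's API):

```python
import numpy as np

def softmax_policy(phi, theta, state, actions, rng=np.random.default_rng()):
    """Sample an action from a softmax policy parameterized by `theta`.

    A sketch only: the entire policy is carried by the low-dimensional
    vector `theta`, independent of the number of states.
    """
    logits = np.array([phi(state, a) @ theta for a in actions])
    logits -= logits.max()                        # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return actions[rng.choice(len(actions), p=probs)]
```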
arXiv Detail & Related papers (2022-10-21T15:49:20Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions [2.512827436728378]
We study the structure of constrained MDPs (C-MDPs), particularly an important variant that involves local transitions.
In this work, we propose a fully polynomial-time approximation scheme for C-MDPs that computes (near-)optimal deterministic policies.
arXiv Detail & Related papers (2022-04-10T22:08:33Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
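For orientation, here is a generic sketch of stochastic mirror descent with the entropy mirror map on the probability simplex (the classical exponentiated-gradient form); it illustrates the method family only and is not the paper's parallel scheme. The callable `stoch_grad` is hypothetical.

```python
import numpy as np

def stochastic_mirror_descent(stoch_grad, x0, step, n_steps):
    """Generic stochastic mirror descent with the entropy mirror map.

    A sketch of the classical method on the probability simplex
    (exponentiated-gradient updates), not the paper's parallel scheme.
    stoch_grad(x) -> unbiased gradient estimate (hypothetical callable).
    """
    x = np.asarray(x0, dtype=float)
    avg = np.zeros_like(x)
    for t in range(n_steps):
        g = stoch_grad(x)
        x = x * np.exp(-step * g)      # mirror step in the dual (entropy map)
        x /= x.sum()                   # renormalize onto the simplex
        avg += (x - avg) / (t + 1)     # running average of the iterates
    return avg
```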
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- On Query-efficient Planning in MDPs under Linear Realizability of the Optimal State-value Function [14.205660708980988]
We consider the problem of local planning in fixed-horizon Markov Decision Processes (MDPs) with a generative model.
A recent lower bound established that the related problem, in which the action-value function of the optimal policy is linearly realizable, requires an exponential number of queries.
In this work, we establish that $\mathrm{poly}(H, d)$ learning is possible (with state-value function realizability) whenever the action set is small.
arXiv Detail & Related papers (2021-02-03T13:23:15Z)
- Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes [3.5289688061934963]
Minimax regret has been proposed as an objective for planning in Uncertain MDPs to find robust policies.
We introduce a Bellman equation to compute the regret for a policy.
We show that it optimises minimax regret exactly for UMDPs with independent uncertainties.
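Schematically, the minimax-regret objective for an uncertain MDP can be written as follows; this is the standard formulation, shown here for orientation, and is not necessarily the paper's exact notation:

```latex
% Minimax regret over a policy class and an uncertainty set \Theta:
% reg(\pi, \theta) compares \pi against the optimal policy for model \theta.
\min_{\pi} \max_{\theta \in \Theta} \operatorname{reg}(\pi, \theta),
\qquad
\operatorname{reg}(\pi, \theta) = V^{*}_{\theta}(s_0) - V^{\pi}_{\theta}(s_0).
```

Here $V^{*}_{\theta}$ and $V^{\pi}_{\theta}$ denote the optimal and policy values in the model indexed by $\theta$, and $s_0$ is the initial state.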
arXiv Detail & Related papers (2020-12-08T18:48:14Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov decision processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)