Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity
- URL: http://arxiv.org/abs/2006.05879v1
- Date: Wed, 10 Jun 2020 15:05:51 GMT
- Title: Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity
- Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
- Abstract summary: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
- Score: 48.98199700043158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm
for planning in a Markov Decision Process in which transitions have a finite
support. We prove an upper bound on the number of calls to the generative
models needed for MDP-GapE to identify a near-optimal action with high
probability. This problem-dependent sample complexity result is expressed in
terms of the sub-optimality gaps of the state-action pairs that are visited
during exploration. Our experiments reveal that MDP-GapE is also effective in
practice, in contrast with other algorithms with sample complexity guarantees
in the fixed-confidence setting, which are mostly theoretical.
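
The abstract states a fixed-confidence guarantee (identify an epsilon-optimal action at the root with probability at least 1 - delta, using as few generative-model calls as possible) but not the algorithm itself. The sketch below is a heavily simplified, UGapE-style illustration of that setting, not the MDP-GapE algorithm from the paper: it assumes a hypothetical `model.step(state, action)` generative-model interface, estimates root action values with single-trajectory rollouts under a uniform policy, and stops once confidence intervals certify an epsilon-optimal action.

```python
import math
import random


def identify_near_optimal_action(model, state, actions, horizon,
                                 epsilon=0.1, delta=0.05, max_samples=100_000):
    """Fixed-confidence, gap-based identification of an epsilon-optimal action.

    Simplified UGapE-style loop, not trajectory-based MDP-GapE:
    `model.step(state, action) -> (next_state, reward)` is a hypothetical
    generative-model interface, rewards are assumed to lie in [0, 1], and
    rollouts after the first step follow a uniformly random policy.
    """
    if len(actions) == 1:
        return actions[0]
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}

    def rollout(first_action):
        # One generative-model call per step along a single trajectory.
        s, ret, a = state, 0.0, first_action
        for _ in range(horizon):
            s, r = model.step(s, a)
            ret += r
            a = random.choice(actions)
        return ret

    def radius(a, t):
        # Hoeffding-style radius for returns bounded in [0, horizon].
        return horizon * math.sqrt(
            math.log(4 * len(actions) * t * t / delta) / max(counts[a], 1))

    for t in range(1, max_samples + 1):
        ucb = {a: means[a] + radius(a, t) for a in actions}
        lcb = {a: means[a] - radius(a, t) for a in actions}
        leader = max(actions, key=lambda a: means[a])
        challenger = max((a for a in actions if a != leader),
                         key=lambda a: ucb[a])
        # Stop when no other action can beat the leader by more than epsilon.
        if ucb[challenger] - lcb[leader] <= epsilon:
            return leader
        # Sample the less-explored of the two candidate actions.
        a = leader if counts[leader] <= counts[challenger] else challenger
        g = rollout(a)
        counts[a] += 1
        means[a] += (g - means[a]) / counts[a]
    return max(actions, key=lambda a: means[a])
```

MDP-GapE itself propagates confidence bounds through the planning tree along sampled trajectories, so its stopping time depends on the sub-optimality gaps of the state-action pairs actually visited during exploration; the sketch only captures the gap-based stopping idea at the root.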
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z)
- Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy.
We demonstrate that the sample complexity of the PEDEL algorithm of Wagenmaker et al. closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces [5.732271870257913]
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP).
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
arXiv Detail & Related papers (2021-06-08T09:31:48Z)
- Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes [60.479499225746295]
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
Our approximation is obtained from an improved sample complexity analysis for sparse spectrum Gaussian processes (SSGPs); a generic random-feature sketch of this family appears after this list.
arXiv Detail & Related papers (2020-11-17T05:41:50Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
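
As referenced in the sparse spectrum entry above, the following is a generic random-Fourier-feature sketch of the SSGP family that entry analyzes. It assumes an RBF kernel and illustrative hyperparameters, and it reflects the standard SSGP construction rather than the improved analysis or guarantees from that paper.

```python
import numpy as np


def ssgp_fit_predict(X, y, X_test, num_features=200, lengthscale=1.0,
                     signal_var=1.0, noise_var=0.1, seed=0):
    """Sparse spectrum GP regression via random Fourier features.

    Approximates an RBF-kernel GP by sampling `num_features` spectral
    frequencies and performing Bayesian linear regression on the resulting
    trigonometric features (generic SSGP recipe; hyperparameters are
    illustrative, not taken from the cited paper).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral frequencies of the RBF kernel: N(0, 1/lengthscale^2) per dim.
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, d))

    def features(Z):
        proj = Z @ W.T                                   # (n, m)
        phi = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
        return phi * np.sqrt(signal_var / num_features)  # (n, 2m)

    Phi = features(X)
    A = Phi.T @ Phi + noise_var * np.eye(2 * num_features)
    w_mean = np.linalg.solve(A, Phi.T @ y)               # posterior mean weights
    Phi_s = features(X_test)
    mean = Phi_s @ w_mean
    # Predictive variance from the posterior covariance of the weights.
    cov_w = noise_var * np.linalg.inv(A)
    var = np.sum((Phi_s @ cov_w) * Phi_s, axis=1) + noise_var
    return mean, var
```

Sampling more spectral frequencies tightens the kernel approximation at the cost of a larger linear system; the number of frequencies needed for a given accuracy is the kind of quantity a sample complexity analysis of SSGPs bounds.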
This list is automatically generated from the titles and abstracts of the papers on this site.