Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity
- URL: http://arxiv.org/abs/2006.05879v1
- Date: Wed, 10 Jun 2020 15:05:51 GMT
- Title: Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity
- Authors: Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko
- Abstract summary: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
- Score: 48.98199700043158
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm
for planning in a Markov Decision Process in which transitions have a finite
support. We prove an upper bound on the number of calls to the generative
models needed for MDP-GapE to identify a near-optimal action with high
probability. This problem-dependent sample complexity result is expressed in
terms of the sub-optimality gaps of the state-action pairs that are visited
during exploration. Our experiments reveal that MDP-GapE is also effective in
practice, in contrast with other algorithms with sample complexity guarantees
in the fixed-confidence setting, which are mostly theoretical.
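
The abstract states a fixed-confidence guarantee (identify an epsilon-optimal action at the root with probability at least 1 - delta, using as few generative-model calls as possible) but not the algorithm itself. The sketch below is a heavily simplified, UGapE-style illustration of that setting, not the MDP-GapE algorithm from the paper: it assumes a hypothetical `model.step(state, action)` generative-model interface, estimates root action values with single-trajectory rollouts under a uniform policy, and stops once confidence intervals certify an epsilon-optimal action.

```python
import math
import random


def identify_near_optimal_action(model, state, actions, horizon,
                                 epsilon=0.1, delta=0.05, max_samples=100_000):
    """Fixed-confidence, gap-based identification of an epsilon-optimal action.

    Simplified UGapE-style loop, not trajectory-based MDP-GapE:
    `model.step(state, action) -> (next_state, reward)` is a hypothetical
    generative-model interface, rewards are assumed to lie in [0, 1], and
    rollouts after the first step follow a uniformly random policy.
    """
    if len(actions) == 1:
        return actions[0]
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}

    def rollout(first_action):
        # One generative-model call per step along a single trajectory.
        s, ret, a = state, 0.0, first_action
        for _ in range(horizon):
            s, r = model.step(s, a)
            ret += r
            a = random.choice(actions)
        return ret

    def radius(a, t):
        # Hoeffding-style radius for returns bounded in [0, horizon].
        return horizon * math.sqrt(
            math.log(4 * len(actions) * t * t / delta) / max(counts[a], 1))

    for t in range(1, max_samples + 1):
        ucb = {a: means[a] + radius(a, t) for a in actions}
        lcb = {a: means[a] - radius(a, t) for a in actions}
        leader = max(actions, key=lambda a: means[a])
        challenger = max((a for a in actions if a != leader),
                         key=lambda a: ucb[a])
        # Stop when no other action can beat the leader by more than epsilon.
        if ucb[challenger] - lcb[leader] <= epsilon:
            return leader
        # Sample the less-explored of the two candidate actions.
        a = leader if counts[leader] <= counts[challenger] else challenger
        g = rollout(a)
        counts[a] += 1
        means[a] += (g - means[a]) / counts[a]
    return max(actions, key=lambda a: means[a])
```

MDP-GapE itself propagates confidence bounds through the planning tree along sampled trajectories, so its stopping time depends on the sub-optimality gaps of the state-action pairs actually visited during exploration; the sketch only captures the gap-based stopping idea at the root.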
Related papers
- Q-learning for Quantile MDPs: A Decomposition, Performance, and Convergence Analysis [30.713243690224207]
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes.
This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees.
arXiv Detail & Related papers (2024-10-31T16:53:20Z)
- Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy.
We demonstrate that the sample complexity of the PEDEL algorithm of Wagenmaker et al. closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond [101.5329678997916]
We study sample efficient reinforcement learning (RL) under the general framework of interactive decision making.
We propose a novel complexity measure, generalized eluder coefficient (GEC), which characterizes the fundamental tradeoff between exploration and exploitation.
We show that RL problems with low GEC form a remarkably rich class, which subsumes low Bellman eluder dimension problems, bilinear class, low witness rank problems, PO-bilinear class, and generalized regular PSR.
arXiv Detail & Related papers (2022-11-03T16:42:40Z)
- A General Framework for Sample-Efficient Function Approximation in Reinforcement Learning [132.45959478064736]
We propose a general framework that unifies model-based and model-free reinforcement learning.
We propose a novel estimation function with decomposable structural properties for optimization-based exploration.
Under our framework, a new sample-efficient algorithm, OPtimization-based ExploRation with Approximation (OPERA), is proposed.
arXiv Detail & Related papers (2022-09-30T17:59:16Z)
- Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces [5.732271870257913]
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP).
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
arXiv Detail & Related papers (2021-06-08T09:31:48Z)
- Revisiting the Sample Complexity of Sparse Spectrum Approximation of Gaussian Processes [60.479499225746295]
We introduce a new scalable approximation for Gaussian processes with provable guarantees which hold simultaneously over its entire parameter space.
Our approximation is obtained from an improved sample complexity analysis for sparse spectrum Gaussian processes (SSGPs); a generic random-feature sketch of this family appears after this list.
arXiv Detail & Related papers (2020-11-17T05:41:50Z)
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
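
As referenced in the sparse spectrum entry above, the following is a generic random-Fourier-feature sketch of the SSGP family that entry analyzes. It assumes an RBF kernel and illustrative hyperparameters, and it reflects the standard SSGP construction rather than the improved analysis or guarantees from that paper.

```python
import numpy as np


def ssgp_fit_predict(X, y, X_test, num_features=200, lengthscale=1.0,
                     signal_var=1.0, noise_var=0.1, seed=0):
    """Sparse spectrum GP regression via random Fourier features.

    Approximates an RBF-kernel GP by sampling `num_features` spectral
    frequencies and performing Bayesian linear regression on the resulting
    trigonometric features (generic SSGP recipe; hyperparameters are
    illustrative, not taken from the cited paper).
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Spectral frequencies of the RBF kernel: N(0, 1/lengthscale^2) per dim.
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, d))

    def features(Z):
        proj = Z @ W.T                                   # (n, m)
        phi = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
        return phi * np.sqrt(signal_var / num_features)  # (n, 2m)

    Phi = features(X)
    A = Phi.T @ Phi + noise_var * np.eye(2 * num_features)
    w_mean = np.linalg.solve(A, Phi.T @ y)               # posterior mean weights
    Phi_s = features(X_test)
    mean = Phi_s @ w_mean
    # Predictive variance from the posterior covariance of the weights.
    cov_w = noise_var * np.linalg.inv(A)
    var = np.sum((Phi_s @ cov_w) * Phi_s, axis=1) + noise_var
    return mean, var
```

Sampling more spectral frequencies tightens the kernel approximation at the cost of a larger linear system; the number of frequencies needed for a given accuracy is the kind of quantity a sample complexity analysis of SSGPs bounds.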
This list is automatically generated from the titles and abstracts of the papers on this site.