Measurable Monte Carlo Search Error Bounds
- URL: http://arxiv.org/abs/2106.04715v1
- Date: Tue, 8 Jun 2021 22:20:14 GMT
- Title: Measurable Monte Carlo Search Error Bounds
- Authors: John Mern, Mykel J. Kochenderfer
- Abstract summary: We prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes.
These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value.
- Score: 40.29552672672265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monte Carlo planners can often return sub-optimal actions, even if they are
guaranteed to converge in the limit of infinite samples. Known asymptotic
regret bounds do not provide any way to measure confidence of a recommended
action at the conclusion of search. In this work, we prove bounds on the
sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov
decision processes. These bounds can be directly computed at the conclusion of
the search and do not require knowledge of the true action-value. The presented
bound holds for general Monte Carlo solvers meeting mild convergence
conditions. We empirically test the tightness of the bounds through experiments
on a multi-armed bandit and a discrete Markov decision process for both a
simple solver and Monte Carlo tree search.
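To make the flavor of such a bound concrete, here is a minimal sketch for the simplest setting: a stationary multi-armed bandit with rewards in [0, 1], where Hoeffding's inequality plus a union bound gives a sub-optimality bound computable from the search statistics alone. The construction, the `delta` parameter, and the uniform-sampling "search" are illustrative assumptions, not the paper's exact bound, which additionally covers non-stationary bandits and MDPs.

```python
import numpy as np

def suboptimality_bound(counts, means, delta=0.05):
    """Bound on the sub-optimality of the recommended (highest-mean) arm.

    Uses Hoeffding's inequality with a union bound over arms: with
    probability at least 1 - delta, the true gap of the recommended arm
    is at most the returned value. Assumes rewards bounded in [0, 1].
    Needs only the counts and empirical means, not the true values.
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    radius = np.sqrt(np.log(2 * len(counts) / delta) / (2 * counts))
    rec = int(np.argmax(means))                 # recommended arm
    optimistic_best = np.max(means + radius)    # upper bound on best value
    pessimistic_rec = means[rec] - radius[rec]  # lower bound on recommendation
    return max(0.0, float(optimistic_best - pessimistic_rec))

# Toy example: three Bernoulli arms sampled uniformly during "search".
rng = np.random.default_rng(0)
true_means = np.array([0.4, 0.5, 0.7])          # unknown to the solver
n = 500
means = rng.binomial(n, true_means) / n         # empirical mean per arm
counts = np.full(3, n)
print("recommended arm:", int(np.argmax(means)))
print("sub-optimality bound:", suboptimality_bound(counts, means))
```

The returned value upper-bounds the recommended arm's true sub-optimality with probability at least 1 - delta, without ever consulting the true means.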
Related papers
- Combining Normalizing Flows and Quasi-Monte Carlo [0.0]
Recent advances in machine learning have led to new techniques for enhancing Monte Carlo methods.
We demonstrate through numerical experiments that this combination can yield an estimator with significantly lower variance than if the flow were sampled with classic Monte Carlo.
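A minimal sketch of the idea, with the Gaussian inverse CDF standing in for a learned normalizing flow (an assumption made purely for illustration): pushing scrambled Sobol points through the transform, instead of i.i.d. uniforms, typically shrinks the variance of the resulting estimator.

```python
import numpy as np
from scipy.stats import norm, qmc

def estimate(u):
    """Estimate E[||X||^2] with X = T(U), T the Gaussian inverse CDF."""
    x = norm.ppf(u)                          # push uniforms through the "flow"
    return np.mean(np.sum(x ** 2, axis=1))   # exact answer is d

d, n, reps = 4, 1024, 50                     # n a power of two, as Sobol prefers
rng = np.random.default_rng(0)
mc_runs = [estimate(rng.random((n, d))) for _ in range(reps)]
qmc_runs = [estimate(qmc.Sobol(d, scramble=True, seed=s).random(n))
            for s in range(reps)]
print("plain MC std:       ", np.std(mc_runs))
print("scrambled Sobol std:", np.std(qmc_runs))   # typically much smaller
```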
arXiv Detail & Related papers (2024-01-11T14:17:06Z)
- Automatic Rao-Blackwellization for Sequential Monte Carlo with Belief Propagation [4.956977275061968]
Exact Bayesian inference on state-space models (SSMs) is in general intractable.
We propose a mixed inference algorithm that computes closed-form solutions using belief propagation as much as possible.
arXiv Detail & Related papers (2023-12-15T15:05:25Z)
- Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
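For reference, a minimal Thompson sampling sketch on a Bernoulli bandit: passing a Beta prior concentrated far from the true arm means mimics the misspecified-prior setting. The arm means and prior parameters here are made up for illustration, and this is not the paper's meta-learning construction.

```python
import numpy as np

def thompson_regret(true_means, alpha0, beta0, horizon=5000, seed=0):
    """Cumulative regret of Thompson sampling with a Beta(alpha0, beta0) prior."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.full(k, float(alpha0))   # per-arm posterior Beta parameters
    beta = np.full(k, float(beta0))
    best, regret = max(true_means), 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))   # sample, act greedily
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward                          # conjugate Bernoulli update
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

means = [0.3, 0.5, 0.8]
print("Beta(1, 1) prior (well-specified):", thompson_regret(means, 1, 1))
print("Beta(50, 2) prior (misspecified): ", thompson_regret(means, 50, 2))
```

The misspecified prior believes every arm pays off almost surely, so early recommendations are poor, but the posterior eventually overwhelms the prior and regret grows gracefully rather than catastrophically.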
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Annealed Flow Transport Monte Carlo [91.20263039913912]
Annealed Flow Transport (AFT) builds upon Annealed Importance Sampling (AIS) and Sequential Monte Carlo (SMC).
AFT relies on a normalizing flow (NF) that is learned sequentially to push particles towards the successive targets.
We show that a continuous-time scaling limit of the population version of AFT is given by a Feynman-Kac measure.
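A minimal AIS sketch, the component AFT builds on: anneal a standard Gaussian into an unnormalized 1-D target through geometric intermediates, with one Metropolis step per temperature. The learned NF transport step that distinguishes AFT is deliberately omitted, and the target and schedule are made-up illustrations.

```python
import numpy as np

def log_target(x):              # unnormalized: 2 * N(3, 0.5^2) density shape
    return np.log(2.0) - 0.5 * ((x - 3.0) / 0.5) ** 2

def log_base(x):                # standard normal, unnormalized
    return -0.5 * x ** 2

def ais(n_particles=5000, n_temps=100, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.normal(size=n_particles)            # particles from the base
    logw = np.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # weight update: ratio of successive annealed densities at current x
        logw += (b1 - b0) * (log_target(x) - log_base(x))
        # one Metropolis step targeting pi_b1 prop. to base^(1-b1) * target^b1
        prop = x + 0.5 * rng.normal(size=n_particles)
        log_acc = ((1 - b1) * (log_base(prop) - log_base(x))
                   + b1 * (log_target(prop) - log_target(x)))
        accept = np.log(rng.random(n_particles)) < log_acc
        x = np.where(accept, prop, x)
    # mean weight estimates Z_target / Z_base; Z_base = sqrt(2*pi)
    return np.exp(logw).mean() * np.sqrt(2 * np.pi)

# True Z = 2 * sqrt(2*pi*0.25) ~ 2.5066
print("AIS estimate of Z:", ais())
```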
arXiv Detail & Related papers (2021-02-15T12:05:56Z)
- On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts [5.137144629366217]
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method.
We investigate the convergence of this algorithm for the case with undiscounted costs, also known as the shortest path problem.
As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
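A minimal first-visit MCES sketch on a toy chain MDP with undiscounted unit costs, echoing the shortest-path setting: each episode starts from a uniformly random state-action pair and then follows the greedy policy. The chain environment and episode cap are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def mces(n=5, episodes=20000, seed=0):
    """Tabular first-visit Monte Carlo Exploring Starts on a chain MDP."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))        # estimated cost-to-go; action 1 moves right
    visits = np.zeros((n, 2))
    for _ in range(episodes):
        s = int(rng.integers(n))            # exploring start: random state...
        a = int(rng.integers(2))            # ...and random first action
        traj, total = [(s, a, 0.0)], 0.0
        for _ in range(4 * n):              # cap episode length
            s = s + 1 if a == 1 else max(s - 1, 0)
            total += 1.0                    # unit cost per step
            if s == n:                      # absorbing goal state
                break
            a = int(np.argmin(Q[s]))        # follow the greedy policy
            traj.append((s, a, total))
        seen = set()
        for s_i, a_i, before in traj:       # first-visit returns
            if (s_i, a_i) not in seen:
                seen.add((s_i, a_i))
                g = total - before          # cost incurred from this visit on
                visits[s_i, a_i] += 1
                Q[s_i, a_i] += (g - Q[s_i, a_i]) / visits[s_i, a_i]
    return Q

Q = mces()
print("greedy policy (1 = move right):", np.argmin(Q, axis=1))
```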
arXiv Detail & Related papers (2020-07-21T16:19:09Z)
- Free Energy Wells and Overlap Gap Property in Sparse PCA [81.64027805404483]
We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime.
We show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem.
We prove that the Overlap Gap Property (OGP) holds in a significant part of the hard regime.
arXiv Detail & Related papers (2020-06-18T17:18:02Z)
- POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis [24.373900721120286]
We consider Monte-Carlo planning in an environment with continuous state-action spaces.
We introduce POLY-HOOT, an algorithm that augments Monte-Carlo planning with a continuous-armed bandit strategy.
We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems.
arXiv Detail & Related papers (2020-06-08T15:23:19Z)
- Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-Dependent Event Rates [0.0]
We introduce a robust, easy-to-use, and computationally fast alternative to conventional Markov chain Monte Carlo methods for continuous target distributions.
The proposed algorithm may yield large speedups and improvements in stability relative to relevant benchmarks.
Granted access to a high-quality ODE code, the proposed methodology is both easy to implement and use, even for highly challenging and high-dimensional target distributions.
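A minimal sketch of the randomized-HMC idea driven by an off-the-shelf ODE solver, for a 1-D standard Gaussian target: integrate the Hamiltonian flow numerically between momentum refreshes drawn at exponential event times. The constant event rate and full momentum refreshes are simplifying assumptions; the paper's method uses state-dependent event rates.

```python
import numpy as np
from scipy.integrate import solve_ivp

def randomized_hmc(n_events=2000, rate=1.0, seed=0):
    """Randomized HMC sketch for a 1-D standard Gaussian target.

    Between events, (q, p) follows dq/dt = p, dp/dt = -grad U(q) = -q,
    integrated numerically; at each exponentially distributed event time
    the momentum is fully refreshed from N(0, 1).
    """
    rng = np.random.default_rng(seed)
    q, p = 0.0, rng.normal()
    samples = []
    for _ in range(n_events):
        tau = rng.exponential(1.0 / rate)              # time to next event
        sol = solve_ivp(lambda t, y: [y[1], -y[0]],    # Hamiltonian flow
                        (0.0, tau), [q, p], rtol=1e-8, atol=1e-8)
        q, p = sol.y[0, -1], sol.y[1, -1]
        p = rng.normal()                               # momentum refresh
        samples.append(q)
    return np.array(samples)

s = randomized_hmc()
print("mean ~ 0:", round(s.mean(), 3), "  var ~ 1:", round(s.var(), 3))
```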
arXiv Detail & Related papers (2020-05-04T06:23:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.