Measurable Monte Carlo Search Error Bounds
- URL: http://arxiv.org/abs/2106.04715v1
- Date: Tue, 8 Jun 2021 22:20:14 GMT
- Title: Measurable Monte Carlo Search Error Bounds
- Authors: John Mern, Mykel J. Kochenderfer
- Abstract summary: We prove bounds on the sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov decision processes.
These bounds can be directly computed at the conclusion of the search and do not require knowledge of the true action-value.
- Score: 40.29552672672265
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monte Carlo planners can often return sub-optimal actions, even if they are
guaranteed to converge in the limit of infinite samples. Known asymptotic
regret bounds do not provide any way to measure confidence of a recommended
action at the conclusion of search. In this work, we prove bounds on the
sub-optimality of Monte Carlo estimates for non-stationary bandits and Markov
decision processes. These bounds can be directly computed at the conclusion of
the search and do not require knowledge of the true action-value. The presented
bound holds for general Monte Carlo solvers meeting mild convergence
conditions. We empirically test the tightness of the bounds through experiments
on a multi-armed bandit and a discrete Markov decision process for both a
simple solver and Monte Carlo tree search.
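To make the flavor of such a bound concrete, here is a minimal sketch for the simplest setting: a stationary multi-armed bandit with rewards in [0, 1], where Hoeffding's inequality plus a union bound gives a sub-optimality bound computable from the search statistics alone. The construction, the `delta` parameter, and the uniform-sampling "search" are illustrative assumptions, not the paper's exact bound, which additionally covers non-stationary bandits and MDPs.

```python
import numpy as np

def suboptimality_bound(counts, means, delta=0.05):
    """Bound on the sub-optimality of the recommended (highest-mean) arm.

    Uses Hoeffding's inequality with a union bound over arms: with
    probability at least 1 - delta, the true gap of the recommended arm
    is at most the returned value. Assumes rewards bounded in [0, 1].
    Needs only the counts and empirical means, not the true values.
    """
    counts = np.asarray(counts, dtype=float)
    means = np.asarray(means, dtype=float)
    radius = np.sqrt(np.log(2 * len(counts) / delta) / (2 * counts))
    rec = int(np.argmax(means))                 # recommended arm
    optimistic_best = np.max(means + radius)    # upper bound on best value
    pessimistic_rec = means[rec] - radius[rec]  # lower bound on recommendation
    return max(0.0, float(optimistic_best - pessimistic_rec))

# Toy example: three Bernoulli arms sampled uniformly during "search".
rng = np.random.default_rng(0)
true_means = np.array([0.4, 0.5, 0.7])          # unknown to the solver
n = 500
means = rng.binomial(n, true_means) / n         # empirical mean per arm
counts = np.full(3, n)
print("recommended arm:", int(np.argmax(means)))
print("sub-optimality bound:", suboptimality_bound(counts, means))
```

The returned value upper-bounds the recommended arm's true sub-optimality with probability at least 1 - delta, without ever consulting the true means.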
Related papers
- Combining Normalizing Flows and Quasi-Monte Carlo [0.0]
Recent advances in machine learning have led to new techniques for enhancing Monte Carlo methods.
We demonstrate through numerical experiments that this combination can yield an estimator with significantly lower variance than if the flow were sampled with classic Monte Carlo.
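A minimal sketch of the idea, with the Gaussian inverse CDF standing in for a learned normalizing flow (an assumption made purely for illustration): pushing scrambled Sobol points through the transform, instead of i.i.d. uniforms, typically shrinks the variance of the resulting estimator.

```python
import numpy as np
from scipy.stats import norm, qmc

def estimate(u):
    """Estimate E[||X||^2] with X = T(U), T the Gaussian inverse CDF."""
    x = norm.ppf(u)                          # push uniforms through the "flow"
    return np.mean(np.sum(x ** 2, axis=1))   # exact answer is d

d, n, reps = 4, 1024, 50                     # n a power of two, as Sobol prefers
rng = np.random.default_rng(0)
mc_runs = [estimate(rng.random((n, d))) for _ in range(reps)]
qmc_runs = [estimate(qmc.Sobol(d, scramble=True, seed=s).random(n))
            for s in range(reps)]
print("plain MC std:       ", np.std(mc_runs))
print("scrambled Sobol std:", np.std(qmc_runs))   # typically much smaller
```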
arXiv Detail & Related papers (2024-01-11T14:17:06Z)
- Automatic Rao-Blackwellization for Sequential Monte Carlo with Belief Propagation [4.956977275061968]
Exact Bayesian inference on state-space models (SSMs) is in general intractable.
We propose a mixed inference algorithm that computes closed-form solutions using belief propagation as much as possible.
arXiv Detail & Related papers (2023-12-15T15:05:25Z)
- Faithful Question Answering with Monte-Carlo Planning [78.02429369951363]
We propose FAME (FAithful question answering with MontE-carlo planning) to answer questions based on faithful reasoning steps.
We formulate the task as a discrete decision-making problem and solve it through the interaction of a reasoning environment and a controller.
FAME achieves state-of-the-art performance on the standard benchmark.
arXiv Detail & Related papers (2023-05-04T05:21:36Z)
- Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search (CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z)
- Bayesian decision-making under misspecified priors with applications to meta-learning [64.38020203019013]
Thompson sampling and other sequential decision-making algorithms are popular approaches to tackle explore/exploit trade-offs in contextual bandits.
We show that performance degrades gracefully with misspecified priors.
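For reference, a minimal Thompson sampling sketch on a Bernoulli bandit: passing a Beta prior concentrated far from the true arm means mimics the misspecified-prior setting. The arm means and prior parameters here are made up for illustration, and this is not the paper's meta-learning construction.

```python
import numpy as np

def thompson_regret(true_means, alpha0, beta0, horizon=5000, seed=0):
    """Cumulative regret of Thompson sampling with a Beta(alpha0, beta0) prior."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.full(k, float(alpha0))   # per-arm posterior Beta parameters
    beta = np.full(k, float(beta0))
    best, regret = max(true_means), 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))   # sample, act greedily
        reward = rng.binomial(1, true_means[arm])
        alpha[arm] += reward                          # conjugate Bernoulli update
        beta[arm] += 1 - reward
        regret += best - true_means[arm]
    return regret

means = [0.3, 0.5, 0.8]
print("Beta(1, 1) prior (well-specified):", thompson_regret(means, 1, 1))
print("Beta(50, 2) prior (misspecified): ", thompson_regret(means, 50, 2))
```

The misspecified prior believes every arm pays off almost surely, so early recommendations are poor, but the posterior eventually overwhelms the prior and regret grows gracefully rather than catastrophically.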
arXiv Detail & Related papers (2021-07-03T23:17:26Z)
- Annealed Flow Transport Monte Carlo [91.20263039913912]
Annealed Flow Transport (AFT) builds upon Annealed Importance Sampling (AIS) and Sequential Monte Carlo (SMC).
AFT relies on a normalizing flow (NF) that is learned sequentially to push particles towards the successive targets.
We show that a continuous-time scaling limit of the population version of AFT is given by a Feynman-Kac measure.
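A minimal AIS sketch, the component AFT builds on: anneal a standard Gaussian into an unnormalized 1-D target through geometric intermediates, with one Metropolis step per temperature. The learned NF transport step that distinguishes AFT is deliberately omitted, and the target and schedule are made-up illustrations.

```python
import numpy as np

def log_target(x):              # unnormalized: 2 * N(3, 0.5^2) density shape
    return np.log(2.0) - 0.5 * ((x - 3.0) / 0.5) ** 2

def log_base(x):                # standard normal, unnormalized
    return -0.5 * x ** 2

def ais(n_particles=5000, n_temps=100, seed=0):
    rng = np.random.default_rng(seed)
    betas = np.linspace(0.0, 1.0, n_temps + 1)
    x = rng.normal(size=n_particles)            # particles from the base
    logw = np.zeros(n_particles)
    for b0, b1 in zip(betas[:-1], betas[1:]):
        # weight update: ratio of successive annealed densities at current x
        logw += (b1 - b0) * (log_target(x) - log_base(x))
        # one Metropolis step targeting pi_b1 prop. to base^(1-b1) * target^b1
        prop = x + 0.5 * rng.normal(size=n_particles)
        log_acc = ((1 - b1) * (log_base(prop) - log_base(x))
                   + b1 * (log_target(prop) - log_target(x)))
        accept = np.log(rng.random(n_particles)) < log_acc
        x = np.where(accept, prop, x)
    # mean weight estimates Z_target / Z_base; Z_base = sqrt(2*pi)
    return np.exp(logw).mean() * np.sqrt(2 * np.pi)

# True Z = 2 * sqrt(2*pi*0.25) ~ 2.5066
print("AIS estimate of Z:", ais())
```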
arXiv Detail & Related papers (2021-02-15T12:05:56Z)
- On the Convergence of Reinforcement Learning with Monte Carlo Exploring Starts [5.137144629366217]
A basic simulation-based reinforcement learning algorithm is the Monte Carlo Exploring Starts (MCES) method.
We investigate the convergence of this algorithm for the case with undiscounted costs, also known as the shortest path problem.
As a side result, we also provide a proof of a version of the supermartingale convergence theorem commonly used in stochastic approximation.
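A minimal first-visit MCES sketch on a toy chain MDP with undiscounted unit costs, echoing the shortest-path setting: each episode starts from a uniformly random state-action pair and then follows the greedy policy. The chain environment and episode cap are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def mces(n=5, episodes=20000, seed=0):
    """Tabular first-visit Monte Carlo Exploring Starts on a chain MDP."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n, 2))        # estimated cost-to-go; action 1 moves right
    visits = np.zeros((n, 2))
    for _ in range(episodes):
        s = int(rng.integers(n))            # exploring start: random state...
        a = int(rng.integers(2))            # ...and random first action
        traj, total = [(s, a, 0.0)], 0.0
        for _ in range(4 * n):              # cap episode length
            s = s + 1 if a == 1 else max(s - 1, 0)
            total += 1.0                    # unit cost per step
            if s == n:                      # absorbing goal state
                break
            a = int(np.argmin(Q[s]))        # follow the greedy policy
            traj.append((s, a, total))
        seen = set()
        for s_i, a_i, before in traj:       # first-visit returns
            if (s_i, a_i) not in seen:
                seen.add((s_i, a_i))
                g = total - before          # cost incurred from this visit on
                visits[s_i, a_i] += 1
                Q[s_i, a_i] += (g - Q[s_i, a_i]) / visits[s_i, a_i]
    return Q

Q = mces()
print("greedy policy (1 = move right):", np.argmin(Q, axis=1))
```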
arXiv Detail & Related papers (2020-07-21T16:19:09Z)
- Free Energy Wells and Overlap Gap Property in Sparse PCA [81.64027805404483]
We study a variant of the sparse PCA (principal component analysis) problem in the "hard" regime.
We show bounds on the depth of free energy wells for various Gibbs measures naturally associated to the problem.
We prove that the Overlap Gap Property (OGP) holds in a significant part of the hard regime.
arXiv Detail & Related papers (2020-06-18T17:18:02Z)
- POLY-HOOT: Monte-Carlo Planning in Continuous Space MDPs with Non-Asymptotic Analysis [24.373900721120286]
We consider Monte-Carlo planning in an environment with continuous state-action spaces.
We introduce POLY-HOOT, an algorithm that augments Monte-Carlo planning with a continuous-armed bandit strategy.
We investigate, for the first time, the regret of the enhanced HOO algorithm in non-stationary bandit problems.
arXiv Detail & Related papers (2020-06-08T15:23:19Z)
- Connecting the Dots: Numerical Randomized Hamiltonian Monte Carlo with State-Dependent Event Rates [0.0]
We introduce a robust, easy-to-use, and computationally fast alternative to conventional Markov chain Monte Carlo methods for continuous target distributions.
The proposed algorithm may yield large speedups and improvements in stability relative to relevant benchmarks.
Granted access to a high-quality ODE code, the proposed methodology is both easy to implement and use, even for highly challenging and high-dimensional target distributions.
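A minimal sketch of the randomized-HMC idea driven by an off-the-shelf ODE solver, for a 1-D standard Gaussian target: integrate the Hamiltonian flow numerically between momentum refreshes drawn at exponential event times. The constant event rate and full momentum refreshes are simplifying assumptions; the paper's method uses state-dependent event rates.

```python
import numpy as np
from scipy.integrate import solve_ivp

def randomized_hmc(n_events=2000, rate=1.0, seed=0):
    """Randomized HMC sketch for a 1-D standard Gaussian target.

    Between events, (q, p) follows dq/dt = p, dp/dt = -grad U(q) = -q,
    integrated numerically; at each exponentially distributed event time
    the momentum is fully refreshed from N(0, 1).
    """
    rng = np.random.default_rng(seed)
    q, p = 0.0, rng.normal()
    samples = []
    for _ in range(n_events):
        tau = rng.exponential(1.0 / rate)              # time to next event
        sol = solve_ivp(lambda t, y: [y[1], -y[0]],    # Hamiltonian flow
                        (0.0, tau), [q, p], rtol=1e-8, atol=1e-8)
        q, p = sol.y[0, -1], sol.y[1, -1]
        p = rng.normal()                               # momentum refresh
        samples.append(q)
    return np.array(samples)

s = randomized_hmc()
print("mean ~ 0:", round(s.mean(), 3), "  var ~ 1:", round(s.var(), 3))
```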
arXiv Detail & Related papers (2020-05-04T06:23:13Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.