Bayesian Optimized Monte Carlo Planning
- URL: http://arxiv.org/abs/2010.03597v1
- Date: Wed, 7 Oct 2020 18:29:27 GMT
- Title: Bayesian Optimized Monte Carlo Planning
- Authors: John Mern, Anil Yildiz, Zachary Sunberg, Tapan Mukerji, Mykel J.
Kochenderfer
- Abstract summary: Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree.
We present a general method for efficient action sampling based on Bayesian optimization.
We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning.
- Score: 34.8909579244631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Online solvers for partially observable Markov decision processes have
difficulty scaling to problems with large action spaces. Monte Carlo tree
search with progressive widening attempts to improve scaling by sampling from
the action space to construct a policy search tree. The performance of
progressive widening search is dependent upon the action sampling policy, often
requiring problem-specific samplers. In this work, we present a general method
for efficient action sampling based on Bayesian optimization. The proposed
method uses a Gaussian process to model a belief over the action-value function
and selects the action that will maximize the expected improvement in the
optimal action value. We implement the proposed approach in a new online tree
search algorithm called Bayesian Optimized Monte Carlo Planning (BOMCP).
Several experiments show that BOMCP is better able to scale to large action
space POMDPs than existing state-of-the-art tree search solvers.
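The following is a minimal, self-contained sketch of the action-selection idea described in the abstract, not the authors' BOMCP implementation: a Gaussian process models a belief over the action-value function at a search-tree node, and when progressive widening adds a new action to the tree, the candidate with the highest expected improvement over the best value estimated so far is chosen. The RBF kernel, hyperparameters, and random candidate pool below are illustrative assumptions.

```python
# Illustrative sketch of GP-based action selection for progressive widening.
# Not the BOMCP authors' code; kernel and hyperparameters are assumptions.
import numpy as np
from scipy.stats import norm


def rbf_kernel(A, B, length_scale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two sets of action vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(X, y, X_star, noise_var=1e-3):
    """GP posterior mean and std of Q at candidate actions X_star given observations (X, y)."""
    K = rbf_kernel(X, X) + noise_var * np.eye(len(X))
    K_s = rbf_kernel(X, X_star)
    K_ss = rbf_kernel(X_star, X_star)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.clip(np.diag(K_ss) - (v**2).sum(0), 1e-12, None)
    return mu, np.sqrt(var)


def expected_improvement(mu, sigma, best_q, xi=0.01):
    """Expected improvement of each candidate's Q-value over the best value seen at this node."""
    z = (mu - best_q - xi) / sigma
    return (mu - best_q - xi) * norm.cdf(z) + sigma * norm.pdf(z)


def select_action_to_widen(tried_actions, q_estimates, candidate_actions):
    """Pick the untried action to add when progressive widening expands a node."""
    X = np.asarray(tried_actions, dtype=float)
    y = np.asarray(q_estimates, dtype=float)
    mu, sigma = gp_posterior(X, y, np.asarray(candidate_actions, dtype=float))
    return int(np.argmax(expected_improvement(mu, sigma, y.max())))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tried = rng.uniform(-1, 1, size=(5, 2))          # actions already in the tree node
    q_vals = -np.linalg.norm(tried - 0.3, axis=1)    # stand-in Q estimates from rollouts
    candidates = rng.uniform(-1, 1, size=(50, 2))    # pool of sampled candidate actions
    print("next action to add:", candidates[select_action_to_widen(tried, q_vals, candidates)])
```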
Related papers
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, which is verified by extensive experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z) - Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z) - Adaptive Discretization using Voronoi Trees for Continuous POMDPs [7.713622698801596]
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
It uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization to efficiently sample high-dimensional continuous action spaces.
ADVT scales substantially better to high-dimensional continuous action spaces, compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-02-21T04:47:34Z) - Monte Carlo Tree Descent for Black-Box Optimization [10.698553177585973]
We study how to further integrate sample-based descent for faster optimization.
We design novel ways of expanding Monte Carlo search trees, with new descent methods at vertices.
We show empirically that the proposed algorithms can outperform state-of-the-art methods on many challenging benchmark problems.
arXiv Detail & Related papers (2022-11-01T22:45:10Z) - Efficient Non-Parametric Optimizer Search for Diverse Tasks [93.64739408827604]
We present the first efficient, scalable, and general framework that can directly search on the tasks of interest.
Inspired by the innate tree structure of the underlying math expressions, we re-arrange the spaces into a super-tree.
We adopt an adaptation of the Monte Carlo method to tree search, equipped with rejection sampling and equivalent-form detection.
arXiv Detail & Related papers (2022-09-27T17:51:31Z) - Adaptive Discretization using Voronoi Trees for Continuous-Action POMDPs [7.713622698801596]
We propose a new sampling-based online POMDP solver, called Adaptive Discretization using Voronoi Trees (ADVT).
ADVT uses Monte Carlo Tree Search in combination with an adaptive discretization of the action space as well as optimistic optimization.
Experiments on simulations of four types of benchmark problems indicate that ADVT outperforms state-of-the-art methods and scales substantially better to high-dimensional continuous action spaces.
arXiv Detail & Related papers (2022-09-13T05:04:49Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Improved POMDP Tree Search Planning with Prioritized Action Branching [33.94599291823342]
This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree.
Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
arXiv Detail & Related papers (2020-10-07T18:33:57Z) - Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative models needed for MDP-GapE to identify a near-optimal action with high probability.
arXiv Detail & Related papers (2020-06-10T15:05:51Z) - Model-Predictive Control via Cross-Entropy and Gradient-Based
Optimization [26.497575737219794]
The Cross-Entropy Method (CEM) is a population-based optimization method for planning a sequence of actions.
We propose a method to solve this problem by interleaving CEM and gradient descent steps in optimizing the action sequence.
Our experiments show faster convergence of the proposed hybrid approach, even for high-dimensional action spaces.
arXiv Detail & Related papers (2020-04-19T03:54:50Z)
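As a companion to the last entry above, here is a minimal sketch, under illustrative assumptions, of interleaving CEM-style elite refitting with gradient descent steps on a differentiable planning cost. The toy quadratic cost, its analytic gradient, and all hyperparameters are assumptions for illustration, not the setup of the cited paper.

```python
# Illustrative sketch of CEM interleaved with gradient steps on a planning cost.
# Not the cited authors' code; the cost and hyperparameters are toy assumptions.
import numpy as np


def cost(action_seq, target=0.5):
    """Toy differentiable planning cost: squared distance of each action to a target."""
    return ((action_seq - target) ** 2).sum(axis=(-2, -1))


def grad_cost(action_seq, target=0.5):
    """Analytic gradient of the toy cost with respect to the action sequence."""
    return 2.0 * (action_seq - target)


def cem_with_gradient_steps(horizon=10, action_dim=2, pop_size=64, elites=8,
                            iters=20, grad_steps=2, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # CEM step: sample a population of candidate action sequences.
        pop = mean + std * rng.standard_normal((pop_size, horizon, action_dim))
        # Gradient step: refine every sampled sequence with a few descent updates.
        for _ in range(grad_steps):
            pop -= lr * grad_cost(pop)
        # Refit the sampling distribution to the refined elite sequences.
        elite = pop[np.argsort(cost(pop))[:elites]]
        mean, std = elite.mean(axis=0), elite.std(axis=0) + 1e-6
    return mean


if __name__ == "__main__":
    plan = cem_with_gradient_steps()
    print("first planned action:", plan[0])
```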