Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions
- URL: http://arxiv.org/abs/2512.09727v1
- Date: Wed, 10 Dec 2025 15:09:06 GMT
- Title: Gaussian Process Aggregation for Root-Parallel Monte Carlo Tree Search with Continuous Actions
- Authors: Junlin Xiao, Victor-Alexandru Darvariu, Bruno Lacerda, Nick Hawes,
- Abstract summary: We introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment.<n>We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies.
- Score: 17.674265727888063
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Monte Carlo Tree Search is a cornerstone algorithm for online planning, and its root-parallel variant is widely used when wall clock time is limited but best performance is desired. In environments with continuous action spaces, how to best aggregate statistics from different threads is an important yet underexplored question. In this work, we introduce a method that uses Gaussian Process Regression to obtain value estimates for promising actions that were not trialed in the environment. We perform a systematic evaluation across 6 different domains, demonstrating that our approach outperforms existing aggregation strategies while requiring a modest increase in inference time.
Related papers
- Action-Gradient Monte Carlo Tree Search for Non-Parametric Continuous (PO)MDPs [7.170248667518935]
Action-Gradient Monte Carlo Tree Search (AGMCTS) is the first planner to blend non-parametric particle search with online gradient refinement in POMDPs.<n>AGMCTS outperforms widely-used sample-only solvers in solution quality.
arXiv Detail & Related papers (2025-03-15T15:51:06Z) - Truncating Trajectories in Monte Carlo Policy Evaluation: an Adaptive Approach [51.76826149868971]
Policy evaluation via Monte Carlo simulation is at the core of many MC Reinforcement Learning (RL) algorithms.
We propose as a quality index a surrogate of the mean squared error of a return estimator that uses trajectories of different lengths.
We present an adaptive algorithm called Robust and Iterative Data collection strategy Optimization (RIDO)
arXiv Detail & Related papers (2024-10-17T11:47:56Z) - SPO: Sequential Monte Carlo Policy Optimisation [41.52684912140086]
We introduce SPO: Sequential Monte Carlo Policy optimisation.
We show that SPO provides robust policy improvement and efficient scaling properties.
We demonstrate statistically significant improvements in performance relative to model-free and model-based baselines.
arXiv Detail & Related papers (2024-02-12T10:32:47Z) - $K$-Nearest-Neighbor Resampling for Off-Policy Evaluation in Stochastic
Control [0.6906005491572401]
We propose a novel $K$-nearest neighbor reparametric procedure for estimating the performance of a policy from historical data.
Our analysis allows for the sampling of entire episodes, as is common practice in most applications.
Compared to other OPE methods, our algorithm does not require optimization, can be efficiently implemented via tree-based nearest neighbor search and parallelization, and does not explicitly assume a parametric model for the environment's dynamics.
arXiv Detail & Related papers (2023-06-07T23:55:12Z) - Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL)
We instead directly sample the Q function from its posterior distribution, by using Langevin Monte Carlo.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z) - Fast and robust Bayesian Inference using Gaussian Processes with GPry [0.0]
We present the GPry algorithm for fast Bayesian inference of general (non-Gaussian) posteriors with a moderate number of parameters.
GPry does not need any pre-training, special hardware such as GPUs, and is intended as a drop-in replacement for traditional Monte Carlo methods for Bayesian inference.
arXiv Detail & Related papers (2022-11-03T17:57:45Z) - Monte Carlo Tree Descent for Black-Box Optimization [10.698553177585973]
We study how to further integrate sample-based descent for faster optimization.
We design novel ways of expanding Monte Carlo search trees, with new descent methods at vertices.
We show empirically that the proposed algorithms can outperform state-of-the-art methods on many challenging benchmark problems.
arXiv Detail & Related papers (2022-11-01T22:45:10Z) - Continuous Monte Carlo Graph Search [61.11769232283621]
Continuous Monte Carlo Graph Search ( CMCGS) is an extension of Monte Carlo Tree Search (MCTS) to online planning.
CMCGS takes advantage of the insight that, during planning, sharing the same action policy between several states can yield high performance.
It can be scaled up through parallelization, and it outperforms the Cross-Entropy Method (CEM) in continuous control with learned dynamics models.
arXiv Detail & Related papers (2022-10-04T07:34:06Z) - Cluster-and-Conquer: A Framework For Time-Series Forecasting [94.63501563413725]
We propose a three-stage framework for forecasting high-dimensional time-series data.
Our framework is highly general, allowing for any time-series forecasting and clustering method to be used in each step.
When instantiated with simple linear autoregressive models, we are able to achieve state-of-the-art results on several benchmark datasets.
arXiv Detail & Related papers (2021-10-26T20:41:19Z) - Local policy search with Bayesian optimization [73.0364959221845]
Reinforcement learning aims to find an optimal policy by interaction with an environment.
Policy gradients for local search are often obtained from random perturbations.
We develop an algorithm utilizing a probabilistic model of the objective function and its gradient.
arXiv Detail & Related papers (2021-06-22T16:07:02Z) - Bayesian Optimized Monte Carlo Planning [34.8909579244631]
Monte Carlo tree search with progressive widening attempts to improve scaling by sampling from the action space to construct a policy search tree.
We present a general method for efficient action sampling based on Bayesian optimization.
We implement the proposed approach in a new online tree search algorithm called Bayesian Optimized Monte Carlo Planning.
arXiv Detail & Related papers (2020-10-07T18:29:27Z) - Time-varying Gaussian Process Bandit Optimization with Non-constant
Evaluation Time [93.6788993843846]
We propose a novel time-varying Bayesian optimization algorithm that can effectively handle the non-constant evaluation time.
Our bound elucidates that a pattern of the evaluation time sequence can hugely affect the difficulty of the problem.
arXiv Detail & Related papers (2020-03-10T13:28:33Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.