Monte-Carlo tree search with uncertainty propagation via optimal
transport
- URL: http://arxiv.org/abs/2309.10737v1
- Date: Tue, 19 Sep 2023 16:32:04 GMT
- Title: Monte-Carlo tree search with uncertainty propagation via optimal
transport
- Authors: Tuan Dam, Pascal Stenger, Lukas Schneider, Joni Pajarinen, Carlo
D'Eramo, Odalric-Ambrym Maillard
- Abstract summary: This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS).
We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions.
We provide theoretical guarantees of convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments.
- Score: 27.931569422467835
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces a novel backup strategy for Monte-Carlo Tree Search
(MCTS) designed for highly stochastic and partially observable Markov decision
processes. We adopt a probabilistic approach, modeling both value and
action-value nodes as Gaussian distributions. We introduce a novel backup
operator that computes value nodes as the Wasserstein barycenter of their
action-value children nodes; thus, propagating the uncertainty of the estimate
across the tree to the root node. We study our novel backup operator when using
a novel combination of $L^1$-Wasserstein barycenter with $\alpha$-divergence,
by drawing a notable connection to the generalized mean backup operator. We
complement our probabilistic backup operator with two sampling strategies,
based on optimistic selection and Thompson sampling, obtaining our Wasserstein
MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to
the optimal policy, and an empirical evaluation on several stochastic and
partially observable environments, where our approach outperforms well-known
related baselines.
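To make the backup operator concrete, here is a minimal, illustrative sketch in Python. It is not the paper's exact method: each value and action-value node is kept as a one-dimensional Gaussian, and a value node is backed up with the standard closed form of the $L^2$-Wasserstein barycenter of 1-D Gaussians (weighted average of the means and of the standard deviations), as a stand-in for the $L^1$-Wasserstein barycenter with $\alpha$-divergence studied in the paper. The GaussianNode class, the visit-count weights, and the two selection rules are hypothetical placeholders.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class GaussianNode:
    """A value or action-value estimate kept as a Gaussian N(mu, sigma^2)."""
    mu: float
    sigma: float

def barycenter_backup(children, weights):
    """Back up a value node from its Gaussian action-value children.

    Uses the closed form of the L2-Wasserstein barycenter of 1-D Gaussians
    (weighted mean of the means and of the standard deviations) as an
    illustrative stand-in for the paper's L1/alpha-divergence operator.
    """
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    mu = float(np.dot(w, [c.mu for c in children]))
    sigma = float(np.dot(w, [c.sigma for c in children]))
    return GaussianNode(mu, sigma)

def thompson_select(children, rng):
    """Thompson sampling: draw one sample per child Gaussian, pick the argmax."""
    return int(np.argmax([rng.normal(c.mu, c.sigma) for c in children]))

def optimistic_select(children, beta=1.0):
    """Optimistic selection: pick the child with the largest mu + beta * sigma."""
    return int(np.argmax([c.mu + beta * c.sigma for c in children]))

# Toy usage with hypothetical action-value children weighted by visit counts.
rng = np.random.default_rng(0)
children = [GaussianNode(0.40, 0.20), GaussianNode(0.55, 0.05), GaussianNode(0.30, 0.30)]
visits = [12, 30, 4]
print(barycenter_backup(children, visits))
print(thompson_select(children, rng), optimistic_select(children))
```

In a full MCTS implementation the backup would typically use visit counts as weights and be applied recursively along the simulated path, which is how the uncertainty of leaf estimates propagates to the root.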
Related papers
- Power Mean Estimation in Stochastic Monte-Carlo Tree Search [25.058008522872747]
Monte-Carlo Tree Search (MCTS) is a widely-used strategy for online planning that combines Monte-Carlo sampling with forward tree search.
The theoretical foundation of UCT is incomplete due to an error in the logarithmic bonus term for action selection.
This paper introduces an algorithm that uses the power mean estimator and is tailored for stochastic MDPs; a minimal sketch of the power mean follows this entry.
arXiv Detail & Related papers (2024-06-04T11:56:37Z)
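The power mean estimator referenced above has a simple closed form, shown in the minimal sketch below; it assumes non-negative value estimates and visit-count weights, and the function name and example numbers are hypothetical rather than taken from that paper.

```python
import numpy as np

def power_mean(values, weights, p):
    """Weighted power (generalized) mean: (sum_i w_i * x_i^p)^(1/p).

    p = 1 recovers the weighted arithmetic mean; as p grows the estimate
    moves toward the maximum. Assumes non-negative values and p > 0.
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    return float((w @ x**p) ** (1.0 / p))

# Example: back up a node value from child action-value estimates,
# weighting each child by its (hypothetical) visit count.
print(power_mean([0.40, 0.55, 0.35], weights=[10, 25, 5], p=4.0))
```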
- Poisson Process for Bayesian Optimization [126.51200593377739]
We propose a ranking-based surrogate model based on the Poisson process and introduce an efficient BO framework, namely Poisson Process Bayesian Optimization (PoPBO).
Compared to the classic GP-BO method, our PoPBO has lower costs and better robustness to noise, as verified by extensive experiments.
arXiv Detail & Related papers (2024-02-05T02:54:50Z)
- Provable and Practical: Efficient Exploration in Reinforcement Learning via Langevin Monte Carlo [104.9535542833054]
We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL).
We instead directly sample the Q function from its posterior distribution using Langevin Monte Carlo; see the sketch after this entry.
Our approach achieves better or similar results compared with state-of-the-art deep RL algorithms on several challenging exploration tasks from the Atari57 suite.
arXiv Detail & Related papers (2023-05-29T17:11:28Z)
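The entry above samples the Q function from its posterior with Langevin Monte Carlo. Below is a minimal sketch of the unadjusted Langevin update that such methods build on; the step size, parameter vector, and toy standard-Gaussian target are hypothetical stand-ins, not the paper's actual Q posterior.

```python
import numpy as np

def langevin_step(theta, grad_log_post, step_size, rng):
    """One unadjusted Langevin Monte Carlo step:
    theta' = theta + (eta / 2) * grad log p(theta | data) + sqrt(eta) * N(0, I).
    """
    noise = rng.standard_normal(theta.shape)
    return theta + 0.5 * step_size * grad_log_post(theta) + np.sqrt(step_size) * noise

# Toy target: standard Gaussian, so grad log p(theta) = -theta.
rng = np.random.default_rng(0)
theta = np.zeros(3)
for _ in range(1000):
    theta = langevin_step(theta, lambda t: -t, step_size=1e-2, rng=rng)
print(theta)  # one approximate sample from the target after 1000 steps
```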
- Double Pessimism is Provably Efficient for Distributionally Robust Offline Reinforcement Learning: Generic Algorithm and Robust Partial Coverage [15.858892479232656]
We study robust offline reinforcement learning (robust offline RL).
We propose a generic algorithm framework called Doubly Pessimistic Model-based Policy Optimization ($\texttt{P}^2\texttt{MPO}$).
We show that $\texttt{P}^2\texttt{MPO}$ enjoys a $\tilde{\mathcal{O}}(n^{-1/2})$ convergence rate, where $n$ is the dataset size.
arXiv Detail & Related papers (2023-05-16T17:58:05Z)
- Langevin Monte Carlo for Contextual Bandits [72.00524614312002]
Langevin Monte Carlo Thompson Sampling (LMC-TS) is proposed to directly sample from the posterior distribution in contextual bandits.
We prove that the proposed algorithm achieves the same sublinear regret bound as the best Thompson sampling algorithms for a special case of contextual bandits.
arXiv Detail & Related papers (2022-06-22T17:58:23Z)
- Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes [99.26864533035454]
We study offline reinforcement learning (RL) in partially observable Markov decision processes.
We propose the Proxy variable Pessimistic Policy Optimization (P3O) algorithm.
P3O is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.
arXiv Detail & Related papers (2022-05-26T19:13:55Z)
- On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game [140.19656665344917]
We study the reward-free RL problem, where an agent aims to thoroughly explore the environment without any pre-specified reward function.
We tackle this problem in the context of function approximation, leveraging powerful function approximators.
We establish the first provably efficient reward-free RL algorithm with kernel and neural function approximators.
arXiv Detail & Related papers (2021-10-19T07:26:33Z)
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Learning Representations using Spectral-Biased Random Walks on Graphs [18.369974607582584]
We study how much a probabilistic bias in the random walk process affects the quality of the nodes it picks.
We succinctly capture a node's neighborhood as a probability measure based on the spectrum of its neighborhood subgraph, represented as a normalized Laplacian matrix (a Laplacian-spectrum sketch follows this entry).
We empirically evaluate our approach against several state-of-the-art node embedding techniques on a wide variety of real-world datasets.
arXiv Detail & Related papers (2020-05-19T20:42:43Z)
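The entry above summarizes a node's neighborhood through the spectrum of its normalized Laplacian. For orientation, the sketch below computes that spectrum with NumPy on a hypothetical toy graph; it is not the paper's embedding method.

```python
import numpy as np

def normalized_laplacian_spectrum(adjacency):
    """Eigenvalues of the symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    A = np.asarray(adjacency, dtype=float)
    d = A.sum(axis=1)
    d_inv_sqrt = np.zeros_like(d)
    nz = d > 0
    d_inv_sqrt[nz] = 1.0 / np.sqrt(d[nz])
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(L)

# Toy 4-node path graph standing in for a node's neighborhood subgraph.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(normalized_laplacian_spectrum(A))  # eigenvalues lie in [0, 2]
```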
- Bayesian optimization for backpropagation in Monte-Carlo tree search [1.52292571922932]
We present two methods, Softmax MCTS and Monotone MCTS, which generalize previous attempts to improve upon the backpropagation strategy.
We conduct experiments on computer Go, where the returns are given by a deep value neural network, and show that our proposed framework outperforms previous methods.
arXiv Detail & Related papers (2020-01-25T14:33:38Z)
- Bayesian Quantile and Expectile Optimisation [3.3878745408530833]
We propose new variational models for Bayesian quantile and expectile regression that are well-suited for heteroscedastic noise settings.
Our strategies can directly optimise for the quantile and expectile, without requiring replicated observations or assuming a parametric form for the noise (the standard pinball and expectile losses are sketched after this list).
As illustrated in the experimental section, the proposed approach clearly outperforms the state of the art in the heteroscedastic, non-Gaussian case.
arXiv Detail & Related papers (2020-01-12T20:51:21Z)
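For reference, the sketch below shows the classical pinball (quantile) and asymmetric-squared (expectile) losses that define the targets optimised in the last entry above; it is not the paper's variational model, and the exponential toy data are hypothetical.

```python
import numpy as np

def pinball_loss(u, tau):
    """Quantile (pinball) loss: tau * u if u >= 0 else (tau - 1) * u."""
    u = np.asarray(u, dtype=float)
    return np.maximum(tau * u, (tau - 1.0) * u)

def expectile_loss(u, tau):
    """Asymmetric squared loss whose minimizer is the tau-expectile."""
    u = np.asarray(u, dtype=float)
    return np.abs(tau - (u < 0)) * u**2

# The tau-quantile of a sample minimizes the mean pinball loss over candidates q.
rng = np.random.default_rng(0)
y = rng.exponential(size=10_000)          # skewed, non-Gaussian noise
qs = np.linspace(0.0, 5.0, 501)
best_q = qs[np.argmin([pinball_loss(y - q, 0.9).mean() for q in qs])]
print(best_q, np.quantile(y, 0.9))        # the two should roughly agree
```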
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.