Anytime Probabilistically Constrained Provably Convergent Online Belief Space Planning
- URL: http://arxiv.org/abs/2411.06711v1
- Date: Mon, 11 Nov 2024 04:42:18 GMT
- Title: Anytime Probabilistically Constrained Provably Convergent Online Belief Space Planning
- Authors: Andrey Zhitnikov, Vadim Indelman,
- Abstract summary: We present an anytime approach employing the Monte Carlo Tree Search (MCTS) method in continuous domains.
We prove convergence in probability with an exponential rate of a version of our algorithms and study proposed techniques via extensive simulations.
- Score: 7.081396107231381
- License:
- Abstract: Taking into account future risk is essential for an autonomously operating robot to find online not only the best but also a safe action to execute. In this paper, we build upon the recently introduced formulation of probabilistic belief-dependent constraints. We present an anytime approach employing the Monte Carlo Tree Search (MCTS) method in continuous domains. Unlike previous approaches, our method assures safety anytime with respect to the currently expanded search tree without relying on the convergence of the search. We prove convergence in probability with an exponential rate of a version of our algorithms and study proposed techniques via extensive simulations. Even with a tiny number of tree queries, the best action found by our approach is much safer than the baseline. Moreover, our approach constantly finds better than the baseline action in terms of objective. This is because we revise the values and statistics maintained in the search tree and remove from them the contribution of the pruned actions.
Related papers
- Uncertainty-Guided Optimization on Large Language Model Search Trees [42.71167208999792]
Tree search algorithms such as greedy and beam search are the standard when it comes to finding sequences of maximum likelihood in the decoding processes of large language models (LLMs)
We define prior beliefs over LLMs' transition probabilities and obtain posterior beliefs over the most promising paths in each iteration.
Unlike expensive simulation-based non-myopic methods like the Monte Carlo tree search, our method only requires samples from the beliefs.
arXiv Detail & Related papers (2024-07-04T14:08:50Z) - Approximate Dec-POMDP Solving Using Multi-Agent A* [8.728372851272727]
We present an A*-based algorithm to compute policies for finite-horizon Dec-POMDPs.
Our goal is to sacrifice optimality in favor of scalability for larger horizons.
arXiv Detail & Related papers (2024-05-09T10:33:07Z) - Monte Carlo Tree Search with Boltzmann Exploration [16.06815496704043]
We introduce Boltzmann Tree Search (BTS) and Decaying ENtropy Tree-Search (DENTS)
Our algorithms show consistent high performance across several benchmark domains, including the game of Go.
arXiv Detail & Related papers (2024-04-11T13:25:35Z) - Probabilistic Reach-Avoid for Bayesian Neural Networks [71.67052234622781]
We show that an optimal synthesis algorithm can provide more than a four-fold increase in the number of certifiable states.
The algorithm is able to provide more than a three-fold increase in the average guaranteed reach-avoid probability.
arXiv Detail & Related papers (2023-10-03T10:52:21Z) - Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs)
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z) - Latent State Marginalization as a Low-cost Approach for Improving
Exploration [79.12247903178934]
We propose the adoption of latent variable policies within the MaxEnt framework.
We show that latent variable policies naturally emerges under the use of world models with a latent belief state.
We experimentally validate our method on continuous control tasks, showing that effective marginalization can lead to better exploration and more robust training.
arXiv Detail & Related papers (2022-10-03T15:09:12Z) - Log Barriers for Safe Black-box Optimization with Application to Safe
Reinforcement Learning [72.97229770329214]
We introduce a general approach for seeking high dimensional non-linear optimization problems in which maintaining safety during learning is crucial.
Our approach called LBSGD is based on applying a logarithmic barrier approximation with a carefully chosen step size.
We demonstrate the effectiveness of our approach on minimizing violation in policy tasks in safe reinforcement learning.
arXiv Detail & Related papers (2022-07-21T11:14:47Z) - Reinforcement Learning for Branch-and-Bound Optimisation using
Retrospective Trajectories [72.15369769265398]
Machine learning has emerged as a promising paradigm for branching.
We propose retro branching; a simple yet effective approach to RL for branching.
We outperform the current state-of-the-art RL branching algorithm by 3-5x and come within 20% of the best IL method's performance on MILPs with 500 constraints and 1000 variables.
arXiv Detail & Related papers (2022-05-28T06:08:07Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z) - Learning to be safe, in finite time [4.189643331553922]
This paper aims to put forward the concept that learning to take safe actions in unknown environments, even with probability one guarantees, can be achieved without the need for an unbounded number of exploratory trials.
We focus on the canonical multi-armed bandit problem and seek to study the exploration-preservation trade-off intrinsic within safe learning.
arXiv Detail & Related papers (2020-10-01T14:03:34Z) - Latent Bandits Revisited [55.88616813182679]
A latent bandit problem is one in which the learning agent knows the arm reward distributions conditioned on an unknown discrete latent state.
We propose general algorithms for this setting, based on both upper confidence bounds (UCBs) and Thompson sampling.
We provide a unified theoretical analysis of our algorithms, which have lower regret than classic bandit policies when the number of latent states is smaller than actions.
arXiv Detail & Related papers (2020-06-15T19:24:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.