Value Gradients with Action Adaptive Search Trees in Continuous (PO)MDPs
- URL: http://arxiv.org/abs/2503.12181v1
- Date: Sat, 15 Mar 2025 15:51:06 GMT
- Title: Value Gradients with Action Adaptive Search Trees in Continuous (PO)MDPs
- Authors: Idan Lev-Yehudi, Michael Novitsky, Moran Barenboim, Ron Benchetrit, Vadim Indelman
- Abstract summary: Solving POMDPs in continuous state, action, and observation spaces is key for autonomous planning in real-world mobility and robotics applications. We formulate a novel Multiple Importance Sampling tree for value estimation that allows sharing value information between sibling action branches, and we propose a novel methodology to compute value gradients with online sampling based on transition likelihoods.
- Score: 7.170248667518935
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Solving Partially Observable Markov Decision Processes (POMDPs) in continuous state, action, and observation spaces is key for autonomous planning in many real-world mobility and robotics applications. Current approaches are mostly sample-based and cannot hope to reach near-optimal solutions in reasonable time. We propose two complementary theoretical contributions. First, we formulate a novel Multiple Importance Sampling (MIS) tree for value estimation that allows sharing value information between sibling action branches. The novel MIS tree supports action updates during search time, such as gradient-based updates. Second, we propose a novel methodology to compute value gradients with online sampling based on transition likelihoods. It is applicable to MDPs, and we extend it to POMDPs via particle beliefs by applying the propagated belief trick. The gradient estimator is computed in practice using the MIS tree with efficient Monte Carlo sampling. These two parts are combined into a new planning algorithm, Action Gradient Monte Carlo Tree Search (AGMCTS). In a simulated environment we demonstrate its applicability and its advantages over continuous online POMDP solvers that rely solely on sampling, and we discuss further implications.
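To make the sharing of samples between sibling action branches concrete, here is a minimal sketch of a balance-heuristic multiple importance sampling estimate in plain NumPy. It is not the paper's MIS tree: the Gaussian proposals stand in for the transition densities of sibling actions, and all names are illustrative.

```python
import numpy as np

def gauss_pdf(x, mu, sigma=1.0):
    """Density of N(mu, sigma^2), evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def mis_balance_estimate(target_mu, proposal_mus, n_per_proposal, f, rng):
    """Balance-heuristic MIS estimate of E_{x ~ N(target_mu, 1)}[f(x)],
    pooling the samples drawn under every proposal distribution."""
    xs = np.concatenate([rng.normal(mu, 1.0, size=n_per_proposal)
                         for mu in proposal_mus])
    # Balance heuristic: weight each sample by p(x) / sum_j n_j q_j(x).
    denom = sum(n_per_proposal * gauss_pdf(xs, mu) for mu in proposal_mus)
    return float(np.sum(f(xs) * gauss_pdf(xs, target_mu) / denom))

rng = np.random.default_rng(0)
# E[x^2] under N(0.5, 1) is 0.5^2 + 1 = 1.25; the estimate should be close.
print(mis_balance_estimate(0.5, [0.0, 0.5, 1.0], 1000, lambda x: x ** 2, rng))
```

Every branch's samples enter every sibling's estimate through the mixture denominator, which is what lets value information flow across actions.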
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
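The OAS estimator itself is spectral, but the belief-based policy it feeds presupposes a Bayes filter with the known observation model. A minimal sketch for small discrete spaces, with made-up toy matrices:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Discrete Bayes filter: b'(s') ∝ O[o, s'] * sum_s T[a, s, s'] * b(s).
    T[a, s, s'] is the transition kernel, O[o, s'] the known observation model."""
    pred = b @ T[a]               # predictive distribution over next states
    post = O[o] * pred            # reweight by observation likelihood
    return post / post.sum()      # normalize

# Toy 2-state example: one action, two observations.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # T[a, s, s']
O = np.array([[0.8, 0.3], [0.2, 0.7]])     # O[o, s']
print(belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, O=O))
```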
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation [80.47072100963017]
We introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP).
MAP efficiently identifies a set of scaling coefficients for merging multiple models, reflecting the trade-offs involved.
We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
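As a rough illustration of the setup (not MAP's actual parameterization), merging can be written as task arithmetic in the scaling coefficients, with a cheap quadratic surrogate fitted to a few real evaluations; the diagonal quadratic below is an assumption for brevity:

```python
import numpy as np

def merge(theta_base, task_vectors, c):
    """Task-arithmetic merge, where task_vectors[i] = theta_i - theta_base:
    theta_merged = theta_base + sum_i c_i * task_vectors[i]."""
    return theta_base + sum(ci * tv for ci, tv in zip(c, task_vectors))

def fit_quadratic_surrogate(C, y):
    """C: (n, k) coefficient vectors already evaluated; y: (n,) metrics.
    Fits a diagonal quadratic m(c) ≈ sum_i (a_i c_i^2 + b_i c_i) + d."""
    feats = np.hstack([C ** 2, C, np.ones((len(C), 1))])
    coef, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return lambda c: float(np.hstack([np.asarray(c) ** 2, c, [1.0]]) @ coef)

rng = np.random.default_rng(0)
C = rng.uniform(0, 1, size=(12, 2))
y = 1.0 - ((C - 0.4) ** 2).sum(axis=1)   # pretend task metric
m = fit_quadratic_surrogate(C, y)
print(m(np.array([0.4, 0.4])))           # near the true optimum, ~1.0
```

The point of the surrogate is that the Pareto front over tasks can then be traced by querying m(c), which is cheap, instead of re-evaluating the merged model.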
arXiv Detail & Related papers (2024-06-11T17:55:25Z)
- Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
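The DPO loss itself is standard; what the paper changes is that each preference pair is a pair of candidate reasoning steps from the same MCTS node rather than two full answers. A minimal sketch with made-up log-probabilities:

```python
import numpy as np

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.
    logp_* are summed token log-probs of the chosen (w) / rejected (l)
    continuation under the current policy and the frozen reference."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -np.log(1.0 / (1.0 + np.exp(-margin)))   # -log sigmoid(margin)

# Step-level variant: the pair is two candidate *reasoning steps* scored
# against each other, with look-ahead value from MCTS deciding w vs. l.
print(dpo_loss(-12.3, -15.9, -13.0, -14.2))
```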
arXiv Detail & Related papers (2024-05-01T11:10:24Z)
- Let's reward step by step: Step-Level reward model as the Navigators for Reasoning [64.27898739929734]
Process-Supervised Reward Model (PRM) furnishes LLMs with step-by-step feedback during the training phase.
We propose a greedy search algorithm that employs the step-level feedback from PRM to optimize the reasoning pathways explored by LLMs.
To explore the versatility of our approach, we develop a novel method to automatically generate a step-level reward dataset for coding tasks, and we observe similar performance improvements on code generation.
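A minimal sketch of such a PRM-guided greedy search; `generate_steps`, `prm_score`, and the `[DONE]` marker are hypothetical stand-ins (candidate steps are strings here), not the paper's API:

```python
def greedy_prm_search(prompt, generate_steps, prm_score, max_steps=8):
    """At each step, sample candidate next reasoning steps and keep the one
    the step-level reward model scores highest (greedy, no backtracking)."""
    trajectory = []
    for _ in range(max_steps):
        candidates = generate_steps(prompt, trajectory)   # e.g. k samples
        if not candidates:
            break
        best = max(candidates,
                   key=lambda step: prm_score(prompt, trajectory, step))
        trajectory.append(best)
        if best.endswith("[DONE]"):   # hypothetical termination marker
            break
    return trajectory
```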
arXiv Detail & Related papers (2023-10-16T05:21:50Z)
- Rollout Heuristics for Online Stochastic Contingent Planning [6.185979230964809]
Partially Observable Monte-Carlo Planning (POMCP) is an online algorithm for deciding on the next action to perform.
POMCP is highly dependent on its rollout policy to compute good value estimates.
In this paper, we model POMDPs as contingent planning problems.
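For reference, the rollout that POMCP runs from newly expanded nodes looks roughly like the sketch below; `sim` and `ACTIONS` are hypothetical stand-ins, and it is exactly this default random policy that the paper's contingent-planning heuristics replace:

```python
import random

ACTIONS = [0, 1, 2]  # placeholder action set

def rollout(sim, state, depth, gamma=0.95):
    """Random-policy rollout value estimate. `sim(state, action)` is a
    hypothetical generative model returning (next_state, reward, done)."""
    ret, discount = 0.0, 1.0
    for _ in range(depth):
        state, reward, done = sim(state, random.choice(ACTIONS))
        ret += discount * reward
        discount *= gamma
        if done:
            break
    return ret
```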
arXiv Detail & Related papers (2023-10-03T18:24:47Z)
- Maximize to Explore: One Objective Function Fusing Estimation, Planning, and Exploration [87.53543137162488]
We propose an easy-to-implement online reinforcement learning (online RL) framework called MEX.
MEX integrates estimation and planning components while automatically balancing exploration and exploitation.
It can outperform baselines by a stable margin in various MuJoCo environments with sparse rewards.
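In the spirit of that fusion, the selection rule can be caricatured as one line: pick the hypothesis that maximizes value minus a data-fit penalty. This is a sketch of the idea only; the paper's exact weighting and sign conventions may differ, and `value_of` / `nll_of` are hypothetical callables.

```python
def mex_select(hypotheses, value_of, nll_of, eta=1.0):
    """Single objective fusing planning value and estimation fit: the
    negative log-likelihood term anchors the chosen model to the data,
    while the value term keeps selection optimistic (exploration)."""
    return max(hypotheses, key=lambda f: value_of(f) - eta * nll_of(f))
```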
arXiv Detail & Related papers (2023-05-29T17:25:26Z)
- A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy Search Over Policy Trees [5.250288418639076]
We propose an online POMDP solver called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT).
At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees.
Our method is surprisingly simple as compared to existing state-of-the-art methods, yet empirically outperforms them on several continuous-action POMDP problems.
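LCEOPT's "lazy" evaluation and its policy-tree parameterization are not shown here, but the cross-entropy core is the textbook loop below, searching a real-valued parameter vector:

```python
import numpy as np

def cem_optimize(evaluate, dim, iters=20, pop=64, elite_frac=0.125, rng=None):
    """Generic cross-entropy method: sample parameter vectors from a
    Gaussian, keep the elites, refit the Gaussian, repeat."""
    rng = rng or np.random.default_rng(0)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([evaluate(x) for x in samples])
        elites = samples[np.argsort(scores)[-n_elite:]]   # top scorers
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy objective: maximize -||x - 3||^2; CEM should converge near 3.
print(cem_optimize(lambda x: -np.sum((x - 3.0) ** 2), dim=2))
```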
arXiv Detail & Related papers (2023-05-14T03:12:53Z)
- Adaptive Sampling using POMDPs with Domain-Specific Considerations [9.670635276589248]
We investigate improving Monte Carlo Tree Search-based solvers for adaptive sampling problems.
We propose improvements in rollout allocation, the action exploration algorithm, and plan commitment.
We show that it is possible to greatly reduce the number of rollouts by increasing the number of actions taken from a single planning tree.
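A sketch of that plan-commitment idea: execute several actions from one planning tree before replanning. `plan_tree` and `env` are hypothetical stand-ins, not the paper's interfaces.

```python
def act_with_commitment(plan_tree, env, k=3):
    """Execute the top-k actions along the tree's best branch before
    replanning, rather than replanning after every single step."""
    node = plan_tree.root
    for _ in range(k):
        action = node.best_action()
        env.step(action)
        node = node.child(action)   # advance to the reached subtree
        if node is None:            # branch exhausted: time to replan
            break
```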
arXiv Detail & Related papers (2021-09-23T19:00:02Z)
- Monte Carlo Information-Oriented Planning [6.0158981171030685]
We discuss how to solve information-gathering problems expressed as rho-POMDPs.
We build on the POMCP algorithm to propose a Monte Carlo Tree Search for rho-POMDPs.
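In a rho-POMDP the reward is a function of the belief itself; a common choice is negative entropy. A minimal particle-based sketch (the 1-D histogram density estimate is an assumption for illustration, not the paper's construction):

```python
import numpy as np

def neg_entropy_reward(particles, bins=20):
    """Belief-dependent reward: negative entropy of a histogram estimate
    built from belief particles. Higher value = more concentrated belief."""
    hist, _ = np.histogram(particles, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]                          # drop empty bins before the log
    return float(np.sum(p * np.log(p)))  # equals -H(b)

print(neg_entropy_reward(np.random.default_rng(0).normal(0.0, 1.0, 500)))
```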
arXiv Detail & Related papers (2021-03-21T09:09:27Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between a limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
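A generic sketch of stepping a product between an MDP and an automaton, where the shaped reward depends only on automaton progress; this is a simplified stand-in for the paper's EP-MDP, with hypothetical callables:

```python
def product_step(mdp_step, automaton_step, s, q, action):
    """One step of a generic MDP-automaton product. `mdp_step(s, a)` returns
    (next_state, reward); `automaton_step(q, s_next)` advances the automaton
    on the new state's label and reports whether an accepting set was hit."""
    s_next, _ = mdp_step(s, action)
    q_next, accepting = automaton_step(q, s_next)
    shaped = 1.0 if accepting else 0.0   # reward from acceptance progress only
    return (s_next, q_next), shaped
```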
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Planning in Markov Decision Processes with Gap-Dependent Sample Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability.
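MDP-GapE's confidence bounds are trajectory-based and not reproduced here, but the gap-dependent stopping logic has the familiar best-arm-identification shape: recommend the action with the highest lower bound once every rival's upper bound falls within epsilon of it.

```python
import numpy as np

def can_stop(lower, upper, epsilon=0.0):
    """Stopping test over per-action confidence intervals [lower, upper]:
    returns (done, recommended_action)."""
    best = int(np.argmax(lower))
    rivals = np.delete(upper, best)
    return bool(rivals.max() <= lower[best] + epsilon), best

print(can_stop(np.array([0.6, 0.2, 0.3]), np.array([0.8, 0.5, 0.55]), 0.1))
```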
arXiv Detail & Related papers (2020-06-10T15:05:51Z)