Scalable Solution Methods for Dec-POMDPs with Deterministic Dynamics
- URL: http://arxiv.org/abs/2508.21595v1
- Date: Fri, 29 Aug 2025 12:50:10 GMT
- Title: Scalable Solution Methods for Dec-POMDPs with Deterministic Dynamics
- Authors: Yang You, Alex Schutz, Zhikun Li, Bruno Lacerda, Robert Skilton, Nick Hawes,
- Abstract summary: We introduce the class of Deterministic Decentralized POMDPs (Det-Dec-POMDPs)<n>This is a subclass of Dec-POMDPs characterized by deterministic transitions and observations conditioned on the state and joint actions.<n>We then propose a practical solver called Iterative Deterministic POMDP Planning (IDPP)
- Score: 20.560809517043904
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many high-level multi-agent planning problems, including multi-robot navigation and path planning, can be effectively modeled using deterministic actions and observations. In this work, we focus on such domains and introduce the class of Deterministic Decentralized POMDPs (Det-Dec-POMDPs). This is a subclass of Dec-POMDPs characterized by deterministic transitions and observations conditioned on the state and joint actions. We then propose a practical solver called Iterative Deterministic POMDP Planning (IDPP). This method builds on the classic Joint Equilibrium Search for Policies framework and is specifically optimized to handle large-scale Det-Dec-POMDPs that current Dec-POMDP solvers are unable to address efficiently.
Related papers
- A Finite-State Controller Based Offline Solver for Deterministic POMDPs [18.518047404768378]
We propose DetMCVI, an adaptation of the Monte Carlo Value Iteration (MCVI) algorithm for DetPOMDPs.<n>DetMCVI solves large problems with a high success rate, outperforming existing baselines for DetPOMDPs.<n>We also verify the performance of the algorithm in a real-world mobile robot forest mapping scenario.
arXiv Detail & Related papers (2025-05-01T15:30:26Z) - Scalable Decision-Making in Stochastic Environments through Learned Temporal Abstraction [7.918703013303246]
We present Latent Macro Action Planner (L-MAP), which addresses the challenge of learning to make decisions in high-dimensional continuous action spaces.<n>L-MAP learns a set of temporally extended macro-actions through a state-conditional Vector Quantized Variational Autoencoder (VQ-VAE)<n>In offline RL settings, including continuous control tasks, L-MAP efficiently searches over discrete latent actions to yield high expected returns.
arXiv Detail & Related papers (2025-02-28T16:02:23Z) - Contextual Bilevel Reinforcement Learning for Incentive Alignment [42.22085862132403]
We introduce Contextual Bilevel Reinforcement Learning (CB-RL), a bilevel decision-making model.<n>CB-RL can be viewed as a Stackelberg Game where the leader and a random context beyond the leader's control together decide the setup of many MDPs.<n>This framework extends beyond traditional bilevel optimization and finds relevance in diverse fields such as reward shaping, contract theory and mechanism design.
arXiv Detail & Related papers (2024-06-03T17:54:39Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Indu Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learneds expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specifics within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z) - Probabilistic Planning with Partially Ordered Preferences over Temporal
Goals [22.77805882908817]
We study planning in Markov decision processes (MDPs) with preferences over temporally extended goals.
We introduce a variant of deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals.
We prove that a weak-stochastic nondominated policy given the preference specification is optimal in the constructed multi-objective MDP.
arXiv Detail & Related papers (2022-09-25T17:13:24Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning
in Continuous Spaces [5.732271870257913]
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP)
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
arXiv Detail & Related papers (2021-06-08T09:31:48Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - Decentralized MCTS via Learned Teammate Models [89.24858306636816]
We present a trainable online decentralized planning algorithm based on decentralized Monte Carlo Tree Search.
We show that deep learning and convolutional neural networks can be employed to produce accurate policy approximators.
arXiv Detail & Related papers (2020-03-19T13:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.