Computing the Reachability Value of Posterior-Deterministic POMDPs
- URL: http://arxiv.org/abs/2602.07473v1
- Date: Sat, 07 Feb 2026 10:09:41 GMT
- Title: Computing the Reachability Value of Posterior-Deterministic POMDPs
- Authors: Nathanaël Fijalkow, Arka Ghosh, Roman Kniazev, Guillermo A. Pérez, Pierre Vandenhove
- Abstract summary: We introduce posterior-deterministic POMDPs, a novel class of POMDPs. We show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision.
- Score: 10.025098857952628
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Partially observable Markov decision processes (POMDPs) are a fundamental model for sequential decision-making under uncertainty. However, many verification and synthesis problems for POMDPs are undecidable or intractable. Most prominently, the seminal result of Madani et al. (2003) states that there is no algorithm that, given a POMDP and a set of target states, can compute the maximal probability of reaching the target states, or even approximate it up to a non-trivial constant. This is in stark contrast to fully observable Markov decision processes (MDPs), where the reachability value can be computed in polynomial time. In this work, we introduce posterior-deterministic POMDPs, a novel class of POMDPs. Our main technical contribution is to show that for posterior-deterministic POMDPs, the maximal probability of reaching a given set of states can be approximated up to arbitrary precision. A POMDP is posterior-deterministic if the next state can be uniquely determined by the current state, the action taken, and the observation received. While the actual state is generally uncertain in POMDPs, the posterior-deterministic property tells us that once the true state is known it remains known forever. This simple and natural definition includes all MDPs and captures classical non-trivial examples such as the Tiger POMDP (Kaelbling et al. 1998), making it one of the largest known classes of POMDPs for which the reachability value can be approximated.
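To make the defining condition concrete, here is a minimal sketch of how one could check posterior-determinism on a finite, tabular POMDP; the data layout and function names are illustrative assumptions rather than the paper's notation, and conventions differ on whether the observation depends on the action.

```python
from itertools import product

def is_posterior_deterministic(states, actions, observations, T, O):
    """Check the abstract's condition: for every (state s, action a, observation o),
    at most one successor s' is consistent, i.e. T(s' | s, a) > 0 and O(o | s', a) > 0.
    T[s][a] is a dict {s_next: probability}; O[s_next][a] is a dict {obs: probability}.
    """
    for s, a, o in product(states, actions, observations):
        consistent = [
            s_next
            for s_next, p in T[s][a].items()
            if p > 0 and O[s_next][a].get(o, 0.0) > 0
        ]
        if len(consistent) > 1:  # the observation fails to pin down the next state
            return False
    return True

# Tiny example: two states, one action; the observation reveals the successor,
# so the POMDP is posterior-deterministic even though transitions are random.
states, actions, observations = [0, 1], ["go"], ["left", "right"]
T = {0: {"go": {0: 0.5, 1: 0.5}}, 1: {"go": {1: 1.0}}}
O = {0: {"go": {"left": 1.0}}, 1: {"go": {"right": 1.0}}}
print(is_posterior_deterministic(states, actions, observations, T, O))  # True
```

The toy example also illustrates the "known forever" remark in the abstract: once the current state is known, the unique consistent successor keeps it known after every subsequent step.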
Related papers
- Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability [29.63953552645502]
Multi-environment POMDPs (ME-POMDPs) extend standard POMDPs with discrete model uncertainty. We show that ME-POMDPs can be generalized to POMDPs with sets of initial beliefs, called adversarial-belief POMDPs (AB-POMDPs). We then devise exact and approximate (point-based) algorithms to compute robust policies for AB-POMDPs.
arXiv Detail & Related papers (2025-10-27T18:24:11Z)
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- Online POMDP Planning with Anytime Deterministic Optimality Guarantees [13.824288326240927]
We derive a deterministic relationship for discrete POMDPs between an approximated and the optimal solution. We show that our derivations provide an avenue for a new set of algorithms and can be attached to existing algorithms.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Robust Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
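For a concrete sense of the operator shape this entry refers to, here is one standard regularized Bellman backup, the entropy-regularized (soft) operator; the paper's twice-regularized operator uses a different regularizer, so this is only an illustrative instance with made-up shapes.

```python
import numpy as np

def soft_bellman_backup(v, R, P, gamma=0.95, tau=0.1):
    """Entropy-regularized backup: (T_tau v)(s) = tau * log sum_a exp(q(s, a) / tau),
    with q(s, a) = R[s, a] + gamma * sum_{s'} P[s, a, s'] * v[s']."""
    q = R + gamma * np.einsum("sat,t->sa", P, v)   # regularized Q-values, shape (S, A)
    m = q.max(axis=1, keepdims=True)               # log-sum-exp stabilization
    return (m + tau * np.log(np.exp((q - m) / tau).sum(axis=1, keepdims=True))).ravel()

# Value iteration with this operator converges because T_tau is a gamma-contraction.
rng = np.random.default_rng(0)
S, A = 3, 2
R = rng.random((S, A))
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)                  # normalize transition kernels
v = np.zeros(S)
for _ in range(200):
    v = soft_bellman_backup(v, R, P)
```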
arXiv Detail & Related papers (2023-03-12T13:03:28Z)
- Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
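As background for what these particle-based planners do, here is a minimal bootstrap particle-filter belief update; `step` and `obs_prob` stand in for a generative model and are illustrative names, and this is not the specific weighting scheme the paper analyzes.

```python
import random

def particle_belief_update(particles, action, observation, step, obs_prob):
    """Propagate each particle through the dynamics, weight it by the observation
    likelihood, and resample into an equally weighted approximate posterior."""
    propagated = [step(s, action) for s in particles]
    weights = [obs_prob(observation, s_next) for s_next in propagated]
    if sum(weights) == 0:      # observation inconsistent with every particle:
        return propagated      # degenerate case; real planners reinitialize here
    return random.choices(propagated, weights=weights, k=len(particles))
```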
arXiv Detail & Related papers (2022-10-10T21:11:55Z)
- PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP [0.34410212782758043]
We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDPs.
We do not require any knowledge of the state space, only a lower bound on the minimum transition probability.
In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
arXiv Detail & Related papers (2022-06-03T09:13:27Z)
- Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency [111.83670279016599]
We study reinforcement learning for partially observed Markov decision processes (POMDPs) with infinite observation and state spaces.
We make the first attempt at partial observability and function approximation for a class of POMDPs with a linear structure.
arXiv Detail & Related papers (2022-04-20T21:15:38Z)
- A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions [2.512827436728378]
We study the structure of (C)C-MDPs, i.e., constrained and chance-constrained MDPs, particularly an important variant that involves local transitions.
In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDPs that computes (near) optimal deterministic policies.
arXiv Detail & Related papers (2022-04-10T22:08:33Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Planning in Observable POMDPs in Quasipolynomial Time [21.03037504572896]
We develop a quasipolynomial-time algorithm for planning in observable POMDPs.
We assume that well-separated distributions on states lead to well-separated distributions on observations.
We prove matching hardness for planning in observable POMDPs under the Exponential Time Hypothesis.
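One standard way to formalize that separation assumption is gamma-observability; the notation below is my own reconstruction rather than a quote from the paper, with $\mathbb{O}$ the observation matrix mapping state distributions to observation distributions.

```latex
% gamma-observability: well-separated beliefs induce well-separated
% observation distributions (notation reconstructed, not quoted from the paper).
\[
  \lVert \mathbb{O}\, b - \mathbb{O}\, b' \rVert_1 \;\ge\; \gamma \, \lVert b - b' \rVert_1
  \qquad \text{for all beliefs } b, b'.
\]
```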
arXiv Detail & Related papers (2022-01-12T23:16:37Z)
- A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower bound for regret in finite-horizon Markov Decision Processes (MDPs).
We show that our lower bound is considerably smaller than in the general case and does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z)
- RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret minimization problem for reinforcement learning in latent Markov Decision Processes (LMDPs).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
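A hedged reading of that separation notion, reconstructed rather than quoted from the paper: any two MDPs in the mixture should differ noticeably in their transition kernels, e.g.

```latex
% delta-separation (my reconstruction): for all state-action pairs (s, a) and
% all distinct MDPs m != m' in the mixture,
\[
  \lVert T_m(\cdot \mid s, a) - T_{m'}(\cdot \mid s, a) \rVert_1 \;\ge\; \delta .
\]
```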
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and the measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
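To illustrate the finite-window idea concretely, here is a minimal sketch of a policy that acts on only the last n observations rather than the full history; the lookup table and window size are illustrative assumptions, not the paper's construction.

```python
from collections import deque

class FiniteWindowPolicy:
    """Act on a sliding window of the last n observations instead of the exact
    belief; the approximation error is what the convergence-rate result bounds."""

    def __init__(self, n, table, default_action):
        self.window = deque(maxlen=n)   # most recent n observations
        self.table = table              # maps window tuples to actions
        self.default = default_action   # fallback while the window fills up

    def act(self, observation):
        self.window.append(observation)
        return self.table.get(tuple(self.window), self.default)
```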
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.