Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process
- URL: http://arxiv.org/abs/2508.00816v1
- Date: Fri, 01 Aug 2025 17:49:27 GMT
- Title: Efficient Solving of Large Single Input Superstate Decomposable Markovian Decision Process
- Authors: Youssef Ait El Mahjoub, Jean-Michel Fourneau, Salma Alouah,
- Abstract summary: A key step in Bellman dynamic programming algorithms is policy evaluation. We develop an exact and efficient policy evaluation method based on this structure. This yields a scalable solution applicable to both average and discounted reward MDPs.
- Score: 1.17431678544333
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Solving Markov Decision Processes (MDPs) remains a central challenge in sequential decision-making, especially when dealing with large state spaces and long-term optimization criteria. A key step in Bellman dynamic programming algorithms is the policy evaluation, which becomes computationally demanding in infinite-horizon settings such as average-reward or discounted-reward formulations. In the context of Markov chains, aggregation and disaggregation techniques have for a long time been used to reduce complexity by exploiting structural decompositions. In this work, we extend these principles to a structured class of MDPs. We define the Single-Input Superstate Decomposable Markov Decision Process (SISDMDP), which combines Chiu's single-input decomposition with Robertazzi's single-cycle recurrence property. When a policy induces this structure, the resulting transition graph can be decomposed into interacting components with centralized recurrence. We develop an exact and efficient policy evaluation method based on this structure. This yields a scalable solution applicable to both average and discounted reward MDPs.
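To make the policy-evaluation step concrete, here is a minimal sketch of how a single-input block structure can be exploited when solving the discounted evaluation system V = r + gamma * P V: each component is first reduced to its single entry state, a small system over the entry states is solved, and the remaining values are recovered by back-substitution. The component layout, the dense per-block solves, and all names below are illustrative assumptions for a toy instance, not the authors' exact procedure (which also covers the average-reward case).

```python
import numpy as np

def sisdmdp_policy_evaluation(P, r, gamma, blocks, entries):
    """Evaluate a fixed policy, V = r + gamma * P @ V, by block elimination.

    blocks  : list of index arrays, one per component ("superstate")
    entries : entries[k] is the single state through which component k
              is entered from outside (the single-input assumption)

    Illustrative assumption: every transition leaving a component lands
    on the entry state of some other component.
    """
    n, K = len(r), len(blocks)
    A, b, rows = [], [], []
    for k in range(K):
        Bk = np.asarray(blocks[k])
        ext = [entries[j] for j in range(K) if j != k]            # entry states of the other blocks
        M = np.eye(len(Bk)) - gamma * P[np.ix_(Bk, Bk)]
        A.append(np.linalg.solve(M, gamma * P[np.ix_(Bk, ext)]))  # |Bk| x (K-1)
        b.append(np.linalg.solve(M, r[Bk]))
        rows.append(int(np.where(Bk == entries[k])[0][0]))        # entry row inside block k
    # Reduced K x K system over entry-state values only.
    R, c = np.zeros((K, K)), np.zeros(K)
    for k in range(K):
        other = [j for j in range(K) if j != k]
        R[k, other] = A[k][rows[k], :]
        c[k] = b[k][rows[k]]
    V_E = np.linalg.solve(np.eye(K) - R, c)
    # Back-substitute to recover the full value vector.
    V = np.zeros(n)
    for k in range(K):
        other = [j for j in range(K) if j != k]
        V[np.asarray(blocks[k])] = A[k] @ V_E[other] + b[k]
    return V
```

With K components of comparable size this replaces one n-by-n solve by K much smaller solves plus a K-by-K system, which is the kind of saving the abstract's scalability claim refers to.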
Related papers
- Efficient Strategy Synthesis for MDPs via Hierarchical Block Decomposition [47.123254940289726]
Software product lines and robotics utilise Markov decision processes (MDPs) to capture uncertainty and analyse sequential decision-making problems.
Despite the usefulness of conventional policy synthesis methods, they fail to scale to large state spaces.
Our approach addresses this issue and accelerates policy synthesis in large MDPs by dynamically refining the MDP and iteratively selecting the most fragile MDP regions for refinement.
arXiv Detail & Related papers (2025-06-21T19:03:03Z) - Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
A sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Twice Regularized Markov Decision Processes: The Equivalence between Robustness and Regularization [64.60253456266872]
Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
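For reference, here is a minimal code sketch of what a regularized Bellman backup looks like, using negative entropy (soft-max) as the regularizer; the temperature tau, the table shapes, and the function name are assumptions made for illustration, and this is not the specific twice-regularized operator studied in the paper.

```python
import numpy as np

def soft_bellman_backup(Q, P, r, gamma, tau):
    """One application of an entropy-regularized Bellman operator.

    The hard max over actions is replaced by a soft-max (log-sum-exp)
    at temperature tau, which keeps the operator a gamma-contraction
    while smoothing the greedy step.

    Q : (S, A) action values   P : (A, S, S) kernels   r : (S, A) rewards
    """
    m = Q.max(axis=1)                                   # shift for numerical stability
    V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    # Q(s, a) <- r(s, a) + gamma * E_{s' ~ P(. | s, a)}[V(s')]
    return r + gamma * np.einsum('ast,t->sa', P, V)
```

Iterating this backup converges to the unique fixed point of the regularized operator, and as tau goes to 0 it recovers the standard Bellman optimality operator.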
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - First-order Policy Optimization for Robust Markov Decision Process [40.2022466644885]
We consider the problem of solving robust Markov decision processes (MDPs).
The robust MDP involves a set of discounted, finite-state, finite-action MDPs with uncertain transition kernels.
For $(\mathbf{s},\mathbf{a})$-rectangular uncertainty sets, we establish several structural observations on the robust objective.
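To illustrate what rectangularity buys computationally, the sketch below performs a robust Bellman backup in which the uncertainty set is just a finite list of candidate transition models and the adversary picks the worst one independently for each state-action pair; the finite candidate set, the array shapes, and the names are illustrative assumptions, not the uncertainty sets or the first-order method considered in the paper.

```python
import numpy as np

def robust_bellman_backup(V, kernels, r, gamma):
    """(s,a)-rectangular robust Bellman backup over a finite uncertainty set.

    V       : (S,) current value estimate
    kernels : (m, S, A, S) candidate transition models
    r       : (S, A) rewards
    """
    # Expected next-state value under each candidate model: shape (m, S, A).
    next_vals = np.einsum('msat,t->msa', kernels, V)
    # Rectangularity: the worst model is chosen per (s, a), independently.
    Q = r + gamma * next_vals.min(axis=0)
    return Q.max(axis=1)                     # greedy maximization over actions
```

Because the inner minimization decouples across state-action pairs, this robust backup costs only a constant factor more than the nominal one.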
arXiv Detail & Related papers (2022-09-21T18:10:28Z) - Efficient Policy Iteration for Robust Markov Decision Processes via Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Twice regularized MDPs and the equivalence between robustness and regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z) - Goal Kernel Planning: Linearly-Solvable Non-Markovian Policies for Logical Tasks with Goal-Conditioned Options [54.40780660868349]
We introduce a compositional framework called Linearly-Solvable Goal Kernel Dynamic Programming (LS-GKDP).
LS-GKDP combines the Linearly-Solvable Markov Decision Process (LMDP) formalism with the Options Framework of Reinforcement Learning.
We show how an LMDP with a goal kernel enables the efficient optimization of meta-policies in a lower-dimensional subspace defined by the task grounding.
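For readers less familiar with the LMDP formalism that LS-GKDP builds on, here is a generic first-exit LMDP solve in which the optimal value function comes from a linear equation in the desirability z = exp(-v); the passive dynamics P, the cost vector q, and the terminal mask are assumptions of this toy setup, and the goal kernel and options layer of the paper are omitted.

```python
import numpy as np

def lmdp_first_exit(P, q, terminal, tol=1e-10, max_iter=10_000):
    """Solve a first-exit linearly-solvable MDP by power iteration.

    The Bellman equation becomes linear in the desirability z = exp(-v):
        z_i = exp(-q_i) * sum_j P_ij z_j   for non-terminal i,
        z_i = exp(-q_i)                    for terminal i.

    P : (S, S) passive (uncontrolled) dynamics, rows summing to 1
    q : (S,) state costs     terminal : (S,) boolean mask of goal states
    """
    z = np.exp(-q).astype(float)
    interior = ~terminal
    for _ in range(max_iter):
        z_new = z.copy()
        z_new[interior] = np.exp(-q[interior]) * (P @ z)[interior]
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    v = -np.log(z)                         # optimal cost-to-go
    u = P * z[None, :]                     # optimal controlled dynamics u*(j|i) proportional to P_ij z_j
    u /= u.sum(axis=1, keepdims=True)
    return v, u
```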
arXiv Detail & Related papers (2020-07-06T05:13:20Z) - Partial Policy Iteration for L1-Robust Markov Decision Processes [13.555107578858307]
This paper describes new efficient algorithms for solving a common class of robust MDPs.
We propose partial policy iteration, a new, efficient, flexible, and general policy iteration scheme for robust MDPs.
Our experimental results indicate that the proposed methods are many orders of magnitude faster than the state-of-the-art approach.
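The core subroutine that L1-robust schemes of this kind call repeatedly is the worst-case transition response inside an L1 ball around a nominal distribution, sketched below; the greedy closed form is standard, but the function name, interface, and budget parameter kappa are illustrative assumptions, and this is not the partial policy iteration algorithm of the paper itself.

```python
import numpy as np

def l1_worst_case_response(p_nom, v, kappa):
    """Minimize p @ v over distributions p with ||p - p_nom||_1 <= kappa.

    Greedy solution: move up to kappa / 2 probability mass from the
    highest-value successor states onto the single lowest-value one.
    """
    p = p_nom.astype(float).copy()
    j = int(np.argmin(v))                      # cheapest successor state
    eps = min(kappa / 2.0, 1.0 - p[j])         # mass we are allowed to move onto it
    p[j] += eps
    remaining = eps
    for i in np.argsort(v)[::-1]:              # drain the most expensive states first
        if i == j:
            continue
        take = min(p[i], remaining)
        p[i] -= take
        remaining -= take
        if remaining <= 1e-12:
            break
    return p
```

A robust evaluation sweep would then use r[s, a] + gamma * l1_worst_case_response(P[a, s], V, kappa) @ V in place of the nominal expectation, and a policy iteration scheme alternates such sweeps with greedy improvement.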
arXiv Detail & Related papers (2020-06-16T19:50:14Z)