A Fully Polynomial Time Approximation Scheme for Constrained MDPs and
Stochastic Shortest Path under Local Transitions
- URL: http://arxiv.org/abs/2204.04780v2
- Date: Tue, 18 Apr 2023 17:16:33 GMT
- Title: A Fully Polynomial Time Approximation Scheme for Constrained MDPs and
Stochastic Shortest Path under Local Transitions
- Authors: Majid Khonji
- Abstract summary: We study the structure of (C)C-MDP, particularly an important variant that involves local transitions.
In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near) optimal deterministic policies.
- Score: 2.512827436728378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The fixed-horizon constrained Markov Decision Process (C-MDP) is a well-known
model for planning in stochastic environments under operating constraints.
Chance-Constrained MDP (CC-MDP) is a variant that allows bounding the
probability of constraint violation, which is desired in many safety-critical
applications. CC-MDP can also model a class of MDPs, called Stochastic Shortest
Path (SSP), under dead-ends, where there is a trade-off between the
probability-to-goal and cost-to-goal. This work studies the structure of
(C)C-MDP, particularly an important variant that involves local transitions. In
this variant, the state reachability exhibits a certain degree of locality and
independence from the remaining states. More precisely, the number of states,
at a given time, that share some reachable future states is always constant.
(C)C-MDP under local transition is NP-Hard even for a planning horizon of two.
In this work, we propose a fully polynomial-time approximation scheme for
(C)C-MDP that computes (near) optimal deterministic policies. Such an algorithm
is among the best approximation algorithms attainable in theory and gives
insights into the approximability of constrained MDPs and their variants.
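The chance constraint in CC-MDP is easiest to see on a small instance. Below is a minimal sketch assuming a hypothetical two-step example (state names, transition probabilities, action costs, and the chance bound DELTA are all illustrative, and brute-force enumeration stands in for the paper's algorithm): it evaluates each deterministic, time-dependent policy's expected cost together with its probability of ever entering an unsafe "dead-end" state, and keeps the cheapest policy whose violation probability stays within DELTA.

```python
from itertools import product

# Toy fixed-horizon CC-MDP. All names, probabilities, costs, and the chance
# bound DELTA below are illustrative assumptions, not taken from the paper.
HORIZON = 2
ACTIONS = ["a0", "a1"]
STATES = ["s0", "s1", "s2"]           # non-terminal states the policy acts in
UNSAFE = {"bad"}                      # entering these counts as a violation
TERMINAL = {"goal"}

# transition[(state, action)] -> list of (next_state, probability)
transition = {
    ("s0", "a0"): [("s1", 0.8), ("bad", 0.2)],
    ("s0", "a1"): [("s2", 1.0)],
    ("s1", "a0"): [("goal", 1.0)],
    ("s1", "a1"): [("goal", 0.6), ("bad", 0.4)],
    ("s2", "a0"): [("goal", 0.5), ("s2", 0.5)],
    ("s2", "a1"): [("s2", 1.0)],
}
cost = {"a0": 1.0, "a1": 2.0}         # per-step action cost


def evaluate(policy, state="s0", t=0):
    """Return (expected cost, probability of ever reaching an unsafe state)."""
    if state in UNSAFE:
        return 0.0, 1.0
    if state in TERMINAL or t == HORIZON:
        return 0.0, 0.0
    action = policy[(state, t)]
    exp_cost, violation = cost[action], 0.0
    for nxt, p in transition[(state, action)]:
        c, v = evaluate(policy, nxt, t + 1)
        exp_cost += p * c
        violation += p * v
    return exp_cost, violation


# Brute force over deterministic, time-dependent policies: keep the cheapest
# policy whose violation probability respects the chance constraint DELTA.
# This enumeration is exponential in the number of state-time pairs; the
# paper's FPTAS targets exactly this bottleneck for the local-transition case.
DELTA = 0.25
keys = [(s, t) for s in STATES for t in range(HORIZON)]
best = None
for choice in product(ACTIONS, repeat=len(keys)):
    policy = dict(zip(keys, choice))
    c, v = evaluate(policy)
    if v <= DELTA and (best is None or c < best[0]):
        best = (c, v)
print("best (expected cost, violation probability):", best)
```

In this toy instance, DELTA = 0.25 admits the cheaper but riskier route through s1 (expected cost 1.8, violation probability 0.2), while tightening DELTA to 0.1 would force the safer, more expensive route through s2 (cost 3.0, violation 0). That is the cost versus violation-probability trade-off the abstract describes; the paper's contribution is reaching near-optimal deterministic policies without the exponential enumeration.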
Related papers
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z) - Towards Instance-Optimality in Online PAC Reinforcement Learning [28.156332484814616]
We propose the first instance-dependent lower bound on the sample complexity required for the PAC identification of a near-optimal policy.
We demonstrate that the sample complexity of the PEDEL algorithm of Wagenmaker et al. closely approaches this lower bound.
arXiv Detail & Related papers (2023-10-31T19:26:36Z) - Recursively-Constrained Partially Observable Markov Decision Processes [13.8724466775267]
We show that C-POMDPs violate the optimal substructure property over successive decision steps.
Online re-planning in C-POMDPs is often ineffective due to the inconsistency resulting from this violation.
We introduce the Recursively-Constrained POMDP, which imposes additional history-dependent cost constraints on the C-POMDP.
arXiv Detail & Related papers (2023-10-15T00:25:07Z) - Twice Regularized Markov Decision Processes: The Equivalence between
Robustness and Regularization [64.60253456266872]
Markov decision processes (MDPs) aim to handle changing or partially known system dynamics.
Regularized MDPs show more stability in policy learning without impairing time complexity.
Bellman operators enable us to derive planning and learning schemes with convergence and generalization guarantees.
arXiv Detail & Related papers (2023-03-12T13:03:28Z) - Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained via a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Common Information based Approximate State Representations in
Multi-Agent Reinforcement Learning [3.086462790971422]
We develop a general compression framework with approximate common and private state representations, based on which decentralized policies can be constructed.
The results shed light on designing practically useful deep-MARL network structures under the "centralized learning distributed execution" scheme.
arXiv Detail & Related papers (2021-10-25T02:32:06Z) - Twice regularized MDPs and the equivalence between robustness and
regularization [65.58188361659073]
We show that policy iteration on reward-robust MDPs can have the same time complexity as on regularized MDPs.
We generalize regularized MDPs to twice regularized MDPs.
arXiv Detail & Related papers (2021-10-12T18:33:45Z) - Modular Deep Reinforcement Learning for Continuous Motion Planning with
Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the LDGBA and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
arXiv Detail & Related papers (2021-02-24T01:11:25Z) - Efficient Planning in Large MDPs with Weak Linear Function Approximation [4.56877715768796]
Large-scale Markov decision processes (MDPs) require planning algorithms whose runtime is independent of the number of states of the MDP.
We consider the planning problem in MDPs using linear value function approximation with only weak requirements.
arXiv Detail & Related papers (2020-07-13T04:40:41Z)