Quantum Markov Decision Processes Part II: Optimal Solutions and
Algorithms
- URL: http://arxiv.org/abs/2402.14651v1
- Date: Thu, 22 Feb 2024 16:00:21 GMT
- Title: Quantum Markov Decision Processes Part II: Optimal Solutions and
Algorithms
- Authors: Naci Saldi, Sina Sanjari, and Serdar Yuksel
- Abstract summary: This two-part article aims to introduce a quantum analogue to classical Markov decision processes (MDPs).
In Part II, our focus shifts to the development of algorithms for computing optimal policies and value functions of both open-loop and closed-loop policies.
- Score: 1.8775413720750924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This two-part article aims to introduce a quantum analogue to classical
Markov decision processes (MDPs). In Part II, building on the formulation of
q-MDPs presented in Part I, our focus shifts to the development of algorithms
for computing optimal policies and value functions of both open-loop and
closed-loop policies. First, by using the duality between the dynamic
programming and the semi-definite programming formulations of any q-MDP with
open-loop policies, we establish an algorithm that enables us to efficiently
compute optimal open-loop quantum policies and value functions. Then, dynamic
programming and semi-definite programming formulations for closed-loop policies
are established, where the duality of these two formulations similarly enables
the efficient computation of optimal closed-loop policies and value functions.
Finally, given that any q-MDP can be approximated by q-MDPs with classical
policies--potentially with higher-dimensional underlying Hilbert spaces than
the original model--and since any classical policy is an element of the set of
closed-loop policies, we conclude that any q-MDP can be approximated by q-MDPs
with closed-loop policies having higher-dimensional Hilbert spaces.
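
To make the duality the abstract invokes concrete: in the classical special case of a q-MDP, the semi-definite program collapses to the familiar linear-programming formulation of MDP value functions, whose dual variables are discounted occupancy measures. The sketch below shows only this classical analogue, using CVXPY on made-up toy data; it is not the paper's quantum algorithm, where probability vectors become density operators and the LP becomes an SDP.

```python
# A minimal classical sketch, NOT the paper's quantum algorithm: in the
# classical special case, the dynamic-programming / SDP duality reduces to the
# standard linear program for MDP value functions.  All toy data is made up.
import numpy as np
import cvxpy as cp

n_states, n_actions, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
r = rng.uniform(size=(n_states, n_actions))                       # r[s, a]

V = cp.Variable(n_states)
constraints = [
    V[s] >= r[s, a] + gamma * (P[s, a] @ V)
    for s in range(n_states)
    for a in range(n_actions)
]
# Primal: the smallest V satisfying every Bellman inequality is the optimal
# value function; the dual variables of these constraints are discounted
# state-action occupancy measures, from which a greedy optimal policy can be
# read off.  This is the classical shadow of the duality used in the paper.
problem = cp.Problem(cp.Minimize(cp.sum(V)), constraints)
problem.solve()
print("optimal values:", np.round(V.value, 3))
```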
Related papers
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this appears to be the first work that proposes a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z) - Two-Stage ML-Guided Decision Rules for Sequential Decision Making under Uncertainty [55.06411438416805]
Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in many domains such as energy, finance, and supply chains.
Some SDMU are naturally modeled as Multistage Problems (MSPs) but the resulting optimizations are notoriously challenging from a computational standpoint.
This paper introduces a novel approach, Two-Stage General Decision Rules (TS-GDR), to generalize the policy space beyond linear functions.
The effectiveness of TS-GDR is demonstrated through an instantiation using Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-LDR).
arXiv Detail & Related papers (2024-05-23T18:19:47Z) - Quantum Markov Decision Processes: General Theory, Approximations, and Classes of Policies [1.8775413720750924]
We present a novel quantum MDP model aiming to introduce a new framework, algorithms, and future research avenues.
We hope that our approach will pave the way for a new research direction in discrete-time quantum control.
arXiv Detail & Related papers (2024-02-22T15:59:09Z) - Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling large state spaces.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted constrained Markov decision process (constrained MDP).
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy; a generic primal-dual sketch illustrating this idea appears after this list.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs, obtained by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust Markov decision processes (MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Strengthening Deterministic Policies for POMDPs [5.092711491848192]
We provide a novel MILP encoding that supports sophisticated specifications in the form of temporal logic constraints.
We employ a preprocessing of the POMDP to encompass memory-based decisions.
The advantages of our approach lie (1) in the flexibility to strengthen simple deterministic policies without losing computational tractability and (2) in the ability to enforce the provable satisfaction of arbitrarily many specifications.
arXiv Detail & Related papers (2020-07-16T14:22:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.