Primal-dual regression approach for Markov decision processes with
general state and action space
- URL: http://arxiv.org/abs/2210.00258v2
- Date: Tue, 4 Oct 2022 16:41:09 GMT
- Title: Primal-dual regression approach for Markov decision processes with
general state and action space
- Authors: Denis Belomestny and John Schoenmakers
- Abstract summary: We develop a regression-based primal-dual martingale approach for solving finite time horizon MDPs with general state and action space.
As a result, our method allows for the construction of tight upper- and lower-biased approximations of the value functions and provides tight approximations to the optimal policy.
- Score: 0.30458514384586394
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We develop a regression-based primal-dual martingale approach for solving
finite time horizon MDPs with general state and action space. As a result, our
method allows for the construction of tight upper- and lower-biased
approximations of the value functions and provides tight approximations to
the optimal policy. In particular, we prove tight error bounds for the
estimated duality gap featuring polynomial dependence on the time horizon and
sublinear dependence on the cardinality/dimension of the possibly infinite
state and action space. From a computational point of view, the proposed method
is efficient since, in contrast to usual duality-based methods for optimal
control problems in the literature, the Monte Carlo procedures involved here do
not require nested simulations.
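To make the duality mechanics concrete, here is a minimal, self-contained sketch under assumed toy choices (1D linear-Gaussian dynamics, two actions, quadratic running reward, polynomial basis; all of these are illustrative assumptions, not the paper's setting or exact algorithm). It fits value functions V_t by backward least-squares regression, builds an approximate martingale from regression estimates C_t(x, a) of the conditional expectations E[V_{t+1}(X_{t+1}) | X_t = x, a_t = a], and reports a lower-biased estimate (greedy-policy simulation) and an upper-biased estimate (pathwise maximization of rewards minus martingale increments). With exact conditional expectations the pathwise term is a true Rogers-type dual upper bound; regression error is what makes both estimates biased, and replacing the conditional expectation by a regression is what removes the nested simulation.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
T, N = 5, 2000                        # horizon, number of training paths
actions = np.array([-1.0, 1.0])       # tiny action set so the pathwise
                                      # maximisation below stays enumerable

def step(x, a, xi):                   # assumed toy dynamics
    return 0.8 * x + 0.5 * a + xi

def reward(x, a):                     # assumed running reward
    return -x ** 2 - 0.1 * a ** 2

def feats(x):                         # polynomial regression basis for V_t
    x = np.asarray(x, dtype=float)
    return np.stack([np.ones_like(x), x, x ** 2], axis=-1)

# 1) training paths under a uniformly random behaviour policy
X = np.zeros((T + 1, N))
X[0] = rng.normal(size=N)
Xi = rng.normal(scale=0.3, size=(T, N))
for t in range(T):
    X[t + 1] = step(X[t], rng.choice(actions, size=N), Xi[t])

# 2) backward regression: V_t(x) ~ feats(x) @ beta[t]; the conditional
#    expectations E[V_{t+1}(X_{t+1}) | X_t, a] are themselves fitted by
#    regression (gamma) instead of an inner (nested) simulation
beta = np.zeros((T + 1, 3))
gamma = np.zeros((T, len(actions), 3))
for t in reversed(range(T)):
    Phi = feats(X[t])
    q = []
    for i, a in enumerate(actions):
        y = feats(step(X[t], a, Xi[t])) @ beta[t + 1]
        gamma[t, i] = np.linalg.lstsq(Phi, y, rcond=None)[0]
        q.append(reward(X[t], a) + Phi @ gamma[t, i])
    beta[t] = np.linalg.lstsq(Phi, np.max(q, axis=0), rcond=None)[0]

# 3) lower-biased estimate: simulate the greedy policy on fresh paths
M = 2000
x = rng.normal(size=M)
low = np.zeros(M)
for t in range(T):
    q = np.stack([reward(x, a) + feats(x) @ gamma[t, i]
                  for i, a in enumerate(actions)])
    a = actions[q.argmax(axis=0)]
    low += reward(x, a)
    x = step(x, a, rng.normal(scale=0.3, size=M))

# 4) upper-biased estimate: pathwise maximum over all action sequences of
#    rewards minus approximate-martingale increments
#    dM_t = V_{t+1}(X_{t+1}) - C_t(X_t, a_t), using one fixed noise draw
#    per step and path (common random numbers), with no nesting
K = 500
x0 = rng.normal(size=K)
Xis = rng.normal(scale=0.3, size=(T, K))
best = np.full(K, -np.inf)
for seq in itertools.product(range(len(actions)), repeat=T):
    x, path = x0, np.zeros(K)
    for t, i in enumerate(seq):
        a = actions[i]
        xn = step(x, a, Xis[t])
        dM = feats(xn) @ beta[t + 1] - feats(x) @ gamma[t, i]
        path += reward(x, a) - dM
        x = xn
    best = np.maximum(best, path)

print(f"lower bound ~ {low.mean():.3f}   upper bound ~ {best.mean():.3f}")
```

The enumeration over all |A|^T action sequences is only viable for this toy size; the point of the sketch is the structure of the estimators, not scalability.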
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes in each iteration a gradient step defined by minimizing a quadratic approximation of the objective subject to a trust-region constraint.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
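A rough, hypothetical illustration of the two step types named above, in plain numpy on a deterministic quadratic model (the paper's stochastic, random-model machinery is not reproduced):

```python
import numpy as np

def cauchy_step(g, H, radius):
    """Gradient step: minimise the local quadratic model
    m(d) = g @ d + 0.5 * d @ H @ d along -g within ||d|| <= radius."""
    t = radius / np.linalg.norm(g)       # step length hitting the boundary
    gHg = g @ H @ g
    if gHg > 0:                          # interior minimiser along -g exists
        t = min(t, (g @ g) / gHg)
    return -t * g

def eigen_step(g, H, radius):
    """Eigen step: follow the most negative curvature direction of the
    (reduced) Hessian, scaled to the trust-region boundary."""
    w, V = np.linalg.eigh(H)
    if w[0] >= 0:
        return None                      # no negative curvature to exploit
    v = V[:, 0]
    return -radius * v if g @ v > 0 else radius * v  # pick the descent sign

# toy usage on an indefinite quadratic model
H = np.array([[2.0, 0.0], [0.0, -1.0]])
g = np.array([1.0, 0.0])
print(cauchy_step(g, H, 0.5), eigen_step(g, H, 0.5))
```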
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
- Deterministic Policy Gradient Primal-Dual Methods for Continuous-Space Constrained MDPs [82.34567890576423]
We develop a deterministic policy gradient primal-dual method to find an optimal deterministic policy with non-asymptotic convergence.
We prove that the primal-dual iterates of D-PGPD converge at a sub-linear rate to an optimal regularized primal-dual pair.
To the best of our knowledge, this is the first work to propose a deterministic policy search method for continuous-space constrained MDPs.
arXiv Detail & Related papers (2024-08-19T14:11:04Z)
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Markov Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs [21.347689976296834]
We employ the natural policy gradient method to solve the discounted optimal control problem.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
arXiv Detail & Related papers (2022-06-06T04:28:04Z)
- Algorithm for Constrained Markov Decision Process with Linear Convergence [55.41644538483948]
An agent aims to maximize the expected accumulated discounted reward subject to multiple constraints on its costs.
A new dual approach is proposed with the integration of two ingredients: an entropy-regularized policy optimizer and Vaidya's dual optimizer.
The proposed approach is shown to converge (with linear rate) to the global optimum.
arXiv Detail & Related papers (2022-06-03T16:26:38Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy optimizer, a dual regularizer, and Nesterov's accelerated gradient descent on the dual.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
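A minimal sketch of two of these ingredients on an assumed one-state ("bandit") constrained problem, where everything is in closed form: the entropy-regularized best response is a softmax of the Lagrangian payoff, and the dual variable is driven by Nesterov-style accelerated projected gradient descent. All problem data below are made up for illustration; this is not the paper's algorithm for full CMDPs.

```python
import numpy as np

r = np.array([1.0, 0.5, 0.2])    # per-action rewards (made-up data)
c = np.array([1.0, 0.4, 0.1])    # per-action costs
b, tau = 0.5, 0.05               # cost budget, entropy temperature

def best_response(lam):
    """Entropy-regularised primal update: softmax of r - lam * c."""
    z = (r - lam * c) / tau
    z -= z.max()                 # stabilise the exponentials
    p = np.exp(z)
    return p / p.sum()

lam = lam_prev = 0.0
eta = 0.2                        # step size below 1/L for this toy dual
for k in range(1, 301):
    mu = lam + (k - 1) / (k + 2) * (lam - lam_prev)  # Nesterov extrapolation
    pi = best_response(mu)
    lam_prev = lam
    # dual gradient is b - E_pi[c]; descend and project onto lam >= 0
    lam = max(0.0, mu - eta * (b - pi @ c))

pi = best_response(lam)
print(f"E[reward]={pi @ r:.3f}  E[cost]={pi @ c:.3f}  (budget {b})")
```

At the optimum the constraint binds here (the best unconstrained action is too costly), so the expected cost converges to the budget b.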
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
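A toy sketch of the finite-window idea: treat the last n observations as an approximate state and learn a policy over that finite state space. The POMDP below (two hidden states, a noisy binary observation channel) is an assumption for illustration, and tabular Q-learning stands in for the paper's belief-model analysis.

```python
import numpy as np
from collections import defaultdict

rng = np.random.default_rng(1)
n = 3                                         # memory window length

def env_step(s, a):
    s2 = s if a == s else int(rng.integers(2))     # toy hidden dynamics
    o = s2 if rng.random() < 0.8 else 1 - s2       # noisy observation channel
    return s2, o, float(a == s)                    # reward for matching state

Q = defaultdict(lambda: np.zeros(2))          # Q-table keyed by obs window
s, window = 0, (0,) * n
for _ in range(50_000):
    a = int(Q[window].argmax()) if rng.random() > 0.1 else int(rng.integers(2))
    s, o, rwd = env_step(s, a)
    nxt = window[1:] + (o,)                   # slide the finite window
    # tabular Q-learning over the finite window state space
    Q[window][a] += 0.1 * (rwd + 0.95 * Q[nxt].max() - Q[window][a])
    window = nxt

print(len(Q), "window states learned")
```

Enlarging n trades a bigger finite state space for a smaller approximation error, which is the trade-off the convergence-rate result above quantifies.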
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
- Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z)
- Kernel Taylor-Based Value Function Approximation for Continuous-State Markov Decision Processes [5.894659354028797]
We propose a principled kernel-based policy iteration algorithm to solve continuous-state Markov Decision Processes (MDPs).
We have validated the proposed method through extensive simulations in both simplified and realistic planning scenarios.
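As a rough illustration of kernel-based value function approximation on a continuous state space, here is fitted value iteration with a Gaussian-kernel ridge regressor on an assumed 1D toy problem; the paper's Taylor-expansion treatment of the kernel is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(2)
S = rng.uniform(-2, 2, size=200)          # sampled support states
actions = np.array([-0.5, 0.5])
disc = 0.9                                # discount factor

def step(x, a):   return np.clip(0.9 * x + a, -2, 2)   # toy dynamics
def reward(x, a): return -x ** 2                        # toy reward

def K(A, B, h=0.4):                       # Gaussian kernel matrix
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * h ** 2))

G = K(S, S) + 1e-3 * np.eye(len(S))       # ridge-regularised Gram matrix
alpha = np.zeros(len(S))                  # kernel weights representing V
for _ in range(60):                       # fitted value iteration
    V = lambda x: K(x, S) @ alpha
    targets = np.max([reward(S, a) + disc * V(step(S, a))
                      for a in actions], axis=0)        # Bellman backup
    alpha = np.linalg.solve(G, targets)   # kernel ridge fit of V

# greedy action at a single query state
x = np.array([1.0])
q = [reward(x, a) + disc * K(step(x, a), S) @ alpha for a in actions]
print("greedy action at x=1.0:", actions[int(np.argmax(q))])
```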
arXiv Detail & Related papers (2020-06-03T01:48:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.