Convergence and sample complexity of natural policy gradient primal-dual
methods for constrained MDPs
- URL: http://arxiv.org/abs/2206.02346v2
- Date: Tue, 17 Oct 2023 04:02:54 GMT
- Title: Convergence and sample complexity of natural policy gradient primal-dual
methods for constrained MDPs
- Authors: Dongsheng Ding, Kaiqing Zhang, Jiali Duan, Tamer Ba\c{s}ar, Mihailo R.
Jovanovi\'c
- Abstract summary: We employ the natural policy gradient method to solve the discounted optimal optimal rate problem.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
- Score: 24.582720609592464
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study sequential decision making problems aimed at maximizing the expected
total reward while satisfying a constraint on the expected total utility. We
employ the natural policy gradient method to solve the discounted
infinite-horizon optimal control problem for Constrained Markov Decision
Processes (constrained MDPs). Specifically, we propose a new Natural Policy
Gradient Primal-Dual (NPG-PD) method that updates the primal variable via
natural policy gradient ascent and the dual variable via projected sub-gradient
descent. Although the underlying maximization involves a nonconcave objective
function and a nonconvex constraint set, under the softmax policy
parametrization we prove that our method achieves global convergence with
sublinear rates regarding both the optimality gap and the constraint violation.
Such convergence is independent of the size of the state-action space, i.e., it
is~dimension-free. Furthermore, for log-linear and general smooth policy
parametrizations, we establish sublinear convergence rates up to a function
approximation error caused by restricted policy parametrization. We also
provide convergence and finite-sample complexity guarantees for two
sample-based NPG-PD algorithms. Finally, we use computational experiments to
showcase the merits and the effectiveness of our approach.
Related papers
- Double Duality: Variational Primal-Dual Policy Optimization for
Constrained Reinforcement Learning [132.7040981721302]
We study the Constrained Convex Decision Process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Design algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z) - Last-Iterate Convergent Policy Gradient Primal-Dual Methods for
Constrained MDPs [107.28031292946774]
We study the problem of computing an optimal policy of an infinite-horizon discounted Markov decision process (constrained MDP)
We develop two single-time-scale policy-based primal-dual algorithms with non-asymptotic convergence of their policy iterates to an optimal constrained policy.
To the best of our knowledge, this work appears to be the first non-asymptotic policy last-iterate convergence result for single-time-scale algorithms in constrained MDPs.
arXiv Detail & Related papers (2023-06-20T17:27:31Z) - High-probability sample complexities for policy evaluation with linear function approximation [88.87036653258977]
We investigate the sample complexities required to guarantee a predefined estimation error of the best linear coefficients for two widely-used policy evaluation algorithms.
We establish the first sample complexity bound with high-probability convergence guarantee that attains the optimal dependence on the tolerance level.
arXiv Detail & Related papers (2023-05-30T12:58:39Z) - Linear Convergence of Natural Policy Gradient Methods with Log-Linear
Policies [115.86431674214282]
We consider infinite-horizon discounted Markov decision processes and study the convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log-linear policy class.
We show that both methods attain linear convergence rates and $mathcalO (1/epsilon2)$ sample complexities using a simple, non-adaptive geometrically increasing step size.
arXiv Detail & Related papers (2022-10-04T06:17:52Z) - On the Convergence Rates of Policy Gradient Methods [9.74841674275568]
We consider geometrically discounted dominance problems with finite state sub spaces.
We show that with direct gradient pararization in a sample we can analyze the general complexity of a gradient.
arXiv Detail & Related papers (2022-01-19T07:03:37Z) - Faster Algorithm and Sharper Analysis for Constrained Markov Decision
Process [56.55075925645864]
The problem of constrained decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new utilities-dual convex approach is proposed with novel integration of three ingredients: regularized policy, dual regularizer, and Nesterov's gradient descent dual.
This is the first demonstration that nonconcave CMDP problems can attain the lower bound of $mathcal O (1/epsilon)$ for all complexity optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z) - Beyond Exact Gradients: Convergence of Stochastic Soft-Max Policy Gradient Methods with Entropy Regularization [20.651913793555163]
We revisit the classical entropy regularized policy gradient methods with the soft-max policy parametrization.
We establish a global optimality convergence result and a sample complexity of $widetildemathcalO(frac1epsilon2)$ for the proposed algorithm.
arXiv Detail & Related papers (2021-10-19T17:21:09Z) - Conditional gradient methods for stochastically constrained convex
minimization [54.53786593679331]
We propose two novel conditional gradient-based methods for solving structured convex optimization problems.
The most important feature of our framework is that only a subset of the constraints is processed at each iteration.
Our algorithms rely on variance reduction and smoothing used in conjunction with conditional gradient steps, and are accompanied by rigorous convergence guarantees.
arXiv Detail & Related papers (2020-07-07T21:26:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.