Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic
Algorithm for Constrained Markov Decision Processes
- URL: http://arxiv.org/abs/2110.11383v1
- Date: Thu, 21 Oct 2021 18:05:40 GMT
- Title: Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic
Algorithm for Constrained Markov Decision Processes
- Authors: Sihan Zeng, Thinh T. Doan, Justin Romberg
- Abstract summary: We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
- Score: 22.07834608976826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a discounted cost constrained Markov decision process (CMDP)
policy optimization problem, in which an agent seeks to maximize a discounted
cumulative reward subject to a number of constraints on discounted cumulative
utilities. To solve this constrained optimization program, we study an online
actor-critic variant of a classic primal-dual method where the gradients of
both the primal and dual functions are estimated using samples from a single
trajectory generated by the underlying time-varying Markov processes. This
online primal-dual natural actor-critic algorithm maintains and iteratively
updates three variables: a dual variable (or Lagrangian multiplier), a primal
variable (or actor), and a critic variable used to estimate the gradients of
both primal and dual variables. These variables are updated simultaneously but
on different time scales (using different step sizes) and they are all
intertwined with each other. Our main contribution is to derive a finite-time
analysis for the convergence of this algorithm to the global optimum of a CMDP
problem. Specifically, we show that with a proper choice of step sizes the
optimality gap and constraint violation converge to zero in expectation at a
rate $\mathcal{O}(1/K^{1/6})$, where $K$ is the number of iterations. To our
knowledge, this paper is the first to study the finite-time complexity of an
online primal-dual actor-critic method for solving a CMDP problem. We also
validate the effectiveness of this algorithm through numerical simulations.
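
To make the primal-dual structure concrete, the constrained problem above is typically cast as a saddle point of the Lagrangian (a standard formulation; the notation here is illustrative rather than taken verbatim from the paper):

$$\max_{\theta}\ \min_{\lambda \ge 0}\ L(\theta, \lambda) = V_r(\pi_\theta) + \lambda^\top \big( V_g(\pi_\theta) - b \big),$$

where $V_r(\pi_\theta)$ is the discounted cumulative reward of policy $\pi_\theta$, $V_g(\pi_\theta)$ collects the discounted cumulative utilities, and $b$ holds the constraint thresholds. The actor ascends in $\theta$, the dual variable descends in $\lambda$, and the critic estimates the value functions that both gradients depend on.

The sketch below illustrates how the three coupled updates can be interleaved along a single trajectory with three step-size schedules. It is a minimal tabular rendering of this general scheme, not the paper's implementation: the environment interface (env.reset and env.step returning a next state, a reward, and one constraint utility), the softmax parameterization, the step-size exponents, and the constraint threshold are all assumptions made for illustration.

    import numpy as np

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    def online_primal_dual_nac(env, n_states, n_actions, K,
                               gamma=0.99, b=0.1,
                               a0=0.5, b0=0.1, c0=0.01):
        # Hypothetical tabular sketch of the three-variable loop.
        theta = np.zeros((n_states, n_actions))  # primal variable (actor)
        w_r = np.zeros(n_states)                 # critic: reward values
        w_g = np.zeros(n_states)                 # critic: utility values
        lam = 0.0                                # dual variable (multiplier)

        s = env.reset()
        for k in range(1, K + 1):
            # Three time scales: critic fastest, actor slower, dual slowest
            # (the exponents are illustrative, not the paper's choices).
            alpha = a0 / k ** 0.5    # critic step size
            beta = b0 / k ** 0.75    # actor step size
            eta = c0 / k             # dual step size

            # One transition sampled from the single ongoing trajectory.
            probs = softmax(theta[s])
            a = np.random.choice(n_actions, p=probs)
            s_next, reward, utility = env.step(a)

            # TD(0) critic updates for reward and utility value estimates.
            delta_r = reward + gamma * w_r[s_next] - w_r[s]
            delta_g = utility + gamma * w_g[s_next] - w_g[s]
            w_r[s] += alpha * delta_r
            w_g[s] += alpha * delta_g

            # Natural actor step: for a tabular softmax policy the natural
            # gradient direction is the advantage of the Lagrangian reward
            # r + lam * g, estimated here by the combined TD error.
            theta[s, a] += beta * (delta_r + lam * delta_g)

            # Dual step: projected gradient descent on lam for the
            # constraint V_g >= b, using the critic's utility estimate.
            lam = max(0.0, lam - eta * (w_g[s] - b))

            s = s_next
        return theta, lam

The single trajectory is what makes the analysis delicate: every update is computed from Markovian, non-i.i.d. samples whose distribution drifts as the actor changes, which is exactly the coupling between the three variables that the finite-time analysis has to control.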
Related papers
- Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms [5.945710235932345]
We consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes.
We carry out a non-asymptotic analysis of both algorithms in a non-i.i.d. (Markovian) setting.
We also show the results of experiments on three different Safety-Gym environments.
arXiv Detail & Related papers (2023-10-25T05:04:00Z)
- Statistical Inference of Constrained Stochastic Optimization via Sketched Sequential Quadratic Programming [53.63469275932989]
We consider online statistical inference of constrained nonlinear optimization problems.
We apply the Stochastic Sequential Quadratic Programming (StoSQP) method to solve these problems.
arXiv Detail & Related papers (2022-05-27T00:34:03Z)
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual approach is proposed with a novel integration of three ingredients: an entropy-regularized policy optimizer, a dual variable regularizer, and a Nesterov's accelerated gradient descent dual optimizer.
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
- A Two-Time-Scale Stochastic Optimization Framework with Applications in Control and Reinforcement Learning [13.908826484332282]
We study a new two-time-scale gradient method for solving optimization problems.
Our first contribution is to characterize the finite-time complexity of the proposed two-time-scale gradient algorithm.
We apply our framework to gradient-based policy evaluation algorithms in reinforcement learning.
arXiv Detail & Related papers (2021-09-29T23:15:23Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Stochastic Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
- Single-Timescale Stochastic Nonconvex-Concave Optimization for Smooth Nonlinear TD Learning [145.54544979467872]
We propose two single-timescale single-loop algorithms that require only one data point per step.
Our results are expressed in the form of simultaneous primal- and dual-side convergence.
arXiv Detail & Related papers (2020-08-23T20:36:49Z)
- Private Stochastic Non-Convex Optimization: Adaptive Algorithms and Tighter Generalization Bounds [72.63031036770425]
We propose differentially private (DP) algorithms for stochastic non-convex optimization.
We demonstrate on two popular deep learning models the empirical advantages of our methods over standard gradient methods.
arXiv Detail & Related papers (2020-06-24T06:01:24Z)
- Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of order $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
- Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the peak-constrained Markov decision process (PCMDP), where the agent chooses a policy to maximize the total reward over a finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP into an unconstrained problem, to which a Q-learning based approach is applied.
arXiv Detail & Related papers (2020-03-11T23:23:29Z)
- A New Randomized Primal-Dual Algorithm for Convex Optimization with Optimal Last Iterate Rates [16.54912614895861]
We develop a novel unified randomized block-coordinate primal-dual algorithm to solve a class of nonsmooth constrained convex optimization problems.
We prove that our algorithm achieves optimal convergence rates in two cases: general convexity and strong convexity.
Numerical results show that the proposed method has encouraging performance across different experiments.
arXiv Detail & Related papers (2020-03-03T03:59:26Z)