Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms
- URL: http://arxiv.org/abs/2310.16363v3
- Date: Wed, 29 May 2024 05:54:19 GMT
- Title: Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms
- Authors: Prashansa Panda, Shalabh Bhatnagar
- Abstract summary: We consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes.
We carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting.
We also show the results of experiments on three different Safety-Gym environments.
- Score: 5.945710235932345
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Actor Critic methods have found immense applications in a wide range of Reinforcement Learning tasks, especially when the state-action space is large. In this paper, we consider actor critic and natural actor critic algorithms with function approximation for constrained Markov decision processes (C-MDP) involving inequality constraints and carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting. We consider the long-run average cost criterion where both the objective and the constraint functions are suitable policy-dependent long-run averages of certain prescribed cost functions. We handle the inequality constraints using the Lagrange multiplier method. We prove that these algorithms are guaranteed to find a first-order stationary point (i.e., $\Vert \nabla L(\theta,\gamma)\Vert_2^2 \leq \epsilon$) of the performance (Lagrange) function $L(\theta,\gamma)$, with a sample complexity of $\mathcal{\tilde{O}}(\epsilon^{-2.5})$ in the case of both Constrained Actor Critic (C-AC) and Constrained Natural Actor Critic (C-NAC) algorithms. We also show the results of experiments on three different Safety-Gym environments.
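
To make the three-timescale structure concrete, the sketch below shows what a constrained actor-critic loop of this kind can look like: a critic tracks the differential value of the Lagrangian cost on the fastest timescale, the actor descends on the Lagrangian $L(\theta,\gamma)$ (roughly, the long-run average of objective cost plus $\gamma$ times constraint violation) on an intermediate timescale, and the Lagrange multiplier ascends on constraint violation on the slowest timescale. This is a minimal illustrative sketch: the environment interface (`env.step` returning both the objective and constraint stage costs), the feature maps `phi` and `psi`, and the step-size exponents are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

# Minimal sketch of a three-timescale constrained actor-critic loop with a
# Lagrange multiplier under the long-run average-cost criterion.
# `env`, the feature maps `phi` (state features) and `psi` (per-action policy
# features), and the step-size exponents are illustrative placeholders.

def softmax_policy(theta, psi_s):
    """psi_s: array of shape (n_actions, d_theta) of per-action features."""
    prefs = psi_s @ theta
    prefs -= prefs.max()                  # numerical stability
    p = np.exp(prefs)
    return p / p.sum()

def constrained_actor_critic(env, phi, psi, d_features, d_theta,
                             constraint_level=0.0, T=100_000, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(d_theta)             # actor parameters
    v = np.zeros(d_features)              # critic weights (Lagrangian value)
    gamma_mult = 0.0                      # Lagrange multiplier, kept >= 0
    avg_cost = 0.0                        # running average of Lagrangian cost

    s = env.reset()
    for t in range(1, T + 1):
        # Decreasing step sizes on three separated timescales:
        # critic (fastest) > actor > Lagrange multiplier (slowest).
        a_t = t ** -0.55                  # critic
        b_t = t ** -0.80                  # actor
        c_t = t ** -1.00                  # multiplier

        probs = softmax_policy(theta, psi(s))
        a = rng.choice(len(probs), p=probs)
        score = psi(s)[a] - probs @ psi(s)        # grad log pi(a | s)

        s_next, cost, constraint_cost = env.step(a)

        # Single-stage Lagrangian cost: objective + multiplier * violation.
        lagr_cost = cost + gamma_mult * (constraint_cost - constraint_level)

        # Average-cost TD error for the Lagrangian cost.
        delta = lagr_cost - avg_cost + phi(s_next) @ v - phi(s) @ v
        avg_cost += a_t * (lagr_cost - avg_cost)
        v += a_t * delta * phi(s)                 # critic update

        theta -= b_t * delta * score              # actor: descent on Lagrangian

        # Multiplier ascends on constraint violation, projected to [0, inf).
        gamma_mult = max(0.0, gamma_mult + c_t * (constraint_cost - constraint_level))

        s = s_next
    return theta, gamma_mult, avg_cost
```

The separation of step sizes is what makes the analysis three-timescale: the critic sees a nearly fixed policy and multiplier, the actor sees a nearly converged critic, and the multiplier sees nearly converged actor-critic iterates.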
Related papers
- Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation [5.945710235932345]
We present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting.
A notable feature of our analysis is that unlike recent single-timescale actor-critic algorithms, we present a complete convergence analysis of our scheme.
arXiv Detail & Related papers (2024-02-02T12:48:49Z) - Quantum Goemans-Williamson Algorithm with the Hadamard Test and
Approximate Amplitude Constraints [62.72309460291971]
We introduce a variational quantum algorithm for Goemans-Williamson algorithm that uses only $n+1$ qubits.
Efficient optimization is achieved by encoding the objective matrix as a properly parameterized unitary conditioned on an auxiliary qubit.
We demonstrate the effectiveness of our protocol by devising an efficient quantum implementation of the Goemans-Williamson algorithm for various NP-hard problems.
arXiv Detail & Related papers (2022-06-30T03:15:23Z) - Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic
Algorithm for Constrained Markov Decision Processes [22.07834608976826]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic with Linear
Function Approximation [5.543220407902113]
We develop a novel variant of off-policy natural actor-critic algorithm with linear function approximation.
We establish a sample complexity of $\mathcal{O}(\epsilon^{-3})$, outperforming all the previously known convergence bounds of such algorithms.
arXiv Detail & Related papers (2021-05-26T13:35:42Z) - Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm [4.932130498861987]
We provide finite-sample convergence guarantees for an off-policy variant of the natural actor-critic (NAC) algorithm based on Importance Sampling.
We show that the algorithm converges to a global optimal policy with a sample complexity of $\mathcal{O}(\epsilon^{-3}\log^2(1/\epsilon))$ under an appropriate choice of stepsizes.
arXiv Detail & Related papers (2021-02-18T13:22:59Z) - Average-Reward Off-Policy Policy Evaluation with Function Approximation [66.67075551933438]
We consider off-policy policy evaluation with function approximation in average-reward MDPs.
Bootstrapping is necessary and, along with off-policy learning and function approximation (FA), forms the deadly triad.
We propose two novel algorithms, reproducing the celebrated success of Gradient TD algorithms in the average-reward setting.
arXiv Detail & Related papers (2021-01-08T00:43:04Z) - Large-Scale Methods for Distributionally Robust Optimization [53.98643772533416]
We prove that our algorithms require a number of gradient evaluations independent of the training set size and the number of parameters.
Experiments on MNIST and ImageNet confirm the theoretical scaling of our algorithms, which are 9--36 times more efficient than full-batch methods.
arXiv Detail & Related papers (2020-10-12T17:41:44Z) - A Two-Timescale Framework for Bilevel Optimization: Complexity Analysis
and Application to Actor-Critic [142.1492359556374]
Bilevel optimization is a class of problems which exhibit a two-level structure.
We propose a two-timescale approximation (TTSA) algorithm for tackling such a bilevel problem.
We show that a two-timescale natural actor-critic policy optimization algorithm can be viewed as a special case of our TTSA framework.
arXiv Detail & Related papers (2020-07-10T05:20:02Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires an overall sample complexity of the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods in a non-i.i.d. setting.
We prove that the actor-critic method is guaranteed to find a first-order stationary point.
This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)