Related papers: Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity

URL: http://arxiv.org/abs/2510.04189v1
Date: Sun, 05 Oct 2025 13:02:38 GMT
Title: Finite Time Analysis of Constrained Natural Critic-Actor Algorithm with Improved Sample Complexity
Authors: Prashansa Panda, Shalabh Bhatnagar
Abstract summary: We introduce the first natural critic-actor algorithm with function approximation for the long-run average cost setting. Our analysis establishes optimal learning rates, and we also propose a modification to enhance sample complexity.
Score: 6.304715653196449
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Recent studies have increasingly focused on non-asymptotic convergence analyses for actor-critic (AC) algorithms. One such effort introduced a two-timescale critic-actor algorithm for the discounted cost setting using a tabular representation, where the usual roles of the actor and critic are reversed. However, only asymptotic convergence was established there. Subsequently, both asymptotic and non-asymptotic analyses of the critic-actor algorithm with linear function approximation were conducted. In our work, we introduce the first natural critic-actor algorithm with function approximation for the long-run average cost setting and under inequality constraints. We provide non-asymptotic convergence guarantees for this algorithm. Our analysis establishes optimal learning rates, and we also propose a modification to enhance sample complexity. We further show the results of experiments on three different Safety-Gym environments, where our algorithm is found to be competitive with other well-known algorithms.

Related papers

Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation [6.304715653196449]
We present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting. We also present the first finite-time non-asymptotic algorithm as well as convergence analysis for such a scheme.
arXiv Detail & Related papers (2024-02-02T12:48:49Z)
Finite-Time Analysis of Three-Timescale Constrained Actor-Critic and Constrained Natural Actor-Critic Algorithms [6.304715653196449]
We consider actor-critic and natural actor-critic algorithms with function approximation for constrained Markov decision processes. We carry out a non-asymptotic analysis for both of these algorithms in a non-i.i.d. (Markovian) setting. We also show the results of experiments on three different Safety-Gym environments.
arXiv Detail & Related papers (2023-10-25T05:04:00Z)
Near-Optimal Non-Convex Stochastic Optimization under Generalized Smoothness [21.865728815935665]
Two recent works established the $O(\epsilon^{-3})$ sample complexity to obtain an $O(\epsilon)$-stationary point. However, both require a large batch size on the order of $\mathrm{poly}(\epsilon^{-1})$, which is not only computationally burdensome but also unsuitable for streaming applications. In this work, we solve the prior two problems simultaneously by revisiting a simple variant of the STORM algorithm.
arXiv Detail & Related papers (2023-02-13T00:22:28Z)
A Sequential Deep Learning Algorithm for Sampled Mixed-integer Optimisation Problems [0.3867363075280544]
We introduce and analyse two efficient algorithms for mixed-integer optimisation problems. We show that both algorithms exhibit finite-time convergence towards the optimal solution. We establish quantitatively the efficacy of these algorithms by means of three numerical tests.
arXiv Detail & Related papers (2023-01-25T17:10:52Z)
First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs). Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method. We provide nonasymptotic theoretical guarantees for these algorithms.
arXiv Detail & Related papers (2022-04-07T00:11:05Z)
Amortized Implicit Differentiation for Stochastic Bilevel Optimization [53.12363770169761]
We study a class of algorithms for solving bilevel optimization problems in both stochastic and deterministic settings. We exploit a warm-start strategy to amortize the estimation of the exact gradient. By using this framework, our analysis shows these algorithms to match the computational complexity of methods that have access to an unbiased estimate of the gradient.
arXiv Detail & Related papers (2021-11-29T15:10:09Z)
Asymptotic study of stochastic adaptive algorithm in non-convex landscape [2.1320960069210484]
This paper studies some asymptotic properties of adaptive algorithms widely used in optimization and machine learning, among them Adagrad and RMSprop, which are involved in most black-box deep learning algorithms.
arXiv Detail & Related papers (2020-12-10T12:54:45Z)
An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits [129.1029690825929]
We introduce a novel algorithm improving over the state-of-the-art along multiple dimensions. We establish minimax optimality for any learning horizon in the special case of non-contextual linear bandits.
arXiv Detail & Related papers (2020-10-23T09:12:47Z)
ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms. We devise a novel algorithm, referred to as Recursive One-Over-T SGD, based on an easily implementable averaging of past gradients. We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z)
Beyond Worst-Case Analysis in Stochastic Approximation: Moment Estimation Improves Instance Complexity [58.70807593332932]
We study the oracle complexity of gradient-based methods for stochastic approximation problems. We focus on instance-dependent complexity instead of worst-case complexity. Our proposed algorithm and its analysis provide a theoretical justification for the success of moment estimation.
arXiv Detail & Related papers (2020-06-08T09:25:47Z)
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. We show that two time-scale AC requires the overall sample complexity at the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
Is Temporal Difference Learning Optimal? An Instance-Dependent Analysis [102.29671176698373]
We address the problem of policy evaluation in discounted Markov decision processes, and provide instance-dependent guarantees on the $\ell_\infty$-error under a generative model. We establish both asymptotic and non-asymptotic versions of local minimax lower bounds for policy evaluation, thereby providing an instance-dependent baseline by which to compare algorithms.
arXiv Detail & Related papers (2020-03-16T17:15:28Z)

This list is automatically generated from the titles and abstracts of the papers on this site.