Related papers: A Small Gain Analysis of Single Timescale Actor Critic

A Small Gain Analysis of Single Timescale Actor Critic

URL: http://arxiv.org/abs/2203.02591v4
Date: Thu, 25 May 2023 17:59:20 GMT
Title: A Small Gain Analysis of Single Timescale Actor Critic
Authors: Alex Olshevsky, Bahman Gharesifard
Abstract summary: We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step. We prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods.
Score: 16.092248433189816
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We consider a version of actor-critic which uses proportional step-sizes and only one critic update with a single sample from the stationary distribution per actor step. We provide an analysis of this method using the small-gain theorem. Specifically, we prove that this method can be used to find a stationary point, and that the resulting sample complexity improves the state of the art for actor-critic methods to $O \left(\mu^{-2} \epsilon^{-2} \right)$ to find an $\epsilon$-approximate stationary point where $\mu$ is the condition number associated with the critic.

Related papers

Locally minimax optimal and dimension-agnostic discrete argmin inference [33.17951971728784]
This paper tackles a fundamental inference problem: given $n$ observations from a $d$ dimensional vector with unknown mean $boldsymbolmu$, we must form a confidence set for the index corresponding to the smallest component of $boldsymbolmu$.<n>By duality, we reduce this to testing, for each $r$ in $1,ldots,d$, whether $mu_r$ is the smallest.<n>We propose "dimension-agnostic" tests that maintain validity regardless of how $d$ scales with $n$, and regardless of arbitrary ties in $bold
arXiv Detail & Related papers (2025-03-27T16:06:07Z)
Towards a Sharp Analysis of Offline Policy Learning for $f$-Divergence-Regularized Contextual Bandits [49.96531901205305]
We analyze $f$-divergence-regularized offline policy learning.<n>For reverse Kullback-Leibler (KL) divergence, we give the first $tildeO(epsilon-1)$ sample complexity under single-policy concentrability.<n>We extend our analysis to dueling bandits, and we believe these results take a significant step toward a comprehensive understanding of $f$-divergence-regularized policy learning.
arXiv Detail & Related papers (2025-02-09T22:14:45Z)
Improved Sample Complexity for Global Convergence of Actor-Critic Algorithms [49.19842488693726]
We establish the global convergence of the actor-critic algorithm with a significantly improved sample complexity of $O(epsilon-3)$. Our findings provide theoretical support for many algorithms that rely on constant step sizes.
arXiv Detail & Related papers (2024-10-11T14:46:29Z)
Non-Asymptotic Analysis for Single-Loop (Natural) Actor-Critic with Compatible Function Approximation [18.77565744533582]
Actor-critic (AC) is a powerful method for learning an optimal policy in reinforcement learning. AC converges to an $epsilon+varepsilon_textcritic$ neighborhood of stationary points with the best known sample complexity. This paper analyzes the convergence of both AC and NAC algorithms with compatible function approximation.
arXiv Detail & Related papers (2024-06-03T20:05:04Z)
GROS: A General Robust Aggregation Strategy [49.1574468325115]
A new, very general, robust procedure for combining estimators in metric spaces is introduced. We show that the same (up to a constant) sub-Gaussianity is obtained if the minimization is taken over the sample. The performance of GROS is evaluated through five simulation studies.
arXiv Detail & Related papers (2024-02-23T17:00:32Z)
Two-Timescale Critic-Actor for Average Reward MDPs with Function Approximation [5.945710235932345]
We present the first two-timescale critic-actor algorithm with function approximation in the long-run average reward setting. A notable feature of our analysis is that unlike recent single-timescale actor-critic algorithms, we present a complete convergence analysis of our scheme.
arXiv Detail & Related papers (2024-02-02T12:48:49Z)
Finite-time analysis of single-timescale actor-critic [8.994243376183658]
Actor-critic methods have achieved significant success in many challenging applications. finite-time convergence is still poorly understood in the most practical single-timescale form. We investigate the more practical online single-timescale actor-critic algorithm on continuous state space.
arXiv Detail & Related papers (2022-10-18T15:03:56Z)
Sharper Rates and Flexible Framework for Nonconvex SGD with Client and Data Sampling [64.31011847952006]
We revisit the problem of finding an approximately stationary point of the average of $n$ smooth and possibly non-color functions. We generalize the $smallsfcolorgreen$ so that it can provably work with virtually any sampling mechanism. We provide the most general and most accurate analysis of optimal bound in the smooth non-color regime.
arXiv Detail & Related papers (2022-06-05T21:32:33Z)
A Deeper Look at Discounting Mismatch in Actor-Critic Algorithms [81.01917016753644]
We investigate the discounting mismatch in actor-critic algorithm implementations from a representation learning perspective. Theoretically, actor-critic algorithms usually have discounting for both actor and critic, i.e., there is a $gammat$ term in the actor update for the transition observed at time $t$ in a trajectory. Practitioners, however, usually ignore the discounting ($gammat$) for the actor while using a discounted critic.
arXiv Detail & Related papers (2020-10-02T15:51:48Z)
Multi-label Contrastive Predictive Coding [125.03510235962095]
Variational mutual information (MI) estimators are widely used in unsupervised representation learning methods such as contrastive predictive coding (CPC) We introduce a novel estimator based on a multi-label classification problem, where the critic needs to jointly identify multiple positive samples at the same time. We show that using the same amount of negative samples, multi-label CPC is able to exceed the $log m$ bound, while still being a valid lower bound of mutual information.
arXiv Detail & Related papers (2020-07-20T02:46:21Z)
Non-asymptotic Convergence Analysis of Two Time-scale (Natural) Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies. We show that two time-scale AC requires the overall sample complexity at the order of $mathcalO(epsilon-2.5log3(epsilon-1))$ to attain an $epsilon$-accurate stationary point. We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z)
A Finite Time Analysis of Two Time-Scale Actor Critic Methods [87.69128666220016]
We provide a non-asymptotic analysis for two time-scale actor-critic methods under non-i.i.d. setting. We prove that the actor-critic method is guaranteed to find a first-order stationary point. This is the first work providing finite-time analysis and sample complexity bound for two time-scale actor-critic methods.
arXiv Detail & Related papers (2020-05-04T09:45:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.