Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear
Speedup
- URL: http://arxiv.org/abs/2012.15511v1
- Date: Thu, 31 Dec 2020 09:07:09 GMT
- Title: Asynchronous Advantage Actor Critic: Non-asymptotic Analysis and Linear
Speedup
- Authors: Han Shen, Kaiqing Zhang, Mingyi Hong, Tianyi Chen
- Abstract summary: This paper revisits the A3C algorithm with TD(0) for the critic update, termed A3C-TD(0), with provable convergence guarantees.
Under i.i.d. sampling, A3C-TD(0) obtains sample complexity of $\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy, where $N$ is the number of workers.
Compared to the best-known sample complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale AC, A3C-TD(0) achieves linear speedup.
- Score: 56.27526702716774
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Asynchronous and parallel implementation of standard reinforcement learning
(RL) algorithms is a key enabler of the tremendous success of modern RL. Among
many asynchronous RL algorithms, arguably the most popular and effective one is
the asynchronous advantage actor-critic (A3C) algorithm. Although A3C is
becoming the workhorse of RL, its theoretical properties are still not
well-understood, including the non-asymptotic analysis and the performance gain
of parallelism (a.k.a. speedup). This paper revisits the A3C algorithm with
TD(0) for the critic update, termed A3C-TD(0), with provable convergence
guarantees. With linear value function approximation for the TD update, the
convergence of A3C-TD(0) is established under both i.i.d. and Markovian
sampling. Under i.i.d. sampling, A3C-TD(0) obtains sample complexity of
$\mathcal{O}(\epsilon^{-2.5}/N)$ per worker to achieve $\epsilon$ accuracy,
where $N$ is the number of workers. Compared to the best-known sample
complexity of $\mathcal{O}(\epsilon^{-2.5})$ for two-timescale AC, A3C-TD(0)
achieves \emph{linear speedup}, which justifies the advantage of parallelism
and asynchrony in AC algorithms theoretically for the first time. Numerical
tests on synthetically generated instances and OpenAI Gym environments have
been provided to verify our theoretical analysis.
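As a rough illustration of the update rule analyzed above, the following minimal single-process sketch runs one worker's A3C-TD(0) loop with a linear critic and a softmax policy over linear state features. The tiny random MDP, the feature map, and all step sizes are illustrative placeholders rather than the paper's experimental setup; a full A3C run would execute this loop in $N$ asynchronous workers that push their gradients to shared parameters.

```python
# Minimal single-process sketch of the A3C-TD(0) update with a linear critic
# and a softmax policy over linear state features. The random MDP, features,
# and step sizes below are illustrative placeholders, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)
S, A, d, gamma = 5, 3, 4, 0.95                 # states, actions, feature dim, discount
P = rng.dirichlet(np.ones(S), size=(S, A))     # transition kernel P[s, a, s']
R = rng.uniform(0, 1, size=(S, A))             # reward table
Phi = rng.normal(size=(S, d))                  # state features phi(s)

theta = np.zeros((A, d))                       # actor (policy) parameters
w = np.zeros(d)                                # critic (value) parameters
alpha, beta = 0.01, 0.05                       # actor / critic step sizes

def policy(s):
    """Softmax policy with logits theta_a^T phi(s)."""
    logits = theta @ Phi[s]
    p = np.exp(logits - logits.max())
    return p / p.sum()

s = rng.integers(S)
for t in range(20000):                         # Markovian sampling along one trajectory
    p = policy(s)
    a = rng.choice(A, p=p)
    s_next = rng.choice(S, p=P[s, a])
    r = R[s, a]

    # TD(0) error with linear value approximation V(s) = phi(s)^T w
    delta = r + gamma * Phi[s_next] @ w - Phi[s] @ w

    # Critic: TD(0) semi-gradient step
    w += beta * delta * Phi[s]

    # Actor: policy-gradient step using the TD error as the advantage estimate;
    # grad of log pi(a|s) for a softmax policy is (1[a'=a] - pi(a'|s)) phi(s).
    grad_log = -np.outer(p, Phi[s])
    grad_log[a] += Phi[s]
    theta += alpha * delta * grad_log

    s = s_next
```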
Related papers
- Stochastic Optimization for Non-convex Problem with Inexact Hessian
Matrix, Gradient, and Function [99.31457740916815]
Trust-region (TR) and adaptive regularization using cubics (ARC) have proven to have some very appealing theoretical properties.
We show that TR and ARC methods can simultaneously allow inexact computations of the Hessian, gradient, and function values.
arXiv Detail & Related papers (2023-10-18T10:29:58Z) - Convergence Analysis of Decentralized ASGD [1.8710230264817358]
We present a novel convergence-rate analysis for decentralized asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies.
Our convergence proof holds for a fixed stepsize and any nonsmooth, homogeneous, L-shaped objective function.
arXiv Detail & Related papers (2023-09-07T14:50:31Z) - Towards Understanding the Generalizability of Delayed Stochastic
Gradient Descent [63.43247232708004]
Stochastic gradient descent performed in an asynchronous manner plays a crucial role in training large-scale machine learning models.
Existing generalization error bounds are rather pessimistic and cannot reveal the correlation between asynchronous delays and generalization.
Our theoretical results indicate that asynchronous delays reduce the generalization error of the delayed SGD algorithm.
arXiv Detail & Related papers (2023-08-18T10:00:27Z) - SYNTHESIS: A Semi-Asynchronous Path-Integrated Stochastic Gradient
Method for Distributed Learning in Computing Clusters [7.968142741470549]
A semi-asynchronous path-integrated stochastic gradient search algorithm (SYNTHESIS) is developed to overcome the limitations of both synchronous and asynchronous distributed learning algorithms.
We show that SYNTHESIS achieves an $\mathcal{O}(\sqrt{N}\epsilon^{-2}(\Delta+1)d+N)$ computational complexity for reaching an
$\epsilon$-stationary point in non-convex learning under both distributed and shared memory architectures.
arXiv Detail & Related papers (2022-08-17T17:42:33Z) - Sharper Convergence Guarantees for Asynchronous SGD for Distributed and
Federated Learning [77.22019100456595]
We study a training algorithm for distributed computation with workers that have varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\!\left(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z) - Finite-Time Analysis of Fully Decentralized Single-Timescale
Actor-Critic [4.94128206910124]
We introduce a fully decentralized Actor-Critic (AC) algorithm, where actor, critic, and global reward estimator are updated in an alternating manner.
We show that our algorithm has sample complexity of $\tilde{\mathcal{O}}(\epsilon^{-2})$ under Markovian sampling.
We also provide a local action privacy-preserving version of our algorithm and its analysis.
arXiv Detail & Related papers (2022-06-12T13:14:14Z) - Non-asymptotic Convergence Analysis of Two Time-scale (Natural)
Actor-Critic Algorithms [58.57004511121862]
Actor-critic (AC) and natural actor-critic (NAC) algorithms are often executed in two ways for finding optimal policies.
We show that two time-scale AC requires the overall sample complexity at the order of $\mathcal{O}(\epsilon^{-2.5}\log^3(\epsilon^{-1}))$ to attain an $\epsilon$-accurate stationary point.
We develop novel techniques for bounding the bias error of the actor due to dynamically changing Markovian sampling.
arXiv Detail & Related papers (2020-05-07T15:42:31Z) - Accelerating Feedforward Computation via Parallel Nonlinear Equation
Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
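The last related paper above recasts a feedforward pass as solving a system of nonlinear equations and iterating it with Jacobi or Gauss-Seidel updates. The following minimal sketch illustrates only the Jacobi variant on illustrative random tanh layers (the layer count, sizes, and weights are placeholders, not the paper's models): all layer activations are refreshed in parallel from the previous iterate, and after at most L sweeps they match the standard sequential pass exactly.

```python
# Minimal sketch of the Jacobi fixed-point view of feedforward computation:
# evaluating h_l = f_l(h_{l-1}) for all layers is recast as solving the
# nonlinear system h_l - f_l(h_{l-1}) = 0 and updating every equation in
# parallel from the previous iterate. Layers and sizes are illustrative.
import numpy as np

rng = np.random.default_rng(1)
L, dim = 6, 8
Ws = [rng.normal(scale=0.5, size=(dim, dim)) for _ in range(L)]
layers = [lambda h, W=W: np.tanh(W @ h) for W in Ws]   # f_1, ..., f_L
x = rng.normal(size=dim)                               # network input h_0

# Standard sequential feedforward pass: L dependent steps.
h = x
for f in layers:
    h = f(h)
sequential_out = h

# Jacobi iteration: refresh every layer's activation in parallel from the
# previous iterate; reaches the exact sequential values in at most L sweeps.
H = [x] + [np.zeros(dim) for _ in range(L)]
for sweep in range(L):
    H = [x] + [layers[l](H[l]) for l in range(L)]      # all l updated "in parallel"
jacobi_out = H[-1]

assert np.allclose(sequential_out, jacobi_out)
```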