Distributed TD(0) with Almost No Communication
- URL: http://arxiv.org/abs/2305.16246v1
- Date: Thu, 25 May 2023 17:00:46 GMT
- Title: Distributed TD(0) with Almost No Communication
- Authors: Rui Liu, Alex Olshevsky
- Abstract summary: We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation.
We demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0).
- Score: 15.321579527891457
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a new non-asymptotic analysis of distributed temporal difference
learning with linear function approximation. Our approach relies on ``one-shot
averaging,'' where $N$ agents run identical local copies of the TD(0) method
and average the outcomes only once at the very end. We demonstrate a version of
the linear time speedup phenomenon, where the convergence time of the
distributed process is a factor of $N$ faster than the convergence time of
TD(0). This is the first result proving benefits from parallelism for temporal
difference methods.
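The scheme itself is very simple. Below is a minimal sketch in Python, assuming a small synthetic Markov chain, a random linear feature map, a noisy reward model, and a fixed step size (all illustrative choices, not the paper's setup): each of the $N$ agents runs an identical local copy of TD(0) on its own trajectory, and the parameter vectors are averaged exactly once at the very end.

```python
# Minimal sketch of distributed TD(0) with "one-shot averaging".
# The Markov chain, features, rewards, step size, and horizon below are
# illustrative assumptions, not the paper's experimental setup.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features, gamma = 5, 3, 0.9
P = rng.dirichlet(np.ones(n_states), size=n_states)   # transition matrix under a fixed policy
r = rng.uniform(0.0, 1.0, size=n_states)              # expected one-step rewards
Phi = rng.standard_normal((n_states, n_features))     # feature matrix, row s = phi(s)

def td0_local(steps, alpha, seed):
    """One agent runs TD(0) with linear function approximation; no communication."""
    local_rng = np.random.default_rng(seed)
    theta = np.zeros(n_features)
    s = local_rng.integers(n_states)
    for _ in range(steps):
        s_next = local_rng.choice(n_states, p=P[s])
        reward = r[s] + 0.1 * local_rng.standard_normal()        # noisy observed reward
        td_error = reward + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta += alpha * td_error * Phi[s]                       # TD(0) update
        s = s_next
    return theta

N, steps, alpha = 8, 20_000, 0.05
# Each of the N agents runs an identical local copy of TD(0) in parallel ...
local_thetas = [td0_local(steps, alpha, seed=k) for k in range(N)]
# ... and the outcomes are averaged only once, at the very end.
theta_avg = np.mean(local_thetas, axis=0)
print("one-shot averaged parameters:", theta_avg)
```

Since the agents exchange nothing until the final averaging step, the total communication is a single round, which is what makes the $N$-fold speedup of the averaged iterate notable.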
Related papers
- Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP [1.0923877073891446]
We consider the problem of policy evaluation for variance in a discounted reward Markov decision process.
For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature.
We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed.
arXiv Detail & Related papers (2024-06-12T05:49:53Z)
- DASA: Delay-Adaptive Multi-Agent Stochastic Approximation [64.32538247395627]
We consider a setting in which $N$ agents aim to speed up a common stochastic approximation problem by acting in parallel and communicating with a central server.
To mitigate the effect of delays and stragglers, we propose DASA, a Delay-Adaptive algorithm for multi-agent Stochastic Approximation.
arXiv Detail & Related papers (2024-03-25T22:49:56Z)
- Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries.
Our main result is that the proposed algorithm almost surely converges to the desired mean $\mu$.
We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z)
- Explicit Second-Order Min-Max Optimization Methods with Optimal Convergence Guarantee [86.05440220344755]
We propose and analyze inexact regularized Newton-type methods for finding a global saddle point of convex-concave unconstrained min-max optimization problems.
We show that the proposed methods generate iterates that remain within a bounded set and that the iterations converge to an $\epsilon$-saddle point within $O(\epsilon^{-2/3})$ iterations in terms of a restricted function.
arXiv Detail & Related papers (2022-10-23T21:24:37Z)
- Finite time analysis of temporal difference learning with linear function approximation: Tail averaging and regularisation [44.27439128304058]
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice.
arXiv Detail & Related papers (2022-10-12T04:37:54Z)
- Semi-supervised Learning of Partial Differential Operators and Dynamical Flows [68.77595310155365]
We present a novel method that combines a hyper-network solver with a Fourier Neural Operator architecture.
We test our method on various time evolution PDEs, including nonlinear fluid flows in one, two, and three spatial dimensions.
The results show that the new method improves the learning accuracy at the supervised time points and is able to interpolate the solutions to any intermediate time.
arXiv Detail & Related papers (2022-07-28T19:59:14Z)
- Sharper Convergence Guarantees for Asynchronous SGD for Distributed and Federated Learning [77.22019100456595]
We study asynchronous SGD, a training algorithm for distributed computation in which workers have varying communication frequency.
In this work, we obtain a tighter convergence rate of $\mathcal{O}\left(\sigma^2\epsilon^{-2} + \tau_{avg}\epsilon^{-1}\right)$.
We also show that the heterogeneity term in the rate is affected by the average delay within each worker.
arXiv Detail & Related papers (2022-06-16T17:10:57Z)
- PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning Method [49.93717224277131]
We propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period.
We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to be polynomial.
arXiv Detail & Related papers (2021-10-13T17:40:12Z)
- Distributed TD(0) with Almost No Communication [13.578454059496847]
We provide a new non-asymptotic analysis of distributed TD(0) with linear function approximation.
Our approach relies on "one-shot averaging," where $N$ agents run local copies of TD(0) and average the outcomes only once at the very end.
arXiv Detail & Related papers (2021-04-16T02:21:11Z)
- Temporal Difference Learning as Gradient Splitting [15.321579527891457]
We show that convergence proofs for gradient descent can be applied almost verbatim to temporal difference learning.
We show that a minor variation on TD learning which estimates the mean of the value function separately has a convergence time where $1/(1-\gamma)$ only multiplies an asymptotically negligible term.
arXiv Detail & Related papers (2020-10-27T22:50:39Z)
- Adaptive Temporal Difference Learning with Linear Function Approximation [29.741034258674205]
This paper revisits the temporal difference (TD) learning algorithm for the policy evaluation tasks in reinforcement learning.
We develop a provably convergent adaptive projected variant of the TD(0) learning algorithm with linear function approximation.
We evaluate the performance of AdaTD(0) and AdaTD($\lambda$) on several standard reinforcement learning tasks.
arXiv Detail & Related papers (2020-02-20T02:32:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.