Finite time analysis of temporal difference learning with linear
function approximation: Tail averaging and regularisation
- URL: http://arxiv.org/abs/2210.05918v2
- Date: Mon, 11 Sep 2023 04:22:30 GMT
- Title: Finite time analysis of temporal difference learning with linear
function approximation: Tail averaging and regularisation
- Authors: Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
- Abstract summary: We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that requires no problem-dependent information.
- Score: 44.27439128304058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study the finite-time behaviour of the popular temporal difference (TD)
learning algorithm when combined with tail-averaging. We derive finite-time
bounds on the parameter error of the tail-averaged TD iterate under a step-size
choice that does not require information about the eigenvalues of the matrix
underlying the projected TD fixed point. Our analysis shows that tail-averaged
TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and
with high probability. In addition, our bounds exhibit a sharper rate of decay
for the initial error (bias), which is an improvement over averaging all
iterates. We also propose and analyse a variant of TD that incorporates
regularisation. From our analysis, we conclude that the regularised version of
TD is useful for problems with ill-conditioned features.
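The listing gives no pseudocode, so below is a minimal illustrative sketch of tail-averaged TD(0) with linear function approximation, in the spirit of the abstract. The environment interface (env.reset, env.step, env.sample_action), the feature map phi, and the tail_fraction and reg_lambda parameters are assumptions made for illustration; the ridge-style term only loosely mirrors the regularised variant mentioned above, not the paper's exact update.
```python
import numpy as np


def tail_averaged_td0(env, phi, num_steps, step_size, gamma=0.99,
                      reg_lambda=0.0, tail_fraction=0.5, seed=0):
    """Tail-averaged TD(0) with linear value estimate V(s) = phi(s)^T theta.

    reg_lambda > 0 adds a ridge-style regularisation pull toward zero
    (illustrative only); set it to 0 for plain tail-averaged TD(0).
    """
    rng = np.random.default_rng(seed)
    s = env.reset()
    d = phi(s).shape[0]
    theta = np.zeros(d)

    tail_start = int((1.0 - tail_fraction) * num_steps)
    theta_tail_sum = np.zeros(d)
    tail_count = 0

    for t in range(num_steps):
        a = env.sample_action(s, rng)              # on-policy evaluation
        s_next, r, done = env.step(s, a)

        # TD(0) error for the linear value estimate
        v_s = phi(s) @ theta
        v_next = 0.0 if done else phi(s_next) @ theta
        td_error = r + gamma * v_next - v_s

        # TD update plus optional ridge-style regularisation term
        theta += step_size * (td_error * phi(s) - reg_lambda * theta)

        # Accumulate only the tail iterates for the final average
        if t >= tail_start:
            theta_tail_sum += theta
            tail_count += 1

        s = env.reset() if done else s_next

    return theta_tail_sum / max(tail_count, 1)     # tail-averaged estimate
```
Averaging only the last portion of the iterates, rather than all of them, is what the abstract credits with the sharper decay of the initial error (bias); with reg_lambda = 0 the sketch reduces to plain tail-averaged TD(0).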
Related papers
- Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP [1.0923877073891446]
We consider the problem of policy evaluation for variance in a discounted reward Markov decision process.
For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature.
We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed.
arXiv Detail & Related papers (2024-06-12T05:49:53Z) - Distributed TD(0) with Almost No Communication [15.321579527891457]
We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation.
We demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0).
arXiv Detail & Related papers (2023-05-25T17:00:46Z) - Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates the delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient in order of arrival.
We develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning
Method [49.93717224277131]
We propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period.
We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to be polynomial.
arXiv Detail & Related papers (2021-10-13T17:40:12Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Simple and optimal methods for stochastic variational inequalities, II:
Markovian noise and policy evaluation in reinforcement learning [9.359939442911127]
This paper focuses on stochastic variational inequalities (VI) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Bounding the expected run-time of nonconvex optimization with early
stopping [2.7648976108201815]
This work examines the convergence of gradient-based optimization algorithms that use early stopping based on a validation function.
We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion.
arXiv Detail & Related papers (2020-02-20T16:43:37Z) - Reanalysis of Variance Reduced Temporal Difference Learning [57.150444843282]
Korda and La proposed a variance-reduced TD (VRTD) algorithm, which applies the variance reduction technique directly to online TD learning with Markovian samples.
We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
arXiv Detail & Related papers (2020-01-07T05:32:43Z)