Finite time analysis of temporal difference learning with linear
function approximation: Tail averaging and regularisation
- URL: http://arxiv.org/abs/2210.05918v2
- Date: Mon, 11 Sep 2023 04:22:30 GMT
- Title: Finite time analysis of temporal difference learning with linear
function approximation: Tail averaging and regularisation
- Authors: Gandharv Patil, Prashanth L.A., Dheeraj Nagaraj, Doina Precup
- Abstract summary: We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
We derive finite-time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice that requires no problem-dependent information.
- Score: 44.27439128304058
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We study the finite-time behaviour of the popular temporal difference (TD)
learning algorithm when combined with tail-averaging. We derive finite-time
bounds on the parameter error of the tail-averaged TD iterate under a step-size
choice that does not require information about the eigenvalues of the matrix
underlying the projected TD fixed point. Our analysis shows that tail-averaged
TD converges at the optimal $O\left(1/t\right)$ rate, both in expectation and
with high probability. In addition, our bounds exhibit a sharper rate of decay
for the initial error (bias), which is an improvement over averaging all
iterates. We also propose and analyse a variant of TD that incorporates
regularisation. From our analysis, we conclude that the regularised version of
TD is useful for problems with ill-conditioned features.
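The listing gives no pseudocode, so below is a minimal illustrative sketch of tail-averaged TD(0) with linear function approximation, in the spirit of the abstract. The environment interface (env.reset, env.step, env.sample_action), the feature map phi, and the tail_fraction and reg_lambda parameters are assumptions made for illustration; the ridge-style term only loosely mirrors the regularised variant mentioned above, not the paper's exact update.
```python
import numpy as np


def tail_averaged_td0(env, phi, num_steps, step_size, gamma=0.99,
                      reg_lambda=0.0, tail_fraction=0.5, seed=0):
    """Tail-averaged TD(0) with linear value estimate V(s) = phi(s)^T theta.

    reg_lambda > 0 adds a ridge-style regularisation pull toward zero
    (illustrative only); set it to 0 for plain tail-averaged TD(0).
    """
    rng = np.random.default_rng(seed)
    s = env.reset()
    d = phi(s).shape[0]
    theta = np.zeros(d)

    tail_start = int((1.0 - tail_fraction) * num_steps)
    theta_tail_sum = np.zeros(d)
    tail_count = 0

    for t in range(num_steps):
        a = env.sample_action(s, rng)              # on-policy evaluation
        s_next, r, done = env.step(s, a)

        # TD(0) error for the linear value estimate
        v_s = phi(s) @ theta
        v_next = 0.0 if done else phi(s_next) @ theta
        td_error = r + gamma * v_next - v_s

        # TD update plus optional ridge-style regularisation term
        theta += step_size * (td_error * phi(s) - reg_lambda * theta)

        # Accumulate only the tail iterates for the final average
        if t >= tail_start:
            theta_tail_sum += theta
            tail_count += 1

        s = env.reset() if done else s_next

    return theta_tail_sum / max(tail_count, 1)     # tail-averaged estimate
```
Averaging only the last portion of the iterates, rather than all of them, is what the abstract credits with the sharper decay of the initial error (bias); with reg_lambda = 0 the sketch reduces to plain tail-averaged TD(0).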
Related papers
- Finite Time Analysis of Temporal Difference Learning for Mean-Variance in a Discounted MDP [1.0923877073891446]
We consider the problem of policy evaluation for variance in a discounted reward Markov decision process.
For this problem, a temporal difference (TD) type learning algorithm with linear function approximation (LFA) exists in the literature.
We derive finite sample bounds that hold (i) in the mean-squared sense; and (ii) with high probability, when tail iterate averaging is employed.
arXiv Detail & Related papers (2024-06-12T05:49:53Z) - Distributed TD(0) with Almost No Communication [15.321579527891457]
We provide a new non-asymptotic analysis of distributed temporal difference learning with linear function approximation.
We demonstrate a version of the linear time speedup phenomenon, where the convergence time of the distributed process is a factor of $N$ faster than the convergence time of TD(0).
arXiv Detail & Related papers (2023-05-25T17:00:46Z) - Non-stationary Online Convex Optimization with Arbitrary Delays [50.46856739179311]
This paper investigates the delayed online convex optimization (OCO) in non-stationary environments.
We first propose a simple algorithm, namely DOGD, which performs a gradient descent step for each delayed gradient in order of arrival.
We develop an improved algorithm, which reduces those dynamic regret bounds achieved by DOGD to $O(\sqrt{\bar{d}T(P_T+1)})$.
arXiv Detail & Related papers (2023-05-20T07:54:07Z) - PER-ETD: A Polynomially Efficient Emphatic Temporal Difference Learning
Method [49.93717224277131]
We propose a new ETD method, called PER-ETD (i.e., PEriodically Restarted-ETD), which restarts and updates the follow-on trace only for a finite period.
We show that PER-ETD converges to the same desirable fixed point as ETD, but improves the exponential sample complexity of ETD to be polynomial.
arXiv Detail & Related papers (2021-10-13T17:40:12Z) - High-probability Bounds for Non-Convex Stochastic Optimization with
Heavy Tails [55.561406656549686]
We consider non-convex stochastic optimization using first-order algorithms for which the gradient estimates may have heavy tails.
We show that a combination of gradient clipping, momentum, and normalized gradient descent yields convergence to critical points in high probability with best-known iteration complexity for smooth losses.
arXiv Detail & Related papers (2021-06-28T00:17:01Z) - Last iterate convergence of SGD for Least-Squares in the Interpolation
regime [19.05750582096579]
We study the noiseless model in the fundamental least-squares setup.
We assume that an optimum predictor fits perfectly inputs and outputs, $\langle \theta_*, \phi(X) \rangle = Y$, where $\phi(X)$ stands for a possibly infinite dimensional non-linear feature map.
arXiv Detail & Related papers (2021-02-05T14:02:20Z) - Simple and optimal methods for stochastic variational inequalities, II:
Markovian noise and policy evaluation in reinforcement learning [9.359939442911127]
This paper focuses on stochastic variational inequalities (VI) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z) - ROOT-SGD: Sharp Nonasymptotics and Near-Optimal Asymptotics in a Single Algorithm [71.13558000599839]
We study the problem of solving strongly convex and smooth unconstrained optimization problems using first-order algorithms.
We devise a novel algorithm, referred to as Recursive One-Over-T SGD (ROOT-SGD), based on an easily implementable, recursive averaging of past stochastic gradients.
We prove that it simultaneously achieves state-of-the-art performance in both a finite-sample, nonasymptotic sense and an asymptotic sense.
arXiv Detail & Related papers (2020-08-28T14:46:56Z) - Bounding the expected run-time of nonconvex optimization with early
stopping [2.7648976108201815]
This work examines the convergence of gradient-based optimization algorithms that use early stopping based on a validation function.
We derive conditions that guarantee this stopping rule is well-defined, and provide bounds on the expected number of iterations and gradient evaluations needed to meet this criterion.
arXiv Detail & Related papers (2020-02-20T16:43:37Z) - Reanalysis of Variance Reduced Temporal Difference Learning [57.150444843282]
Korda and La proposed a variance-reduced TD (VRTD) algorithm, which applies the variance reduction technique directly to online TD learning with Markovian samples.
We show that VRTD is guaranteed to converge to a neighborhood of the fixed-point solution of TD at a linear convergence rate.
arXiv Detail & Related papers (2020-01-07T05:32:43Z)