A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity
- URL: http://arxiv.org/abs/2506.01052v1
- Date: Sun, 01 Jun 2025 15:39:00 GMT
- Title: A Finite-Time Analysis of TD Learning with Linear Function Approximation without Projections nor Strong Convexity
- Authors: Wei-Cheng Lee, Francesco Orabona,
- Abstract summary: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation.<n>We show that the simple projection-free variant converges with a rate of $tildemath||theta*||2sqrtT$, even in the presence of Markovian noise.
- Score: 11.117572650083698
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We investigate the finite-time convergence properties of Temporal Difference (TD) learning with linear function approximation, a cornerstone algorithm in reinforcement learning. While prior work has established convergence guarantees, these results typically rely on the assumption that each iterate is projected onto a bounded set or that the learning rate is set according to the unknown strong convexity constant -- conditions that are both artificial and do not match the current practice. In this paper, we challenge the necessity of such assumptions and present a refined analysis of TD learning. We show that the simple projection-free variant converges with a rate of $\tilde{\mathcal{O}}(\frac{||\theta^*||^2_2}{\sqrt{T}})$, even in the presence of Markovian noise. Our analysis reveals a novel self-bounding property of the TD updates and exploits it to guarantee bounded iterates.
Related papers
- Stochastic Approximation with Unbounded Markovian Noise: A General-Purpose Theorem [7.443139252028032]
We consider average-reward Reinforcement Learning with unbounded state space and reward function.
Recent works studied this problem in the actor-critic framework.
We study Temporal Difference (TD) learning with linear function approximation.
arXiv Detail & Related papers (2024-10-29T03:40:53Z) - Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging.<n>We make three significant contributions that improve the current state-of-the-art results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - A Simple Finite-Time Analysis of TD Learning with Linear Function Approximation [2.44755919161855]
We study the finite-time convergence of TD learning with linear function approximation under Markovian sampling.
We show that it is possible to retain the simplicity of a projection-based analysis without actually performing a projection step in the algorithm.
arXiv Detail & Related papers (2024-03-04T20:40:02Z) - Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic
Gradient Descent [30.84181129503133]
The last decade has witnessed an increasing number of stability for different algorithms applied on different loss functions.
We make a novel connection between theory applied and introduce a unified guideline for proving stability for optimization algorithms.
Our approach is flexible and can be readilyizable to other popular learning classes.
arXiv Detail & Related papers (2023-05-20T01:49:58Z) - Online Learning with Adversaries: A Differential-Inclusion Analysis [52.43460995467893]
We introduce an observation-matrix-based framework for fully asynchronous online Federated Learning with adversaries.
Our main result is that the proposed algorithm almost surely converges to the desired mean $mu.$
We derive this convergence using a novel differential-inclusion-based two-timescale analysis.
arXiv Detail & Related papers (2023-04-04T04:32:29Z) - An Analysis of Quantile Temporal-Difference Learning [53.36758478669685]
quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1.
arXiv Detail & Related papers (2023-01-11T13:41:56Z) - Finite time analysis of temporal difference learning with linear
function approximation: Tail averaging and regularisation [44.27439128304058]
We study the finite-time behaviour of the popular temporal difference (TD) learning algorithm when combined with tail-averaging.
We derive finite time bounds on the parameter error of the tail-averaged TD iterate under a step-size choice.
arXiv Detail & Related papers (2022-10-12T04:37:54Z) - Contrastive learning of strong-mixing continuous-time stochastic
processes [53.82893653745542]
Contrastive learning is a family of self-supervised methods where a model is trained to solve a classification task constructed from unlabeled data.
We show that a properly constructed contrastive learning task can be used to estimate the transition kernel for small-to-mid-range intervals in the diffusion case.
arXiv Detail & Related papers (2021-03-03T23:06:47Z) - Simple and optimal methods for stochastic variational inequalities, II:
Markovian noise and policy evaluation in reinforcement learning [9.359939442911127]
This paper focuses on resetting variational inequalities (VI) under Markovian noise.
A prominent application of our algorithmic developments is the policy evaluation problem in reinforcement learning.
arXiv Detail & Related papers (2020-11-15T04:05:22Z) - Learning Implicitly with Noisy Data in Linear Arithmetic [94.66549436482306]
We extend implicit learning in PAC-Semantics to handle intervals and threshold uncertainty in the language of linear arithmetic.
We show that our implicit approach to learning optimal linear programming objective constraints significantly outperforms an explicit approach in practice.
arXiv Detail & Related papers (2020-10-23T19:08:46Z) - Finite-Time Analysis for Double Q-learning [50.50058000948908]
We provide the first non-asymptotic, finite-time analysis for double Q-learning.
We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $epsilon$-accurate neighborhood of the global optimum.
arXiv Detail & Related papers (2020-09-29T18:48:21Z) - Stochastic Approximation with Markov Noise: Analysis and applications in
reinforcement learning [0.0]
We present for the first time an convergence analysis of two time-scale approximation driven by "controlled" Markov noise.
We analyze the behavior of our framework by relating it to limiting differential inclusions in both time scales.
We obtain the first informative error bounds on function approximation for the policy evaluation algorithm.
arXiv Detail & Related papers (2020-04-08T03:59:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.