Related papers: Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation

URL: http://arxiv.org/abs/2502.14172v1
Date: Thu, 20 Feb 2025 00:53:22 GMT
Title: Finite Sample Analysis of Distributional TD Learning with Linear Function Approximation
Authors: Yang Peng, Kaicheng Jin, Liangyu Zhang, Zhihua Zhang,
Abstract summary: We show that the sample complexity of linear distributional TD learning matches that of the classic linear TD learning.<n>Our findings provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.
Score: 21.999445060856278
License: http://creativecommons.org/licenses/by/4.0/
Abstract: In this paper, we investigate the finite-sample statistical rates of distributional temporal difference (TD) learning with linear function approximation. The aim of distributional TD learning is to estimate the return distribution of a discounted Markov decision process for a given policy {\pi}. Prior works on statistical analysis of distributional TD learning mainly focus on the tabular case. In contrast, we first consider the linear function approximation setting and derive sharp finite-sample rates. Our theoretical results demonstrate that the sample complexity of linear distributional TD learning matches that of the classic linear TD learning. This implies that, with linear function approximation, learning the full distribution of the return using streaming data is no more difficult than learning its expectation (i.e. the value function). To derive tight sample complexity bounds, we conduct a fine-grained analysis of the linear-categorical Bellman equation, and employ the exponential stability arguments for products of random matrices. Our findings provide new insights into the statistical efficiency of distributional reinforcement learning algorithms.

Related papers

Bayesian Kernel Regression for Functional Data [1.4501446815590895]
In supervised learning, the output variable to be predicted is often represented as a function.<n>We propose a novel functional output regression model based on kernel methods.
arXiv Detail & Related papers (2025-03-17T19:28:27Z)
Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.<n>We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z)
TD(0) Learning converges for Polynomial mixing and non-linear functions [49.1574468325115]
We present theoretical findings for TD learning under more applicable assumptions.<n>This is the first proof of TD(0) convergence on Markov data under universal and-independent step sizes.<n>Our results include bounds for linear models and non-linear under generalized gradients and H"older continuity.
arXiv Detail & Related papers (2025-02-08T22:01:02Z)
Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation.<n>First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.<n>We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models [20.314426291330278]
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.) This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling. We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution.
arXiv Detail & Related papers (2024-04-23T21:02:58Z)
Sampling Multimodal Distributions with the Vanilla Score: Benefits of Data-Based Initialization [19.19974210314107]
Hyv"arinen proposed vanilla score matching as a way to learn distributions from data. We prove that the Langevin diffusion with early stopping, at the empirical distribution, and run on a score function estimated from data successfully generates natural multimodal distributions.
arXiv Detail & Related papers (2023-10-03T03:06:59Z)
Nonparametric Linear Feature Learning in Regression Through Regularisation [0.0]
We propose a novel method for joint linear feature learning and non-parametric function estimation. By using alternative minimisation, we iteratively rotate the data to improve alignment with leading directions. We establish that the expected risk of our method converges to the minimal risk under minimal assumptions and with explicit rates.
arXiv Detail & Related papers (2023-07-24T12:52:55Z)
Offline Reinforcement Learning with Differentiable Function Approximation is Provably Efficient [65.08966446962845]
offline reinforcement learning, which aims at optimizing decision-making strategies with historical data, has been extensively applied in real-life applications. We take a step by considering offline reinforcement learning with differentiable function class approximation (DFA) Most importantly, we show offline differentiable function approximation is provably efficient by analyzing the pessimistic fitted Q-learning algorithm.
arXiv Detail & Related papers (2022-10-03T07:59:42Z)
Near-optimal Offline Reinforcement Learning with Linear Representation: Leveraging Variance Information with Pessimism [65.46524775457928]
offline reinforcement learning seeks to utilize offline/historical data to optimize sequential decision-making strategies. We study the statistical limits of offline reinforcement learning with linear model representations.
arXiv Detail & Related papers (2022-03-11T09:00:12Z)
Distributionally Robust Learning [11.916893752969429]
This book develops a comprehensive statistical learning framework that is robust to (distributional) perturbations in the data. A tractable DRO relaxation for each problem is being derived, establishing a connection between bounds and regularization. Beyond theory, we include numerical experiments and case studies using synthetic and real data.
arXiv Detail & Related papers (2021-08-20T04:14:18Z)
Distributional Reinforcement Learning via Moment Matching [54.16108052278444]
We formulate a method that learns a finite set of statistics from each return distribution via neural networks. Our method can be interpreted as implicitly matching all orders of moments between a return distribution and its Bellman target. Experiments on the suite of Atari games show that our method outperforms the standard distributional RL baselines.
arXiv Detail & Related papers (2020-07-24T05:18:17Z)
A maximum-entropy approach to off-policy evaluation in average-reward MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs) We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases. We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
arXiv Detail & Related papers (2020-06-17T18:13:37Z)
A Distributional Analysis of Sampling-Based Reinforcement Learning Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes. We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z)

This list is automatically generated from the titles and abstracts of the papers in this site.