Finite-Time Analysis of Temporal Difference Learning: Discrete-Time
Linear System Perspective
- URL: http://arxiv.org/abs/2204.10479v6
- Date: Fri, 2 Jun 2023 07:35:14 GMT
- Title: Finite-Time Analysis of Temporal Difference Learning: Discrete-Time
Linear System Perspective
- Authors: Donghwan Lee and Do Wan Kim
- Abstract summary: TD-learning is a fundamental algorithm in the field of reinforcement learning (RL).
Recent research has uncovered guarantees concerning its statistical efficiency by developing finite-time error bounds.
- Score: 3.5823366350053325
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: TD-learning is a fundamental algorithm in the field of reinforcement learning
(RL) that is employed to evaluate a given policy by estimating the
corresponding value function for a Markov decision process. While significant
progress has been made in the theoretical analysis of TD-learning, it is only
recently that guarantees on its statistical efficiency have been established in
the form of finite-time error bounds. This paper aims to contribute to the
existing body of knowledge by presenting a novel finite-time analysis of
tabular temporal difference (TD) learning, which makes direct and effective use
of discrete-time stochastic linear system models and leverages Schur matrix
properties. The proposed analysis can cover both on-policy and off-policy
settings in a unified manner. By adopting this approach, we hope to offer new
and straightforward templates that not only shed further light on the analysis
of TD-learning and related RL algorithms but also provide valuable insights for
future research in this domain.
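To make the linear-system viewpoint concrete, the following is a minimal illustrative sketch (not code from the paper): under i.i.d. on-policy state sampling with distribution d, the expected tabular TD(0) update can be written as V_{k+1} = A V_k + b with A = I - alpha*D*(I - gamma*P) and b = alpha*D*R, so the finite-time behavior hinges on A being Schur stable (spectral radius below one). The MDP, step size, and sampling scheme below are assumptions made purely for illustration.

```python
# Illustrative sketch only: tabular TD(0) viewed as a discrete-time stochastic
# linear system V_{k+1} = A V_k + b + noise, with A = I - alpha*D*(I - gamma*P).
# The MDP, step size, and i.i.d. sampling scheme are assumptions for this example.
import numpy as np

rng = np.random.default_rng(0)
n, gamma, alpha = 5, 0.9, 0.05

P = rng.random((n, n))                      # policy-induced transition matrix
P /= P.sum(axis=1, keepdims=True)
R = rng.random(n)                           # expected reward per state

# Stationary distribution d of P (on-policy sampling assumption).
evals, evecs = np.linalg.eig(P.T)
d = np.real(evecs[:, np.argmax(np.real(evals))])
d /= d.sum()
D = np.diag(d)

# System matrix of the expected update; a small enough constant step size
# makes A Schur stable (spectral radius < 1), which drives the error bounds.
A = np.eye(n) - alpha * D @ (np.eye(n) - gamma * P)
b = alpha * D @ R
print("spectral radius of A:", max(abs(np.linalg.eigvals(A))))

# Stochastic TD(0) iterates hover around the fixed point V* = (I - gamma*P)^{-1} R.
V_true = np.linalg.solve(np.eye(n) - gamma * P, R)
V = np.zeros(n)
for _ in range(200_000):
    s = rng.choice(n, p=d)
    s_next = rng.choice(n, p=P[s])
    V[s] += alpha * (R[s] + gamma * V[s_next] - V[s])
print("max |V - V*|:", np.max(np.abs(V - V_true)))
```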
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this purpose.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results.
arXiv Detail & Related papers (2024-10-21T15:34:44Z)
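As a hypothetical illustration of the setup referenced in the entry above (not the cited paper's code), TD(0) with linear function approximation updates a weight vector along the sampled feature direction, and Polyak-Ruppert averaging simply maintains the running mean of the iterates; the chain, features, and step size below are assumptions for the example.

```python
# Hypothetical sketch: TD(0) with linear function approximation plus
# Polyak-Ruppert (running-average) iterate averaging. The synthetic inputs
# and parameter choices are assumptions made for illustration.
import numpy as np

def td0_linear_polyak_ruppert(Phi, P, R, alpha=0.05, gamma=0.9,
                              n_iter=100_000, seed=0):
    """Phi: (n_states, dim) feature matrix; P: transition matrix; R: rewards."""
    rng = np.random.default_rng(seed)
    n_states, dim = Phi.shape
    theta = np.zeros(dim)        # current TD iterate
    theta_bar = np.zeros(dim)    # Polyak-Ruppert average of the iterates
    s = 0
    for k in range(1, n_iter + 1):
        s_next = rng.choice(n_states, p=P[s])
        td_error = R[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
        theta = theta + alpha * td_error * Phi[s]      # TD(0) step
        theta_bar += (theta - theta_bar) / k           # running average
        s = s_next
    return theta_bar
```

Averaging does not change the fixed point of the underlying recursion; it reduces the variance of the final estimate, which is the mechanism that typically underlies improved inference guarantees of this kind.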
- An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models [20.314426291330278]
In traditional statistical learning, data points are usually assumed to be independently and identically distributed (i.i.d.)
This paper presents a contrasting viewpoint, perceiving data points as interconnected and employing a Markov reward process (MRP) for data modeling.
We reformulate the typical supervised learning as an on-policy policy evaluation problem within reinforcement learning (RL), introducing a generalized temporal difference (TD) learning algorithm as a resolution.
arXiv Detail & Related papers (2024-04-23T21:02:58Z)
- Analysis of Off-Policy Multi-Step TD-Learning with Linear Function Approximation [5.152147416671501]
This paper analyzes multi-step TD-learning algorithms characterized by linear function approximation, off-policy learning, and bootstrapping.
Two n-step TD-learning algorithms are proposed and analyzed, which can be seen as model-free reinforcement learning counterparts of gradient and control-theoretic algorithms.
arXiv Detail & Related papers (2024-02-24T10:42:50Z)
- Revisiting the Temporal Modeling in Spatio-Temporal Predictive Learning under A Unified View [73.73667848619343]
We introduce USTEP (Unified S-TEmporal Predictive learning), an innovative framework that reconciles the recurrent-based and recurrent-free methods by integrating both micro-temporal and macro-temporal scales.
arXiv Detail & Related papers (2023-10-09T16:17:42Z)
- Uncertainty quantification for learned ISTA [5.706217259840463]
Algorithm unrolling schemes stand out among model-based learning techniques.
However, they lack certainty estimates, and a theory for uncertainty quantification is still elusive.
This work proposes a rigorous way to obtain confidence intervals for the LISTA estimator.
arXiv Detail & Related papers (2023-09-14T18:39:07Z)
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
arXiv Detail & Related papers (2023-05-28T10:52:46Z)
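To illustrate the kind of algorithm referenced in the entry above, here is a hypothetical tabular quantile temporal-difference (QTD) style update for m evenly spaced quantile levels; this is an illustrative sketch, not the cited paper's exact algorithm.

```python
# Hypothetical sketch of a tabular QTD-style update: each of m quantile
# estimates per state moves so that roughly a tau_i fraction of the
# distributional targets falls below it. Parameter choices are assumptions.
import numpy as np

def qtd_update(q, s, r, s_next, alpha=0.05, gamma=0.9):
    """q: (n_states, m) quantile estimates, updated in place for state s."""
    m = q.shape[1]
    taus = (np.arange(m) + 0.5) / m          # midpoint quantile levels
    targets = r + gamma * q[s_next]          # distributional Bellman targets
    for i in range(m):
        frac_below = np.mean(targets < q[s, i])
        q[s, i] += alpha * (taus[i] - frac_below)
    return q
```

Note that each per-step increment here is bounded by alpha regardless of the reward scale, which is one intuition for why a quantile-based update can remain statistically attractive even when only the mean value is of interest.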
- A Survey on Deep Learning based Time Series Analysis with Frequency Transformation [74.3919960186696]
Frequency transformation (FT) has been increasingly incorporated into deep learning models to enhance state-of-the-art accuracy and efficiency in time series analysis.
Despite the growing attention and the proliferation of research in this emerging field, there is currently a lack of a systematic review and in-depth analysis of deep learning-based time series models with FT.
We present a comprehensive review that systematically investigates and summarizes the recent research advancements in deep learning-based time series analysis with FT.
arXiv Detail & Related papers (2023-02-04T14:33:07Z)
- Latent Properties of Lifelong Learning Systems [59.50307752165016]
We introduce an algorithm-agnostic explainable surrogate-modeling approach to estimate latent properties of lifelong learning algorithms.
We validate the approach for estimating these properties via experiments on synthetic data.
arXiv Detail & Related papers (2022-07-28T20:58:13Z)
- Control Theoretic Analysis of Temporal Difference Learning [7.191780076353627]
TD-learning serves as a cornerstone in the realm of reinforcement learning.
We introduce a finite-time, control-theoretic framework for analyzing TD-learning.
arXiv Detail & Related papers (2021-12-29T06:43:29Z)
- Online Bootstrap Inference For Policy Evaluation in Reinforcement Learning [90.59143158534849]
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods.
Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations.
The online bootstrap is a flexible and efficient approach for statistical inference in linear stochastic approximation algorithms, but its efficacy in settings involving Markov noise has yet to be explored.
arXiv Detail & Related papers (2021-08-08T18:26:35Z)