Uncertainty quantification for Markov chains with application to temporal difference learning
- URL: http://arxiv.org/abs/2502.13822v1
- Date: Wed, 19 Feb 2025 15:33:55 GMT
- Title: Uncertainty quantification for Markov chains with application to temporal difference learning
- Authors: Weichen Wu, Yuting Wei, Alessandro Rinaldo,
- Abstract summary: We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
- Score: 63.49764856675643
- License:
- Abstract: Markov chains are fundamental to statistical machine learning, underpinning key methodologies such as Markov Chain Monte Carlo (MCMC) sampling and temporal difference (TD) learning in reinforcement learning (RL). Given their widespread use, it is crucial to establish rigorous probabilistic guarantees on their convergence, uncertainty, and stability. In this work, we develop novel, high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains, addressing key limitations in existing theoretical tools for handling dependent data. We leverage these results to analyze the TD learning algorithm, a widely used method for policy evaluation in RL. Our analysis yields a sharp high-probability consistency guarantee that matches the asymptotic variance up to logarithmic factors. Furthermore, we establish a $O(T^{-\frac{1}{4}}\log T)$ distributional convergence rate for the Gaussian approximation of the TD estimator, measured in convex distance. These findings provide new insights into statistical inference for RL algorithms, bridging the gaps between classical stochastic approximation theory and modern reinforcement learning applications.
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
We study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation.
First, we derive a novel high-dimensional probability convergence guarantee that depends explicitly on the variance and holds under weak conditions.
We further establish refined high-dimensional Berry-Esseen bounds over the class of convex sets that guarantee faster rates than those in the literature.
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Fast Value Tracking for Deep Reinforcement Learning [7.648784748888187]
Reinforcement learning (RL) tackles sequential decision-making problems by creating agents that interact with their environment.
Existing algorithms often view these problem as static, focusing on point estimates for model parameters to maximize expected rewards.
Our research leverages the Kalman paradigm to introduce a novel quantification and sampling algorithm called Langevinized Kalman TemporalTD.
arXiv Detail & Related papers (2024-03-19T22:18:19Z) - An Analysis of Quantile Temporal-Difference Learning [53.36758478669685]
quantile temporal-difference learning (QTD) has proven to be a key component in several successful large-scale applications of reinforcement learning.
Unlike classical TD learning, QTD updates do not approximate contraction mappings, are highly non-linear, and may have multiple fixed points.
This paper is a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1.
arXiv Detail & Related papers (2023-01-11T13:41:56Z) - Comparison of Markov chains via weak Poincar\'e inequalities with
application to pseudo-marginal MCMC [0.0]
We investigate the use of a certain class of functional inequalities known as weak Poincar'e inequalities to bound convergence of Markov chains to equilibrium.
We show that this enables the derivation of subgeometric convergence bounds for methods such as the Independent Metropolis--Hastings sampler and pseudo-marginal methods for intractable likelihoods.
arXiv Detail & Related papers (2021-12-10T15:36:30Z) - Online Bootstrap Inference For Policy Evaluation in Reinforcement
Learning [90.59143158534849]
The recent emergence of reinforcement learning has created a demand for robust statistical inference methods.
Existing methods for statistical inference in online learning are restricted to settings involving independently sampled observations.
The online bootstrap is a flexible and efficient approach for statistical inference in linear approximation algorithms, but its efficacy in settings involving Markov noise has yet to be explored.
arXiv Detail & Related papers (2021-08-08T18:26:35Z) - Three rates of convergence or separation via U-statistics in a dependent
framework [5.929956715430167]
We put this theoretical breakthrough into action by pushing further the current state of knowledge in three different active fields of research.
First, we establish a new exponential inequality for the estimation of spectra of trace class integral operators with MCMC methods.
In addition, we investigate generalization performance of online algorithms working with pairwise loss functions and Markov chain samples.
arXiv Detail & Related papers (2021-06-24T07:10:36Z) - A Distributional Analysis of Sampling-Based Reinforcement Learning
Algorithms [67.67377846416106]
We present a distributional approach to theoretical analyses of reinforcement learning algorithms for constant step-sizes.
We show that value-based methods such as TD($lambda$) and $Q$-Learning have update rules which are contractive in the space of distributions of functions.
arXiv Detail & Related papers (2020-03-27T05:13:29Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with $textitexternal uncertainty$ in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.