Statistical Inference for Temporal Difference Learning with Linear Function Approximation
- URL: http://arxiv.org/abs/2410.16106v3
- Date: Wed, 28 May 2025 00:49:57 GMT
- Title: Statistical Inference for Temporal Difference Learning with Linear Function Approximation
- Authors: Weichen Wu, Gen Li, Yuting Wei, Alessandro Rinaldo
- Abstract summary: We investigate the statistical properties of Temporal Difference learning with Polyak-Ruppert averaging. We make three significant contributions that improve the current state-of-the-art results.
- Score: 62.69448336714418
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: We investigate the statistical properties of Temporal Difference (TD) learning with Polyak-Ruppert averaging, arguably one of the most widely used algorithms in reinforcement learning, for the task of estimating the parameters of the optimal linear approximation to the value function. We make three significant contributions that improve the current state-of-the-art results: (i) we derive sharper high-probability convergence guarantees that depend explicitly on the asymptotic variance and hold under weaker conditions than those normally assumed; (ii) we establish refined high-dimensional Berry-Esseen bounds over the class of convex sets, achieving faster rates than those previously established in the literature; and (iii) we propose and analyze a novel, computationally efficient online plug-in estimator of the asymptotic covariance matrix. These results enable the construction of confidence regions and simultaneous confidence intervals for the linear parameters of the value function approximation, with guaranteed finite-sample coverage. We demonstrate the applicability of our theoretical findings through numerical experiments.
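To make the averaging-plus-plug-in recipe described in the abstract concrete, below is a minimal Python sketch of TD(0) with linear features and Polyak-Ruppert averaging, together with a sandwich-style plug-in covariance estimate used to form entrywise confidence intervals. This is an illustrative reading of the abstract, not the authors' algorithm or estimator: the toy Markov reward process, feature map, step-size schedule, and the particular running covariance estimator are assumptions made for the example.

```python
# Minimal, self-contained sketch: averaged TD(0) with linear function approximation
# and a plug-in "sandwich" covariance estimate for entrywise confidence intervals.
# Toy problem and estimator choices below are illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(0)

# Toy Markov reward process: S states, random transitions/rewards, fixed features.
S, d, gamma = 20, 4, 0.9
P = rng.dirichlet(np.ones(S), size=S)          # row-stochastic transition matrix
r = rng.normal(size=S)                          # expected rewards per state
Phi = rng.normal(size=(S, d)) / np.sqrt(d)      # feature map phi(s)

def td_polyak_ruppert(T, eta0=1.0, alpha=0.8):
    """Run averaged TD(0) for T transitions; return the averaged iterate and the
    running estimates (Abar, Gamma) used in the plug-in covariance below."""
    theta = np.zeros(d)
    theta_bar = np.zeros(d)
    Abar = np.zeros((d, d))      # running estimate of A = E[phi (phi - gamma phi')^T]
    Gamma = np.zeros((d, d))     # running estimate of the TD-update noise covariance
    s = 0
    for t in range(1, T + 1):
        s_next = rng.choice(S, p=P[s])
        reward = r[s] + 0.1 * rng.normal()
        phi, phi_next = Phi[s], Phi[s_next]
        td_err = reward + gamma * phi_next @ theta - phi @ theta
        g = td_err * phi                         # TD semi-gradient
        theta += (eta0 / t**alpha) * g           # polynomially decaying step size
        theta_bar += (theta - theta_bar) / t     # Polyak-Ruppert average
        Abar += (np.outer(phi, phi - gamma * phi_next) - Abar) / t
        Gamma += (np.outer(g, g) - Gamma) / t
        s = s_next
    return theta_bar, Abar, Gamma

T = 50_000
theta_bar, Abar, Gamma = td_polyak_ruppert(T)

# Plug-in estimate of the asymptotic covariance A^{-1} Gamma A^{-T} of
# sqrt(T) * (theta_bar - theta*), and 95% entrywise confidence intervals.
A_inv = np.linalg.inv(Abar)
Sigma_hat = A_inv @ Gamma @ A_inv.T
half_width = 1.96 * np.sqrt(np.diag(Sigma_hat) / T)
for j in range(d):
    print(f"theta[{j}] in [{theta_bar[j] - half_width[j]:+.3f}, {theta_bar[j] + half_width[j]:+.3f}]")
```

Simultaneous confidence intervals or confidence regions over convex sets would replace the 1.96 normal quantile with a bound accounting for all coordinates jointly; the entrywise intervals above are only the simplest use of the plug-in covariance.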
Related papers
- Efficient Adaptive Experimentation with Non-Compliance [39.43227019824619]
We study the problem of estimating the average treatment effect (ATE) in adaptive experiments where treatment can only be encouraged, rather than directly assigned, via a binary instrumental variable. We introduce AMRIV, which combines an online policy that adaptively approximates the optimal allocation with a sequential, influence-function-based estimator that attains the semi-parametric efficiency bound while retaining multiply-robust consistency.
arXiv Detail & Related papers (2025-05-23T04:49:14Z) - Uncertainty quantification for Markov chains with application to temporal difference learning [63.49764856675643]
We develop novel high-dimensional concentration inequalities and Berry-Esseen bounds for vector- and matrix-valued functions of Markov chains.
We analyze the TD learning algorithm, a widely used method for policy evaluation in reinforcement learning.
arXiv Detail & Related papers (2025-02-19T15:33:55Z) - Asymptotic Time-Uniform Inference for Parameters in Averaged Stochastic Approximation [23.89036529638614]
We study time-uniform statistical inference for parameters in stochastic approximation (SA).
We analyze the almost-sure convergence rates of the averaged iterates to a scaled sum of Gaussians in both linear and nonlinear SA problems.
arXiv Detail & Related papers (2024-10-19T10:27:26Z) - Statistical Inference of Optimal Allocations I: Regularities and their Implications [5.911223351920214]
We derive Hadamard differentiability of the value functions by analyzing the properties of the sorting operator. Building on our Hadamard differentiability results, we apply the functional delta method to obtain the properties of the value function process. We present a double/debiased estimator for the value functions.
arXiv Detail & Related papers (2024-03-27T04:39:13Z) - Online Learning Approach for Survival Analysis [1.0499611180329806]
We introduce an online mathematical framework for survival analysis, allowing real-time adaptation to dynamic environments and censored data.
This framework enables the estimation of event-time distributions through an optimal second-order online convex optimization algorithm, Online Newton Step (ONS); a generic sketch of the ONS update appears after this list.
arXiv Detail & Related papers (2024-02-07T08:15:30Z) - High Confidence Level Inference is Almost Free using Parallel Stochastic Optimization [16.38026811561888]
This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level.
Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free.
arXiv Detail & Related papers (2024-01-17T17:11:45Z) - Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability [17.771354881467435]
We show that a simple algorithm with a universal and instance-independent step size is sufficient to obtain near-optimal variance and bias terms.
Our proof technique is based on refined error bounds for linear stochastic approximation together with a novel stability result for products of random matrices.
arXiv Detail & Related papers (2023-10-22T12:37:25Z) - Optimal Learning via Moderate Deviations Theory [4.6930976245638245]
We develop a systematic construction of highly accurate confidence intervals by using a moderate deviation principle-based approach.
It is shown that the proposed confidence intervals are statistically optimal in the sense that they satisfy criteria regarding exponential accuracy, minimality, consistency, mischaracterization probability, and eventual uniformly most accurate (UMA) property.
arXiv Detail & Related papers (2023-05-23T19:57:57Z) - Online Statistical Inference for Nonlinear Stochastic Approximation with Markovian Data [22.59079286063505]
We study the statistical inference of nonlinear stochastic approximation algorithms utilizing a single trajectory of Markovian data.
Our methodology has practical applications in various scenarios, such as Stochastic Gradient Descent (SGD) on autoregressive data and asynchronous Q-Learning.
arXiv Detail & Related papers (2023-02-15T14:31:11Z) - Statistical Optimality of Divide and Conquer Kernel-based Functional Linear Regression [1.7227952883644062]
This paper studies the convergence performance of divide-and-conquer estimators in the scenario that the target function does not reside in the underlying kernel space.
As a decomposition-based scalable approach, the divide-and-conquer estimators of functional linear regression can substantially reduce the algorithmic complexities in time and memory.
arXiv Detail & Related papers (2022-11-20T12:29:06Z) - Data-Driven Influence Functions for Optimization-Based Causal Inference [105.5385525290466]
We study a constructive algorithm that approximates Gateaux derivatives for statistical functionals by finite differencing.
We study the case where probability distributions are not known a priori but need to be estimated from data.
arXiv Detail & Related papers (2022-08-29T16:16:22Z) - A Boosting Approach to Reinforcement Learning [59.46285581748018]
We study efficient algorithms for reinforcement learning in Markov decision processes whose complexity is independent of the number of states.
We give an efficient algorithm that is capable of improving the accuracy of such weak learning methods.
arXiv Detail & Related papers (2021-08-22T16:00:45Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Finite Sample Analysis of Minimax Offline Reinforcement Learning: Completeness, Fast Rates and First-Order Efficiency [83.02999769628593]
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement learning.
We show that the minimax approach enables us to achieve a fast rate of convergence for weights and quality functions.
We present the first finite-sample result with first-order efficiency in non-tabular environments.
arXiv Detail & Related papers (2021-02-05T03:20:39Z) - CoinDICE: Off-Policy Confidence Interval Estimation [107.86876722777535]
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning.
We show in a variety of benchmarks that the confidence interval estimates are tighter and more accurate than existing methods.
arXiv Detail & Related papers (2020-10-22T12:39:11Z) - Fast Objective & Duality Gap Convergence for Non-Convex Strongly-Concave Min-Max Problems with PL Condition [52.08417569774822]
This paper focuses on methods for solving smooth non-convex strongly-concave min-max problems, which have received increasing attention due to their applications in deep learning (e.g., deep AUC maximization).
arXiv Detail & Related papers (2020-06-12T00:32:21Z) - Distributional Robustness and Regularization in Reinforcement Learning [62.23012916708608]
We introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function.
It suggests using regularization as a practical tool for dealing with external uncertainty in reinforcement learning.
arXiv Detail & Related papers (2020-03-05T19:56:23Z)
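As referenced in the survival-analysis entry above, here is a generic sketch of the Online Newton Step (ONS) update on a simple exp-concave online loss (online logistic regression on synthetic data). It only illustrates the named update rule; the loss, data, and hyper-parameters are assumptions, and the generalized projection step of the full ONS algorithm is omitted for brevity.

```python
# Generic Online Newton Step (ONS) sketch on an exp-concave online loss.
# Illustrative only; not the implementation from the survival-analysis paper.
import numpy as np

rng = np.random.default_rng(1)
d, T, gamma, eps = 5, 2000, 0.5, 1.0

theta_true = rng.normal(size=d)   # ground-truth parameter generating the labels
theta = np.zeros(d)               # ONS iterate
A = eps * np.eye(d)               # running sum of gradient outer products (Hessian surrogate)

for t in range(T):
    x = rng.normal(size=d)
    y = 1.0 if rng.random() < 1 / (1 + np.exp(-x @ theta_true)) else 0.0
    p = 1 / (1 + np.exp(-x @ theta))     # current predicted probability
    g = (p - y) * x                       # gradient of the logistic loss at theta
    A += np.outer(g, g)
    theta -= (1 / gamma) * np.linalg.solve(A, g)   # Newton-style ONS step

print("estimate:", np.round(theta, 2))
print("truth:   ", np.round(theta_true, 2))
```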