On the Statistical Benefits of Temporal Difference Learning
- URL: http://arxiv.org/abs/2301.13289v3
- Date: Wed, 14 Feb 2024 17:06:49 GMT
- Authors: David Cheikhi and Daniel Russo
- Score: 6.408072565019087
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Given a dataset on actions and resulting long-term rewards, a direct
estimation approach fits value functions that minimize prediction error on the
training data. Temporal difference learning (TD) methods instead fit value
functions by minimizing the degree of temporal inconsistency between estimates
made at successive time-steps. Focusing on finite state Markov chains, we
provide a crisp asymptotic theory of the statistical advantages of this
approach. First, we show that an intuitive inverse trajectory pooling
coefficient completely characterizes the percent reduction in mean-squared
error of value estimates. Depending on problem structure, the reduction could
be enormous or nonexistent. Next, we prove that there can be dramatic
improvements in estimates of the difference in value-to-go for two states: TD's
errors are bounded in terms of a novel measure - the problem's trajectory
crossing time - which can be much smaller than the problem's time horizon.
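The contrast between direct (Monte Carlo) estimation and TD can be sketched on a small finite-state Markov reward process. The chain, rewards, and discount factor below are made up for illustration and are not taken from the paper; the sketch only shows the two estimation principles side by side.

```python
import numpy as np

# Illustrative 3-state Markov reward process; P, r, gamma are invented
# for this sketch and are not from the paper.
P = np.array([[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5]])
r = np.array([0.0, 1.0, 2.0])
gamma = 0.9
rng = np.random.default_rng(0)

T, episodes, alpha = 100, 1000, 0.05
V_td = np.zeros(3)
returns = {s: [] for s in range(3)}

for _ in range(episodes):
    s = int(rng.integers(3))
    states, rewards = [s], []
    for _ in range(T):
        rewards.append(r[s])
        s = int(rng.choice(3, p=P[s]))
        states.append(s)
    # Direct (Monte Carlo) estimate: average observed discounted returns.
    # Only early visits are recorded so truncation bias is negligible.
    G = 0.0
    for t in reversed(range(T)):
        G = rewards[t] + gamma * G
        if t < T // 2:
            returns[states[t]].append(G)
    # TD(0): shrink the temporal inconsistency between successive estimates.
    for t in range(T):
        td_err = rewards[t] + gamma * V_td[states[t + 1]] - V_td[states[t]]
        V_td[states[t]] += alpha * td_err

V_mc = np.array([np.mean(returns[s]) for s in range(3)])
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)  # exact value function
```

Both estimators converge to `V_true`; the paper's contribution is characterizing exactly when, and by how much, the TD route reduces the mean-squared error of such estimates.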
Related papers
- Gradient Descent Efficiency Index [0.0]
This study introduces a new efficiency metric, Ek, designed to quantify the effectiveness of each iteration.
The proposed metric accounts for both the relative change in error and the stability of the loss function across iterations.
Ek has the potential to guide more informed decisions in the selection and tuning of optimization algorithms in machine learning applications.
arXiv Detail & Related papers (2024-10-25T10:22:22Z) - The surprising efficiency of temporal difference learning for rare event prediction [0.0]
We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC) estimator for policy evaluation in reinforcement learning.
We show that LSTD can achieve relative accuracy far more efficiently than MC.
Even when both the timescale of the rare event and the relative accuracy of the MC estimator are exponentially large in the number of states, LSTD maintains a fixed level of relative accuracy.
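A minimal LSTD(0) sketch on a made-up chain with a single rewarded state illustrates the estimator the entry refers to; the chain, features, and step count here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy 3-state chain with reward only in one infrequently reached state.
# All quantities are invented for this sketch, not from the paper.
P = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.9, 0.1],
              [0.1, 0.0, 0.9]])
r = np.array([0.0, 0.0, 1.0])      # reward only in the "rare" state
gamma = 0.9
rng = np.random.default_rng(1)

def phi(s):                         # one-hot (tabular) feature vector
    e = np.zeros(3)
    e[s] = 1.0
    return e

# LSTD(0): accumulate A = sum phi(s)(phi(s) - gamma*phi(s'))^T, b = sum phi(s)*r(s)
A = np.zeros((3, 3))
b = np.zeros(3)
s = 0
for _ in range(50_000):             # one long trajectory
    s_next = int(rng.choice(3, p=P[s]))
    x, x_next = phi(s), phi(s_next)
    A += np.outer(x, x - gamma * x_next)
    b += x * r[s]
    s = s_next

w_lstd = np.linalg.solve(A, b)      # LSTD value estimate, one entry per state
V_true = np.linalg.solve(np.eye(3) - gamma * P, r)
```

With tabular features, solving `A w = b` recovers the TD fixed point directly from the trajectory, without averaging long returns as the MC estimator must.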
arXiv Detail & Related papers (2024-05-27T20:18:20Z) - Loss Shaping Constraints for Long-Term Time Series Forecasting [79.3533114027664]
We present a Constrained Learning approach for long-term time series forecasting that respects a user-defined upper bound on the loss at each time-step.
We propose a practical Primal-Dual algorithm to tackle it, and demonstrate that it exhibits competitive average performance on time series benchmarks while shaping the errors across the predicted window.
arXiv Detail & Related papers (2024-02-14T18:20:44Z) - Semiparametric Efficient Inference in Adaptive Experiments [29.43493007296859]
We consider the problem of efficient inference of the Average Treatment Effect in a sequential experiment where the policy governing the assignment of subjects to treatment or control can change over time.
We first provide a central limit theorem for the Adaptive Augmented Inverse-Probability Weighted estimator, which is semiparametric efficient, under weaker assumptions than those previously made in the literature.
We then consider the sequential inference setting, deriving both asymptotic and nonasymptotic confidence sequences that are considerably tighter than previous methods.
arXiv Detail & Related papers (2023-11-30T06:25:06Z) - Better Batch for Deep Probabilistic Time Series Forecasting [15.31488551912888]
We propose an innovative training method that incorporates error autocorrelation to enhance probabilistic forecasting accuracy.
Our method constructs a mini-batch as a collection of $D$ consecutive time series segments for model training.
It explicitly learns a time-varying covariance matrix over each mini-batch, encoding error correlation among adjacent time steps.
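The batch construction described above can be sketched in a few lines; the function name, parameter names, and shapes here are ours for illustration, not the paper's.

```python
import numpy as np

# Sketch: a mini-batch is a stack of D windows starting at consecutive
# time steps, so model errors at adjacent steps can be related within
# the batch. Names and shapes are illustrative, not from the paper.
def consecutive_minibatch(series, start, D, window):
    """Return a (D, window) array; row d covers steps start+d .. start+d+window-1."""
    return np.stack([series[start + d : start + d + window] for d in range(D)])

series = np.arange(20.0)
batch = consecutive_minibatch(series, start=3, D=4, window=5)
```

Because consecutive rows overlap by `window - 1` steps, residuals at adjacent time steps co-occur within one batch, which is what lets a time-varying covariance matrix over the batch encode error autocorrelation.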
arXiv Detail & Related papers (2023-05-26T15:36:59Z) - Uncertainty estimation of pedestrian future trajectory using Bayesian
approximation [137.00426219455116]
Under dynamic traffic scenarios, planning based on deterministic predictions is not trustworthy.
The authors propose to quantify the uncertainty during forecasting using Bayesian approximation, which deterministic approaches fail to capture.
The effect of dropout weights and long-term prediction on future state uncertainty has been studied.
arXiv Detail & Related papers (2022-05-04T04:23:38Z) - Taming the Long Tail of Deep Probabilistic Forecasting [16.136753801449263]
We identify a long tail behavior in the performance of state-of-the-art deep learning methods on probabilistic forecasting.
We present two moment-based tailedness measurement concepts to improve performance on the difficult tail examples.
We demonstrate the performance of our approach on several real-world datasets, including time series and temporal trajectories.
arXiv Detail & Related papers (2022-02-27T18:23:41Z) - Imputation-Free Learning from Incomplete Observations [73.15386629370111]
We introduce the importance-guided stochastic gradient descent (IGSGD) method to train models that perform inference from inputs containing missing values, without imputation.
We employ reinforcement learning (RL) to adjust the gradients used to train the models via back-propagation.
Our imputation-free predictions outperform the traditional two-step imputation-based predictions using state-of-the-art imputation methods.
arXiv Detail & Related papers (2021-07-05T12:44:39Z) - SLOE: A Faster Method for Statistical Inference in High-Dimensional
Logistic Regression [68.66245730450915]
We develop an improved method for debiasing predictions and estimating frequentist uncertainty for practical datasets.
Our main contribution is SLOE, an estimator of the signal strength with convergence guarantees that reduces the computation time of estimation and inference by orders of magnitude.
arXiv Detail & Related papers (2021-03-23T17:48:56Z) - A Framework for Sample Efficient Interval Estimation with Control
Variates [94.32811054797148]
We consider the problem of estimating confidence intervals for the mean of a random variable.
Under certain conditions, we show improved efficiency compared to existing estimation algorithms.
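As a generic illustration of the control-variate idea (the toy integrand and the known-mean control below are standard textbook material, not the paper's estimator):

```python
import numpy as np

# Toy control-variate estimator of E[exp(U)] for U ~ Uniform(0, 1),
# using U itself (known mean 0.5) as the control variate. This is a
# generic illustration, not the paper's algorithm.
rng = np.random.default_rng(2)
n = 100_000
u = rng.random(n)
x = np.exp(u)                              # target samples; E[X] = e - 1
beta = np.cov(x, u)[0, 1] / np.var(u)      # estimated optimal coefficient
est_plain = x.mean()                       # plain Monte Carlo estimate
est_cv = x.mean() - beta * (u.mean() - 0.5)
```

Subtracting `beta * (u.mean() - 0.5)` removes the component of the error correlated with the control, shrinking the variance of the estimate and, hence, the confidence interval around it.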
arXiv Detail & Related papers (2020-06-18T05:42:30Z) - Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.