The surprising efficiency of temporal difference learning for rare event prediction
- URL: http://arxiv.org/abs/2405.17638v2
- Date: Sun, 10 Nov 2024 17:57:34 GMT
- Title: The surprising efficiency of temporal difference learning for rare event prediction
- Authors: Xiaoou Cheng, Jonathan Weare
- Abstract summary: We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning.
We show that LSTD can achieve relative accuracy far more efficiently than MC.
Even when both the timescale of the rare event and the relative accuracy of the MC estimator are exponentially large in the number of states, LSTD maintains a fixed level of relative accuracy.
- Abstract: We quantify the efficiency of temporal difference (TD) learning over the direct, or Monte Carlo (MC), estimator for policy evaluation in reinforcement learning, with an emphasis on estimation of quantities related to rare events. Policy evaluation is complicated in the rare event setting by the long timescale of the event and by the need for \emph{relative accuracy} in estimates of very small values. Specifically, we focus on least-squares TD (LSTD) prediction for finite state Markov chains, and show that LSTD can achieve relative accuracy far more efficiently than MC. We prove a central limit theorem for the LSTD estimator and upper bound the \emph{relative asymptotic variance} by simple quantities characterizing the connectivity of states relative to the transition probabilities between them. Using this bound, we show that, even when both the timescale of the rare event and the relative accuracy of the MC estimator are exponentially large in the number of states, LSTD maintains a fixed level of relative accuracy with a total number of observed transitions of the Markov chain that is only \emph{polynomially} large in the number of states.
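To make the comparison concrete, the following is a minimal, hypothetical sketch (not the paper's construction or code): tabular LSTD versus every-visit Monte Carlo policy evaluation on a toy five-state chain whose only reward sits in a rarely visited state, so the target values are small and relative accuracy is the natural error measure. The chain, discount factor, trajectory length, and truncation horizon are arbitrary illustrative choices.

```python
import numpy as np

# Minimal illustrative sketch (assumed toy setting, not the paper's construction):
# tabular LSTD vs. every-visit Monte Carlo policy evaluation on a small
# finite-state Markov reward process whose reward sits in a rarely visited state.

rng = np.random.default_rng(0)
n_states, gamma = 5, 0.95

P = rng.dirichlet(np.ones(n_states), size=n_states)   # random row-stochastic matrix
P[:, -1] *= 0.01                                       # make the last state rarely entered
P /= P.sum(axis=1, keepdims=True)
r = np.zeros(n_states)
r[-1] = 1.0                                            # reward only in the rare state

v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)  # exact value function

def sample_trajectory(T):
    """Simulate T transitions of the chain, returning the visited states s_0, ..., s_T."""
    s = np.empty(T + 1, dtype=int)
    s[0] = 0
    for t in range(T):
        s[t + 1] = rng.choice(n_states, p=P[s[t]])
    return s

def lstd_estimate(states):
    """Tabular LSTD: solve A w = b built from observed transitions (one-hot features)."""
    I = np.eye(n_states)
    A = 1e-8 * np.eye(n_states)            # tiny ridge for numerical stability
    b = np.zeros(n_states)
    for s, s_next in zip(states[:-1], states[1:]):
        A += np.outer(I[s], I[s] - gamma * I[s_next])
        b += I[s] * r[s]
    return np.linalg.solve(A, b)

def mc_estimate(states, horizon=200):
    """Every-visit Monte Carlo: average truncated discounted returns per start state."""
    returns = [[] for _ in range(n_states)]
    for t in range(len(states) - horizon):
        g = sum(gamma ** k * r[states[t + k]] for k in range(horizon))
        returns[states[t]].append(g)
    return np.array([np.mean(g) if g else 0.0 for g in returns])

states = sample_trajectory(20_000)
for name, v_hat in [("LSTD", lstd_estimate(states)), ("MC  ", mc_estimate(states))]:
    rel_err = np.abs(v_hat - v_true) / v_true          # relative accuracy per state
    print(f"{name} max relative error: {rel_err.max():.3f}")
```

Both estimators are computed from the same single trajectory; the abstract's claim concerns how long that trajectory must be for each estimator to reach a fixed level of relative accuracy.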
Related papers
- Statistical Inference for Temporal Difference Learning with Linear Function Approximation [62.69448336714418]
Temporal Difference (TD) learning, arguably the most widely used algorithm for policy evaluation, serves as a natural framework for this kind of statistical inference.
In this paper, we study the consistency properties of TD learning with Polyak-Ruppert averaging and linear function approximation, and obtain three significant improvements over existing results (an illustrative sketch of TD(0) with iterate averaging appears after this list).
arXiv Detail & Related papers (2024-10-21T15:34:44Z) - Score Matching-based Pseudolikelihood Estimation of Neural Marked Spatio-Temporal Point Process with Uncertainty Quantification [59.81904428056924]
We introduce SMASH: a Score MAtching estimator for learning marked STPPs (spatio-temporal point processes) with uncertainty quantification.
Specifically, our framework adopts a normalization-free objective by estimating the pseudolikelihood of marked STPPs through score-matching.
The superior performance of our proposed framework is demonstrated through extensive experiments in both event prediction and uncertainty quantification.
arXiv Detail & Related papers (2023-10-25T02:37:51Z) - On Double Descent in Reinforcement Learning with LSTD and Random Features [1.5873758872998507]
Temporal Difference (TD) algorithms are widely used in Deep Reinforcement Learning (RL).
We present a theoretical analysis of the influence of network size and $l_2$-regularization on performance.
We observe a double descent phenomenon, i.e., a sudden drop in performance around the parameter/state ratio of one.
arXiv Detail & Related papers (2023-10-09T08:33:22Z) - On the Statistical Benefits of Temporal Difference Learning [6.408072565019087]
Given a dataset on actions and resulting long-term rewards, a direct estimation approach fits value functions.
We show that an intuitive inverse trajectory pooling coefficient completely characterizes the percent reduction in mean-squared error of value estimates.
We prove that there can be dramatic improvements in estimates of the difference in value-to-go for two states.
arXiv Detail & Related papers (2023-01-30T21:02:25Z) - An Anomaly Detection Method for Satellites Using Monte Carlo Dropout [7.848121055546167]
We present a tractable approximation of Bayesian neural network (BNN) inference based on the Monte Carlo (MC) dropout method for capturing uncertainty in satellite telemetry time series.
Our proposed time-series anomaly detection (AD) approach outperforms existing methods in terms of both prediction accuracy and anomaly detection performance.
arXiv Detail & Related papers (2022-11-27T21:12:26Z) - Statistical Efficiency of Score Matching: The View from Isoperimetry [96.65637602827942]
We show a tight connection between statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated.
We formalize these results both in the finite-sample regime and in the asymptotic regime.
arXiv Detail & Related papers (2022-10-03T06:09:01Z) - Tight Mutual Information Estimation With Contrastive Fenchel-Legendre Optimization [69.07420650261649]
We introduce a novel, simple, and powerful contrastive MI estimator named FLO.
Empirically, our FLO estimator overcomes the limitations of its predecessors and learns more efficiently.
The utility of FLO is verified using an extensive set of benchmarks, which also reveals the trade-offs in practical MI estimation.
arXiv Detail & Related papers (2021-07-02T15:20:41Z) - Amortized Conditional Normalized Maximum Likelihood: Reliable Out of Distribution Uncertainty Estimation [99.92568326314667]
We propose the amortized conditional normalized maximum likelihood (ACNML) method as a scalable general-purpose approach for uncertainty estimation.
Our algorithm builds on the conditional normalized maximum likelihood (CNML) coding scheme, which has minimax optimal properties according to the minimum description length principle.
We demonstrate that ACNML compares favorably to a number of prior techniques for uncertainty estimation in terms of calibration on out-of-distribution inputs.
arXiv Detail & Related papers (2020-11-05T08:04:34Z) - Infinity Learning: Learning Markov Chains from Aggregate Steady-State Observations [13.973232545822247]
We consider the task of learning a parametric Continuous Time Markov Chain (CTMC) sequence model without examples of sequences.
We propose $\infty$-SGD, a gradient descent method that uses randomly-stopped estimators to avoid the infinite sums required by the steady state (a generic randomly-stopped-sum sketch appears after this list).
We apply $\infty$-SGD to a real-world testbed and to synthetic experiments, showcasing its accuracy and its ability to extrapolate the steady-state distribution to unobserved states.
arXiv Detail & Related papers (2020-02-11T03:29:13Z) - Targeted stochastic gradient Markov chain Monte Carlo for hidden Markov models with rare latent states [48.705095800341944]
Markov chain Monte Carlo (MCMC) algorithms for hidden Markov models often rely on the forward-backward sampler.
This makes them computationally slow as the length of the time series increases, motivating the development of sub-sampling-based approaches.
We propose a targeted sub-sampling approach that over-samples observations corresponding to rare latent states when calculating the gradient of parameters associated with them.
arXiv Detail & Related papers (2018-10-31T17:44:20Z)
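For the first related paper above (statistical inference for TD learning), here is the promised minimal sketch of TD(0) with linear function approximation and Polyak-Ruppert iterate averaging. The chain, random features, step size, and trajectory length are assumptions made for illustration and are not taken from that paper.

```python
import numpy as np

# Hypothetical sketch of TD(0) with linear function approximation and
# Polyak-Ruppert (iterate) averaging; the chain, random features, step size,
# and trajectory length are illustrative assumptions, not taken from the paper.

rng = np.random.default_rng(1)
n_states, d, gamma, alpha = 20, 4, 0.9, 0.05

P = rng.dirichlet(np.ones(n_states), size=n_states)    # random transition matrix
r = rng.uniform(size=n_states)                          # per-state rewards
Phi = rng.normal(size=(n_states, d))                    # fixed random feature map

theta = np.zeros(d)        # TD(0) iterate
theta_sum = np.zeros(d)    # running sum for the Polyak-Ruppert average
s, n_steps = 0, 50_000
for _ in range(n_steps):
    s_next = rng.choice(n_states, p=P[s])
    # semi-gradient TD(0) update with linear value estimate v(s) ~ Phi[s] @ theta
    td_error = r[s] + gamma * Phi[s_next] @ theta - Phi[s] @ theta
    theta = theta + alpha * td_error * Phi[s]
    theta_sum += theta
    s = s_next

theta_avg = theta_sum / n_steps                         # Polyak-Ruppert averaged iterate
v_true = np.linalg.solve(np.eye(n_states) - gamma * P, r)
print("RMSE, last iterate:", float(np.sqrt(np.mean((Phi @ theta - v_true) ** 2))))
print("RMSE, averaged:    ", float(np.sqrt(np.mean((Phi @ theta_avg - v_true) ** 2))))
```

The averaged iterate `theta_avg` is the quantity whose limiting distribution such analyses characterize; the last iterate is printed alongside for comparison.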
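For the Infinity Learning entry above, here is a generic randomly-stopped ("Russian roulette") estimator of an infinite sum, the basic device its summary mentions. The target series and geometric stopping distribution are toy assumptions, not the paper's actual steady-state gradient estimator.

```python
import numpy as np

# Hypothetical illustration of a randomly-stopped ("Russian roulette") estimator
# for an infinite sum. The geometric series and stopping rate are toy choices.

rng = np.random.default_rng(2)

gamma = 0.9
a = lambda k: gamma ** k              # terms of the infinite sum sum_k a(k)
exact = 1.0 / (1.0 - gamma)           # known closed form, for checking

def randomly_stopped_estimate(q=0.05):
    """Unbiased single-sample estimate of sum_{k>=0} a(k).

    Draw a geometric stopping time K with P(K >= k) = (1-q)^k and reweight each
    retained term by 1 / P(K >= k), so the truncation bias cancels in expectation.
    """
    K = rng.geometric(q) - 1          # support {0, 1, 2, ...}
    return sum(a(k) / (1.0 - q) ** k for k in range(K + 1))

estimates = [randomly_stopped_estimate() for _ in range(10_000)]
print("exact sum:     ", exact)
print("estimator mean:", np.mean(estimates))   # close to the exact sum
```

Reweighting each retained term by 1/P(K >= k) removes the truncation bias in expectation, at the cost of extra variance controlled by the stopping rate q.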
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.