The Statistical Benefits of Quantile Temporal-Difference Learning for
Value Estimation
- URL: http://arxiv.org/abs/2305.18388v1
- Date: Sun, 28 May 2023 10:52:46 GMT
- Title: The Statistical Benefits of Quantile Temporal-Difference Learning for
Value Estimation
- Authors: Mark Rowland, Yunhao Tang, Clare Lyle, Rémi Munos, Marc G.
Bellemare, Will Dabney
- Abstract summary: We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
- Score: 53.53493178394081
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the problem of temporal-difference-based policy evaluation in
reinforcement learning. In particular, we analyse the use of a distributional
reinforcement learning algorithm, quantile temporal-difference learning (QTD),
for this task. We reach the surprising conclusion that even if a practitioner
has no interest in the return distribution beyond the mean, QTD (which learns
predictions about the full distribution of returns) may offer performance
superior to approaches such as classical TD learning, which predict only the
mean return, even in the tabular setting.
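The abstract contrasts QTD with classical TD without reproducing the update rules, so the following minimal tabular sketch (Python) makes the contrast concrete. The QTD rule shown is the standard quantile-regression update from the distributional RL literature; the number of quantiles, step size, discount factor, and toy state space are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def td_update(V, s, r, s_next, gamma=0.99, alpha=0.1):
    """Classical tabular TD(0): move a single mean-return estimate
    toward the sampled Bellman target r + gamma * V[s_next]."""
    V[s] += alpha * (r + gamma * V[s_next] - V[s])

def qtd_update(theta, s, r, s_next, gamma=0.99, alpha=0.1):
    """Tabular quantile TD (QTD): theta[s] holds m quantile estimates of
    the return distribution at state s, at levels tau_i = (2i - 1) / (2m).
    Each estimate moves by a bounded quantile-regression increment,
    averaged over the m sampled targets r + gamma * theta[s_next, j]."""
    m = theta.shape[1]
    taus = (2.0 * np.arange(m) + 1.0) / (2.0 * m)   # quantile midpoints
    targets = r + gamma * theta[s_next]             # m sampled targets
    for i in range(m):
        below = (targets < theta[s, i]).mean()      # fraction of targets below estimate i
        theta[s, i] += alpha * (taus[i] - below)

if __name__ == "__main__":
    n_states, m = 5, 32                             # illustrative sizes
    V = np.zeros(n_states)                          # TD mean-return estimates
    theta = np.zeros((n_states, m))                 # QTD quantile estimates
    td_update(V, s=0, r=1.0, s_next=1)
    qtd_update(theta, s=0, r=1.0, s_next=1)
    print(V[0], theta[0].mean())                    # implied value estimates at state 0
```

Both updates consume the same transition (s, r, s_next), but each QTD increment has magnitude at most alpha regardless of the reward scale, whereas the TD increment scales with the Bellman error. The value estimate implied by QTD is simply the average of the learned quantiles, theta[s].mean(), and the paper's claim is that this can be a more accurate estimate of the mean return than the TD iterate itself.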
Related papers
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z) - Discerning Temporal Difference Learning [5.439020425819001]
Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL).
We propose a novel TD algorithm named discerning TD learning (DTD).
arXiv Detail & Related papers (2023-10-12T07:38:10Z) - Uncertainty Estimation by Fisher Information-based Evidential Deep
Learning [61.94125052118442]
Uncertainty estimation is a key factor that makes deep learning reliable in practical applications.
We propose a novel method, Fisher Information-based Evidential Deep Learning ($\mathcal{I}$-EDL).
In particular, we introduce Fisher Information Matrix (FIM) to measure the informativeness of evidence carried by each sample, according to which we can dynamically reweight the objective loss terms to make the network more focused on the representation learning of uncertain classes.
arXiv Detail & Related papers (2023-03-03T16:12:59Z) - Finite-Time Analysis of Temporal Difference Learning: Discrete-Time
Linear System Perspective [3.5823366350053325]
TD-learning is a fundamental algorithm in the field of reinforcement learning (RL).
Recent research has uncovered guarantees concerning its statistical efficiency by developing finite-time error bounds.
arXiv Detail & Related papers (2022-04-22T03:21:30Z) - Learning Pessimism for Robust and Efficient Off-Policy Reinforcement
Learning [0.0]
Off-policy deep reinforcement learning algorithms compensate for overestimation bias during temporal-difference learning.
In this work, we propose a novel learnable penalty to enact such pessimism.
We also propose to learn the penalty alongside the critic with dual TD-learning, a strategy to estimate and minimize the bias magnitude in the target returns.
arXiv Detail & Related papers (2021-10-07T12:13:19Z) - Pre-emptive learning-to-defer for sequential medical decision-making
under uncertainty [35.077494648756876]
We propose SLTD (Sequential Learning-to-Defer) as a framework for learning-to-defer pre-emptively to an expert in sequential decision-making settings.
SLTD measures the likelihood that deferring now, rather than later, will improve the value, based on the underlying uncertainty in the dynamics.
arXiv Detail & Related papers (2021-09-13T20:43:10Z) - Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
arXiv Detail & Related papers (2021-06-22T17:58:46Z) - Quantifying Uncertainty in Deep Spatiotemporal Forecasting [67.77102283276409]
We describe two types of forecasting problems: regular grid-based and graph-based.
We analyze UQ methods from both the Bayesian and the frequentist points of view, casting them into a unified framework via statistical decision theory.
Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical-computational trade-offs for different UQ methods.
arXiv Detail & Related papers (2021-05-25T14:35:46Z) - Bootstrapping Statistical Inference for Off-Policy Evaluation [43.79456564713911]
We study the use of bootstrapping in off-policy evaluation (OPE).
We propose a bootstrapping FQE method for inferring the distribution of the policy evaluation error and show that this method is efficient and consistent for off-policy statistical inference.
We evaluate the bootstrapping method in classical RL environments for confidence interval estimation, estimating the variance of an off-policy evaluator, and estimating the correlation between multiple off-policy evaluators.
arXiv Detail & Related papers (2021-02-06T16:45:33Z) - Cross Learning in Deep Q-Networks [82.20059754270302]
We propose a novel cross Q-learning algorithm, aimed at alleviating the well-known overestimation problem in value-based reinforcement learning methods.
Our algorithm builds on double Q-learning by maintaining a set of parallel models and estimating the Q-value based on a randomly selected network (a minimal sketch of this target selection appears after this list).
arXiv Detail & Related papers (2020-09-29T04:58:17Z)
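The cross Q-learning entry above describes its mechanism only at a high level (parallel models, target from a randomly selected network). The sketch below is a hedged tabular illustration of that idea; the function name cross_q_target, the tabular setting, and the exact coupling between the parallel estimates are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def cross_q_target(q_tables, r, s_next, gamma=0.99, rng=None):
    """Illustrative cross-Q-style bootstrap target: as in double Q-learning,
    the estimate being updated is decoupled from the one generating the target,
    here by drawing the target from a uniformly random member of the ensemble.
    q_tables: list of K tabular Q estimates, each of shape [n_states, n_actions]."""
    rng = rng or np.random.default_rng()
    k = rng.integers(len(q_tables))                  # randomly selected estimate
    return r + gamma * q_tables[k][s_next].max()
```

Randomising which estimate supplies the bootstrap target is the mechanism the summary credits for alleviating overestimation, since the maximising action and the value used to score it come from different, independently trained estimates.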