Discerning Temporal Difference Learning
- URL: http://arxiv.org/abs/2310.08091v2
- Date: Sat, 10 Feb 2024 14:27:29 GMT
- Title: Discerning Temporal Difference Learning
- Authors: Jianfei Ma
- Abstract summary: Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL).
We propose a novel TD algorithm named discerning TD learning (DTD).
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal difference learning (TD) is a foundational concept in reinforcement
learning (RL), aimed at efficiently assessing a policy's value function.
TD($\lambda$), a potent variant, incorporates a memory trace to distribute the
prediction error into the historical context. However, this approach often
neglects the significance of historical states and the relative importance of
propagating the TD error, influenced by challenges such as visitation imbalance
or outcome noise. To address this, we propose a novel TD algorithm named
discerning TD learning (DTD), which allows flexible emphasis functions,
predetermined or adapted during training, to allocate effort effectively
across states. We establish the convergence properties of our
method within a specific class of emphasis functions and showcase its promising
potential for adaptation to deep RL contexts. Empirical results underscore that
employing a judicious emphasis function not only improves value estimation but
also expedites learning across diverse scenarios.
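As a rough illustration of the emphasis idea, the sketch below runs tabular TD($\lambda$) with accumulating eligibility traces and lets a user-supplied emphasis function scale how strongly each visited state absorbs the TD error; a constant emphasis of 1 recovers plain TD($\lambda$). The environment interface (reset/step), the policy, and the emphasis function are hypothetical placeholders, and this is a sketch of the general idea rather than the exact DTD update rule.

```python
import numpy as np

def emphasized_td_lambda(env, policy, emphasis, n_states, alpha=0.1,
                         gamma=0.99, lam=0.9, episodes=500):
    """Tabular TD(lambda) with a state-dependent emphasis function.

    `emphasis(s)` scales how much state s participates in credit
    assignment; emphasis(s) == 1 for every s recovers plain TD(lambda).
    Illustrative sketch only, not the exact DTD algorithm.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = env.reset()
        e = np.zeros(n_states)             # eligibility trace
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)  # assumed (state, reward, done) API
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e *= gamma * lam               # decay all traces
            e[s] += emphasis(s)            # emphasized accumulating trace
            V += alpha * delta * e         # distribute the TD error
            s = s_next
    return V
```

Choosing the emphasis inversely proportional to a state's visitation count, for example, is one simple way to counteract the visitation imbalance mentioned in the abstract.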
Related papers
- Time-Scale Separation in Q-Learning: Extending TD($\Delta$) for Action-Value Function Decomposition [0.0]
This paper introduces Q($\Delta$)-Learning, an extension of TD($\Delta$) to the Q-Learning framework.
TD($\Delta$) facilitates efficient learning over several time scales by breaking the Q($\Delta$)-function into components associated with distinct discount factors.
We demonstrate through theoretical analysis and practical evaluations on standard benchmarks like Atari that Q($\Delta$)-Learning surpasses conventional Q-Learning and TD learning methods.
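A minimal sketch of the underlying delta decomposition is shown below (SARSA-style bootstrapping; the control rule, terminal handling, and hyperparameters in the linked paper may differ). Each table `W[k]` estimates the difference between action-value functions at consecutive discount factors, and summing the components recovers the action-value at the largest discount factor.

```python
import numpy as np

def td_delta_update(W, s, a, r, s_next, a_next, gammas, alpha=0.1):
    """One SARSA-style update of delta components W_k(s, a).

    W[0] estimates Q under gammas[0]; W[k] estimates
    Q_{gammas[k]} - Q_{gammas[k-1]} (gammas assumed increasing).
    Illustrative sketch, not the exact Q(Delta)-Learning rule.
    """
    # Bootstrap value at the next state-action pair (read before updating).
    q_prev_next = W[0][s_next, a_next]
    # Component 0: ordinary short-horizon TD target.
    W[0][s, a] += alpha * (r + gammas[0] * q_prev_next - W[0][s, a])
    for k in range(1, len(gammas)):
        w_k_next = W[k][s_next, a_next]
        target_k = (gammas[k] - gammas[k - 1]) * q_prev_next + gammas[k] * w_k_next
        W[k][s, a] += alpha * (target_k - W[k][s, a])
        q_prev_next += w_k_next            # now estimates Q_{gammas[k]}(s', a')
    return W  # sum over k of W[k][s, a] approximates Q at the largest discount
```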
arXiv Detail & Related papers (2024-11-21T11:03:07Z)
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
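One way to realise such a decomposition in the tabular case is sketched below: a slowly updated component plus a quickly updated component whose sum forms the value prediction. The component names and step sizes are illustrative assumptions, not the linked paper's exact algorithm.

```python
import numpy as np

class TwoTimescaleValue:
    """Value estimate split into a slow and a fast component.

    The prediction is V(s) = v_slow(s) + v_fast(s): the fast part tracks
    recent changes with a large step size, while the slow part
    consolidates with a small one. Illustrative sketch only.
    """

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.01, gamma=0.99):
        self.v_slow = np.zeros(n_states)
        self.v_fast = np.zeros(n_states)
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.gamma = gamma

    def value(self, s):
        return self.v_slow[s] + self.v_fast[s]

    def update(self, s, r, s_next, done):
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        delta = target - self.value(s)
        self.v_fast[s] += self.alpha_fast * delta   # rapid adaptation
        self.v_slow[s] += self.alpha_slow * delta   # slow consolidation
```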
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
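In the tabular case, QTD maintains m quantile estimates of the return distribution per state and nudges each one according to how many bootstrapped targets fall below it; the mean of the quantiles then serves as the value estimate. The sketch below illustrates a single update (the step size and discount are illustrative).

```python
import numpy as np

def qtd_update(theta, s, r, s_next, done, alpha=0.05, gamma=0.99):
    """One quantile temporal-difference update for state s.

    `theta` has shape (n_states, m): m quantile estimates per state at
    levels tau_i = (2i - 1) / (2m). Tabular sketch of the QTD idea.
    """
    m = theta.shape[1]
    tau = (2 * np.arange(m) + 1) / (2 * m)
    if done:
        targets = np.full(m, float(r))
    else:
        targets = r + gamma * theta[s_next]          # one target per quantile
    for i in range(m):
        frac_below = (targets < theta[s, i]).mean()  # fraction of targets below theta_i
        theta[s, i] += alpha * (tau[i] - frac_below)
    return theta
```

The value estimate compared against classical TD is then simply `theta[s].mean()`.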
arXiv Detail & Related papers (2023-05-28T10:52:46Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
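The method avoids unseen actions by regressing a state-value function toward an upper expectile of the Q-values of dataset actions, and then using r + gamma * V(s') as the Q-target. A minimal numpy sketch of these two ingredients is given below (the network training loop is omitted; tau and the discount are illustrative).

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss for expectile regression.

    diff = Q(s, a) - V(s) on dataset actions. With tau > 0.5, positive
    differences weigh more, so V(s) tracks an upper expectile of the
    Q-values of in-dataset actions; no out-of-dataset action is queried.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return np.mean(weight * diff ** 2)

def q_targets(rewards, v_next, dones, gamma=0.99):
    """Bellman targets r + gamma * V(s') built from dataset tuples only."""
    return rewards + gamma * (1.0 - dones) * v_next
```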
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
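Schematically, the reweighting amounts to a weighted regression in which Bellman residuals from transitions with high estimated target variance count less. The sketch below shows one such weighted least-squares step with linear features; the variance estimator and the precise weights in VA-OPE are more involved, so treat this as an illustration of the idea only.

```python
import numpy as np

def variance_weighted_regression_step(phi, rewards, phi_next, target_var,
                                      theta, gamma=0.99, ridge=1e-3):
    """One variance-aware fitted-value regression step with linear features.

    Fits theta so that phi @ theta matches the Bellman targets
    r + gamma * phi_next @ theta, weighting each sample inversely to its
    estimated target variance. Schematic sketch, not the exact VA-OPE update.
    """
    targets = rewards + gamma * phi_next @ theta
    weights = 1.0 / (1.0 + target_var)               # down-weight noisy targets
    A = phi.T @ (weights[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (weights * targets)
    return np.linalg.solve(A, b)
```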
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Taylor Expansion of Discount Factors [56.46324239692532]
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors.
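For tabular policy evaluation the interpolation can be written in closed form: with transition matrix P and reward vector r under the policy, V_{gamma'} = sum_k [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma whenever the series converges. The sketch below checks a truncated expansion numerically on a small random Markov chain (the random chain and the chosen discount factors are illustrative).

```python
import numpy as np

def truncated_discount_expansion(P, r, gamma, gamma_new, order):
    """Approximate V_{gamma_new} from V_{gamma} via a truncated expansion.

    Uses the identity
        V_{gamma'} = sum_k [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma,
    valid when gamma' is close enough to gamma for the series to converge.
    """
    n = P.shape[0]
    resolvent = np.linalg.inv(np.eye(n) - gamma * P)  # (I - gamma P)^{-1}
    v_gamma = resolvent @ r
    M = (gamma_new - gamma) * resolvent @ P
    approx, term = np.zeros(n), v_gamma.copy()
    for _ in range(order + 1):
        approx += term
        term = M @ term
    return approx

# Numerical check on a random 5-state Markov chain.
rng = np.random.default_rng(0)
P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
r = rng.random(5)
exact = np.linalg.solve(np.eye(5) - 0.97 * P, r)      # V at gamma' = 0.97
approx = truncated_discount_expansion(P, r, 0.95, 0.97, order=10)
print(np.max(np.abs(exact - approx)))                 # shrinks as `order` grows
```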
arXiv Detail & Related papers (2021-06-11T05:02:17Z)
- Amortized Variational Deep Q Network [28.12600565839504]
We propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network.
We show that the amortized framework can result in significantly fewer learning parameters than existing state-of-the-art methods.
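A generic sketch of what an amortized variational Q-value head can look like is given below: one shared network maps a state to the mean and log-standard-deviation of a Gaussian over action values, so the uncertainty estimate costs only one extra output layer. The architecture, sizes, and objective here are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class VariationalQHead(nn.Module):
    """Q-network head outputting a Gaussian posterior over action values.

    Generic sketch of variational Q-value estimation: a single amortized
    network produces the posterior parameters for every state, rather
    than maintaining separate per-parameter posteriors.
    """

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, n_actions)
        self.log_std = nn.Linear(hidden, n_actions)

    def forward(self, state):
        h = self.body(state)
        return self.mean(h), self.log_std(h)

    def sample_q(self, state):
        mean, log_std = self.forward(state)
        eps = torch.randn_like(mean)
        return mean + eps * log_std.exp()   # reparameterized sample of Q(s, .)
```

Training would combine a TD loss on the sampled Q-values with a KL term toward a prior, though the exact objective in the paper may differ.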
arXiv Detail & Related papers (2020-11-03T13:48:18Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes: model-free value estimation and full model learning.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
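A rough sketch of how such an approach can be wired up is shown below: a hindsight network summarizes a chunk of future observations into a small feature vector, a model network learns to predict that vector from the current state alone, and the value head consumes the state together with the (predicted) features. All module names, sizes, and the overall wiring are assumptions for illustration rather than the paper's exact architecture or losses.

```python
import torch
import torch.nn as nn

class HindsightValueSketch(nn.Module):
    """Sketch of value estimation aided by hindsight features.

    `hindsight` compresses future observations into phi (available only
    during training); `model` predicts phi from the current observation;
    `value` maps (observation, phi) to a scalar value estimate.
    Illustrative sketch only.
    """

    def __init__(self, obs_dim, future_dim, phi_dim=8, hidden=64):
        super().__init__()
        self.hindsight = nn.Sequential(nn.Linear(future_dim, hidden),
                                       nn.ReLU(), nn.Linear(hidden, phi_dim))
        self.model = nn.Sequential(nn.Linear(obs_dim, hidden),
                                   nn.ReLU(), nn.Linear(hidden, phi_dim))
        self.value = nn.Sequential(nn.Linear(obs_dim + phi_dim, hidden),
                                   nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, future=None):
        # Training can use the true future to form phi; at evaluation time
        # only the model's prediction is available.
        phi = self.hindsight(future) if future is not None else self.model(obs)
        return self.value(torch.cat([obs, phi], dim=-1)), phi
```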
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.