Discerning Temporal Difference Learning
- URL: http://arxiv.org/abs/2310.08091v2
- Date: Sat, 10 Feb 2024 14:27:29 GMT
- Title: Discerning Temporal Difference Learning
- Authors: Jianfei Ma
- Abstract summary: Temporal difference learning (TD) is a foundational concept in reinforcement learning (RL).
We propose a novel TD algorithm named discerning TD learning (DTD).
- Score: 5.439020425819001
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Temporal difference learning (TD) is a foundational concept in reinforcement
learning (RL), aimed at efficiently assessing a policy's value function.
TD($\lambda$), a potent variant, incorporates a memory trace to distribute the
prediction error into the historical context. However, this approach often
neglects the significance of historical states and the relative importance of
propagating the TD error, influenced by challenges such as visitation imbalance
or outcome noise. To address this, we propose a novel TD algorithm named
discerning TD learning (DTD), which allows flexible emphasis functions,
predetermined or adapted during training, to allocate effort effectively
across states. We establish the convergence properties of our
method within a specific class of emphasis functions and showcase its promising
potential for adaptation to deep RL contexts. Empirical results underscore that
employing a judicious emphasis function not only improves value estimation but
also expedites learning across diverse scenarios.
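As a rough illustration of the emphasis idea, the sketch below runs tabular TD($\lambda$) with accumulating eligibility traces and lets a user-supplied emphasis function scale how strongly each visited state absorbs the TD error; a constant emphasis of 1 recovers plain TD($\lambda$). The environment interface (reset/step), the policy, and the emphasis function are hypothetical placeholders, and this is a sketch of the general idea rather than the exact DTD update rule.

```python
import numpy as np

def emphasized_td_lambda(env, policy, emphasis, n_states, alpha=0.1,
                         gamma=0.99, lam=0.9, episodes=500):
    """Tabular TD(lambda) with a state-dependent emphasis function.

    `emphasis(s)` scales how much state s participates in credit
    assignment; emphasis(s) == 1 for every s recovers plain TD(lambda).
    Illustrative sketch only, not the exact DTD algorithm.
    """
    V = np.zeros(n_states)
    for _ in range(episodes):
        s = env.reset()
        e = np.zeros(n_states)             # eligibility trace
        done = False
        while not done:
            a = policy(s)
            s_next, r, done = env.step(a)  # assumed (state, reward, done) API
            delta = r + (0.0 if done else gamma * V[s_next]) - V[s]
            e *= gamma * lam               # decay all traces
            e[s] += emphasis(s)            # emphasized accumulating trace
            V += alpha * delta * e         # distribute the TD error
            s = s_next
    return V
```

Choosing the emphasis inversely proportional to a state's visitation count, for example, is one simple way to counteract the visitation imbalance mentioned in the abstract.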
Related papers
- Time-Scale Separation in Q-Learning: Extending TD($\Delta$) for Action-Value Function Decomposition [0.0]
This paper introduces Q($\Delta$)-Learning, an extension of TD($\Delta$) to the Q-Learning framework.
TD($\Delta$) facilitates efficient learning over several time scales by breaking the Q($\Delta$)-function into components associated with distinct discount factors.
We demonstrate through theoretical analysis and practical evaluations on standard benchmarks like Atari that Q($\Delta$)-Learning surpasses conventional Q-Learning and TD learning methods.
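A minimal sketch of the underlying delta decomposition is shown below (SARSA-style bootstrapping; the control rule, terminal handling, and hyperparameters in the linked paper may differ). Each table `W[k]` estimates the difference between action-value functions at consecutive discount factors, and summing the components recovers the action-value at the largest discount factor.

```python
import numpy as np

def td_delta_update(W, s, a, r, s_next, a_next, gammas, alpha=0.1):
    """One SARSA-style update of delta components W_k(s, a).

    W[0] estimates Q under gammas[0]; W[k] estimates
    Q_{gammas[k]} - Q_{gammas[k-1]} (gammas assumed increasing).
    Illustrative sketch, not the exact Q(Delta)-Learning rule.
    """
    # Bootstrap value at the next state-action pair (read before updating).
    q_prev_next = W[0][s_next, a_next]
    # Component 0: ordinary short-horizon TD target.
    W[0][s, a] += alpha * (r + gammas[0] * q_prev_next - W[0][s, a])
    for k in range(1, len(gammas)):
        w_k_next = W[k][s_next, a_next]
        target_k = (gammas[k] - gammas[k - 1]) * q_prev_next + gammas[k] * w_k_next
        W[k][s, a] += alpha * (target_k - W[k][s, a])
        q_prev_next += w_k_next            # now estimates Q_{gammas[k]}(s', a')
    return W  # sum over k of W[k][s, a] approximates Q at the largest discount
```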
arXiv Detail & Related papers (2024-11-21T11:03:07Z)
- Temporal-Difference Variational Continual Learning [89.32940051152782]
A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks.
In Continual Learning settings, models often struggle to balance learning new tasks with retaining previous knowledge.
We propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations.
arXiv Detail & Related papers (2024-10-10T10:58:41Z)
- Prediction and Control in Continual Reinforcement Learning [39.30411018922005]
Temporal difference (TD) learning is often used to update the estimate of the value function which is used by RL agents to extract useful policies.
We propose to decompose the value function into two components which update at different timescales.
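One way to realise such a decomposition in the tabular case is sketched below: a slowly updated component plus a quickly updated component whose sum forms the value prediction. The component names and step sizes are illustrative assumptions, not the linked paper's exact algorithm.

```python
import numpy as np

class TwoTimescaleValue:
    """Value estimate split into a slow and a fast component.

    The prediction is V(s) = v_slow(s) + v_fast(s): the fast part tracks
    recent changes with a large step size, while the slow part
    consolidates with a small one. Illustrative sketch only.
    """

    def __init__(self, n_states, alpha_fast=0.5, alpha_slow=0.01, gamma=0.99):
        self.v_slow = np.zeros(n_states)
        self.v_fast = np.zeros(n_states)
        self.alpha_fast = alpha_fast
        self.alpha_slow = alpha_slow
        self.gamma = gamma

    def value(self, s):
        return self.v_slow[s] + self.v_fast[s]

    def update(self, s, r, s_next, done):
        target = r + (0.0 if done else self.gamma * self.value(s_next))
        delta = target - self.value(s)
        self.v_fast[s] += self.alpha_fast * delta   # rapid adaptation
        self.v_slow[s] += self.alpha_slow * delta   # slow consolidation
```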
arXiv Detail & Related papers (2023-12-18T19:23:42Z)
- The Statistical Benefits of Quantile Temporal-Difference Learning for Value Estimation [53.53493178394081]
We analyse the use of a distributional reinforcement learning algorithm, quantile temporal-difference learning (QTD).
Even if a practitioner has no interest in the return distribution beyond the mean, QTD may offer performance superior to approaches such as classical TD learning.
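In the tabular case, QTD maintains m quantile estimates of the return distribution per state and nudges each one according to how many bootstrapped targets fall below it; the mean of the quantiles then serves as the value estimate. The sketch below illustrates a single update (the step size and discount are illustrative).

```python
import numpy as np

def qtd_update(theta, s, r, s_next, done, alpha=0.05, gamma=0.99):
    """One quantile temporal-difference update for state s.

    `theta` has shape (n_states, m): m quantile estimates per state at
    levels tau_i = (2i - 1) / (2m). Tabular sketch of the QTD idea.
    """
    m = theta.shape[1]
    tau = (2 * np.arange(m) + 1) / (2 * m)
    if done:
        targets = np.full(m, float(r))
    else:
        targets = r + gamma * theta[s_next]          # one target per quantile
    for i in range(m):
        frac_below = (targets < theta[s, i]).mean()  # fraction of targets below theta_i
        theta[s, i] += alpha * (tau[i] - frac_below)
    return theta
```

The value estimate compared against classical TD is then simply `theta[s].mean()`.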
arXiv Detail & Related papers (2023-05-28T10:52:46Z)
- Offline Reinforcement Learning with Implicit Q-Learning [85.62618088890787]
Current offline reinforcement learning methods need to query the value of unseen actions during training to improve the policy.
We propose an offline RL method that never needs to evaluate actions outside of the dataset.
This method enables the learned policy to improve substantially over the best behavior in the data through generalization.
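The method avoids unseen actions by regressing a state-value function toward an upper expectile of the Q-values of dataset actions, and then using r + gamma * V(s') as the Q-target. A minimal numpy sketch of these two ingredients is given below (the network training loop is omitted; tau and the discount are illustrative).

```python
import numpy as np

def expectile_loss(diff, tau=0.7):
    """Asymmetric squared loss for expectile regression.

    diff = Q(s, a) - V(s) on dataset actions. With tau > 0.5, positive
    differences weigh more, so V(s) tracks an upper expectile of the
    Q-values of in-dataset actions; no out-of-dataset action is queried.
    """
    weight = np.where(diff > 0, tau, 1.0 - tau)
    return np.mean(weight * diff ** 2)

def q_targets(rewards, v_next, dones, gamma=0.99):
    """Bellman targets r + gamma * V(s') built from dataset tuples only."""
    return rewards + gamma * (1.0 - dones) * v_next
```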
arXiv Detail & Related papers (2021-10-12T17:05:05Z)
- Variance-Aware Off-Policy Evaluation with Linear Function Approximation [85.75516599931632]
We study the off-policy evaluation problem in reinforcement learning with linear function approximation.
We propose an algorithm, VA-OPE, which uses the estimated variance of the value function to reweight the Bellman residual in Fitted Q-Iteration.
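Schematically, the reweighting amounts to a weighted regression in which Bellman residuals from transitions with high estimated target variance count less. The sketch below shows one such weighted least-squares step with linear features; the variance estimator and the precise weights in VA-OPE are more involved, so treat this as an illustration of the idea only.

```python
import numpy as np

def variance_weighted_regression_step(phi, rewards, phi_next, target_var,
                                      theta, gamma=0.99, ridge=1e-3):
    """One variance-aware fitted-value regression step with linear features.

    Fits theta so that phi @ theta matches the Bellman targets
    r + gamma * phi_next @ theta, weighting each sample inversely to its
    estimated target variance. Schematic sketch, not the exact VA-OPE update.
    """
    targets = rewards + gamma * phi_next @ theta
    weights = 1.0 / (1.0 + target_var)               # down-weight noisy targets
    A = phi.T @ (weights[:, None] * phi) + ridge * np.eye(phi.shape[1])
    b = phi.T @ (weights * targets)
    return np.linalg.solve(A, b)
```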
arXiv Detail & Related papers (2021-06-22T17:58:46Z)
- Taylor Expansion of Discount Factors [56.46324239692532]
In practical reinforcement learning (RL), the discount factor used for estimating value functions often differs from that used for defining the evaluation objective.
In this work, we study the effect that this discrepancy of discount factors has during learning, and discover a family of objectives that interpolate value functions of two distinct discount factors.
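For tabular policy evaluation the interpolation can be written in closed form: with transition matrix P and reward vector r under the policy, V_{gamma'} = sum_k [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma whenever the series converges. The sketch below checks a truncated expansion numerically on a small random Markov chain (the random chain and the chosen discount factors are illustrative).

```python
import numpy as np

def truncated_discount_expansion(P, r, gamma, gamma_new, order):
    """Approximate V_{gamma_new} from V_{gamma} via a truncated expansion.

    Uses the identity
        V_{gamma'} = sum_k [(gamma' - gamma) (I - gamma P)^{-1} P]^k V_gamma,
    valid when gamma' is close enough to gamma for the series to converge.
    """
    n = P.shape[0]
    resolvent = np.linalg.inv(np.eye(n) - gamma * P)  # (I - gamma P)^{-1}
    v_gamma = resolvent @ r
    M = (gamma_new - gamma) * resolvent @ P
    approx, term = np.zeros(n), v_gamma.copy()
    for _ in range(order + 1):
        approx += term
        term = M @ term
    return approx

# Numerical check on a random 5-state Markov chain.
rng = np.random.default_rng(0)
P = rng.random((5, 5)); P /= P.sum(axis=1, keepdims=True)
r = rng.random(5)
exact = np.linalg.solve(np.eye(5) - 0.97 * P, r)      # V at gamma' = 0.97
approx = truncated_discount_expansion(P, r, 0.95, 0.97, order=10)
print(np.max(np.abs(exact - approx)))                 # shrinks as `order` grows
```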
arXiv Detail & Related papers (2021-06-11T05:02:17Z)
- Amortized Variational Deep Q Network [28.12600565839504]
We propose an amortized variational inference framework to approximate the posterior distribution of the action value function in Deep Q Network.
We show that the amortized framework can result in significantly fewer learning parameters than existing state-of-the-art methods.
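A generic sketch of what an amortized variational Q-value head can look like is given below: one shared network maps a state to the mean and log-standard-deviation of a Gaussian over action values, so the uncertainty estimate costs only one extra output layer. The architecture, sizes, and objective here are assumptions for illustration, not the paper's exact model.

```python
import torch
import torch.nn as nn

class VariationalQHead(nn.Module):
    """Q-network head outputting a Gaussian posterior over action values.

    Generic sketch of variational Q-value estimation: a single amortized
    network produces the posterior parameters for every state, rather
    than maintaining separate per-parameter posteriors.
    """

    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.mean = nn.Linear(hidden, n_actions)
        self.log_std = nn.Linear(hidden, n_actions)

    def forward(self, state):
        h = self.body(state)
        return self.mean(h), self.log_std(h)

    def sample_q(self, state):
        mean, log_std = self.forward(state)
        eps = torch.randn_like(mean)
        return mean + eps * log_std.exp()   # reparameterized sample of Q(s, .)
```

Training would combine a TD loss on the sampled Q-values with a KL term toward a prior, though the exact objective in the paper may differ.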
arXiv Detail & Related papers (2020-11-03T13:48:18Z)
- Value-driven Hindsight Modelling [68.658900923595]
Value estimation is a critical component of the reinforcement learning (RL) paradigm.
Model learning can make use of the rich transition structure present in sequences of observations, but this approach is usually not sensitive to the reward function.
We develop an approach for representation learning in RL that sits in between these two extremes: model-free value estimation and full model learning.
This provides tractable prediction targets that are directly relevant for a task, and can thus accelerate learning the value function.
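A rough sketch of how such an approach can be wired up is shown below: a hindsight network summarizes a chunk of future observations into a small feature vector, a model network learns to predict that vector from the current state alone, and the value head consumes the state together with the (predicted) features. All module names, sizes, and the overall wiring are assumptions for illustration rather than the paper's exact architecture or losses.

```python
import torch
import torch.nn as nn

class HindsightValueSketch(nn.Module):
    """Sketch of value estimation aided by hindsight features.

    `hindsight` compresses future observations into phi (available only
    during training); `model` predicts phi from the current observation;
    `value` maps (observation, phi) to a scalar value estimate.
    Illustrative sketch only.
    """

    def __init__(self, obs_dim, future_dim, phi_dim=8, hidden=64):
        super().__init__()
        self.hindsight = nn.Sequential(nn.Linear(future_dim, hidden),
                                       nn.ReLU(), nn.Linear(hidden, phi_dim))
        self.model = nn.Sequential(nn.Linear(obs_dim, hidden),
                                   nn.ReLU(), nn.Linear(hidden, phi_dim))
        self.value = nn.Sequential(nn.Linear(obs_dim + phi_dim, hidden),
                                   nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, obs, future=None):
        # Training can use the true future to form phi; at evaluation time
        # only the model's prediction is available.
        phi = self.hindsight(future) if future is not None else self.model(obs)
        return self.value(torch.cat([obs, phi], dim=-1)), phi
```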
arXiv Detail & Related papers (2020-02-19T18:10:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.