Loss Dynamics of Temporal Difference Reinforcement Learning
- URL: http://arxiv.org/abs/2307.04841v2
- Date: Tue, 7 Nov 2023 12:15:15 GMT
- Title: Loss Dynamics of Temporal Difference Reinforcement Learning
- Authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
- Abstract summary: We study the case learning curves for temporal difference learning of a value function with linear function approximators.
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
- Score: 36.772501199987076
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Reinforcement learning has been successful across several applications in
which agents have to learn to act in environments with sparse feedback.
However, despite this empirical success there is still a lack of theoretical
understanding of how the parameters of reinforcement learning models and the
features used to represent states interact to control the dynamics of learning.
In this work, we use concepts from statistical physics, to study the typical
case learning curves for temporal difference learning of a value function with
linear function approximators. Our theory is derived under a Gaussian
equivalence hypothesis where averages over the random trajectories are replaced
with temporally correlated Gaussian feature averages and we validate our
assumptions on small scale Markov Decision Processes. We find that the
stochastic semi-gradient noise due to subsampling the space of possible
episodes leads to significant plateaus in the value error, unlike in
traditional gradient descent dynamics. We study how learning dynamics and
plateaus depend on feature structure, learning rate, discount factor, and
reward function. We then analyze how strategies like learning rate annealing
and reward shaping can favorably alter learning dynamics and plateaus. To
conclude, our work introduces new tools to open a new direction towards
developing a theory of learning dynamics in reinforcement learning.
Related papers
- Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron [3.069335774032178]
We use a dataset-process approach to derive flow equations describing learning.
We characterize the effects of the learning rule (supervised or reinforcement learning, SL/RL) and input-data distribution on the perceptron's learning curve.
This approach points a way toward analyzing learning dynamics for more-complex circuit architectures.
arXiv Detail & Related papers (2024-09-05T17:58:28Z) - Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature.
We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate.
We propose to make the learning rate schedule explicit with a simple re- parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z) - On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics.
The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z) - Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph.
We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs.
The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z) - Bayesian Learning for Dynamic Inference [2.2843885788439793]
In some sequential estimation problems, the future values of the quantity to be estimated depend on the estimate of its current value.
We formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn.
We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss.
arXiv Detail & Related papers (2022-12-30T19:16:23Z) - Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics.
We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z) - Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z) - Supervised Learning in the Presence of Concept Drift: A modelling
framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments.
We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z) - The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics.
We find good agreement between our model's predictions and training dynamics in realistic deep learning settings.
We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.