Related papers: Loss Dynamics of Temporal Difference Reinforcement Learning

Loss Dynamics of Temporal Difference Reinforcement Learning

URL: http://arxiv.org/abs/2307.04841v2
Date: Tue, 7 Nov 2023 12:15:15 GMT
Title: Loss Dynamics of Temporal Difference Reinforcement Learning
Authors: Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
Abstract summary: We study the case learning curves for temporal difference learning of a value function with linear function approximators. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
Score: 36.772501199987076
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Reinforcement learning has been successful across several applications in which agents have to learn to act in environments with sparse feedback. However, despite this empirical success there is still a lack of theoretical understanding of how the parameters of reinforcement learning models and the features used to represent states interact to control the dynamics of learning. In this work, we use concepts from statistical physics, to study the typical case learning curves for temporal difference learning of a value function with linear function approximators. Our theory is derived under a Gaussian equivalence hypothesis where averages over the random trajectories are replaced with temporally correlated Gaussian feature averages and we validate our assumptions on small scale Markov Decision Processes. We find that the stochastic semi-gradient noise due to subsampling the space of possible episodes leads to significant plateaus in the value error, unlike in traditional gradient descent dynamics. We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function. We then analyze how strategies like learning rate annealing and reward shaping can favorably alter learning dynamics and plateaus. To conclude, our work introduces new tools to open a new direction towards developing a theory of learning dynamics in reinforcement learning.

Related papers

The Importance of Being Lazy: Scaling Limits of Continual Learning [60.97756735877614]
We show that increasing model width is only beneficial when it reduces the amount of feature learning, yielding more laziness.<n>We study the intricate relationship between feature learning, task non-stationarity, and forgetting, finding that high feature learning is only beneficial with highly similar tasks.
arXiv Detail & Related papers (2025-06-20T10:12:38Z)
Dynamics of Supervised and Reinforcement Learning in the Non-Linear Perceptron [3.069335774032178]
We use a dataset-process approach to derive flow equations describing learning. We characterize the effects of the learning rule (supervised or reinforcement learning, SL/RL) and input-data distribution on the perceptron's learning curve. This approach points a way toward analyzing learning dynamics for more-complex circuit architectures.
arXiv Detail & Related papers (2024-09-05T17:58:28Z)
Normalization and effective learning rates in reinforcement learning [52.59508428613934]
Normalization layers have recently experienced a renaissance in the deep reinforcement learning and continual learning literature. We show that normalization brings with it a subtle but important side effect: an equivalence between growth in the norm of the network parameters and decay in the effective learning rate. We propose to make the learning rate schedule explicit with a simple re- parameterization which we call Normalize-and-Project.
arXiv Detail & Related papers (2024-07-01T20:58:01Z)
On the Dynamics Under the Unhinged Loss and Beyond [104.49565602940699]
We introduce the unhinged loss, a concise loss function, that offers more mathematical opportunities to analyze closed-form dynamics. The unhinged loss allows for considering more practical techniques, such as time-vary learning rates and feature normalization.
arXiv Detail & Related papers (2023-12-13T02:11:07Z)
Interpretable Imitation Learning with Dynamic Causal Relations [65.18456572421702]
We propose to expose captured knowledge in the form of a directed acyclic causal graph. We also design this causal discovery process to be state-dependent, enabling it to model the dynamics in latent causal graphs. The proposed framework is composed of three parts: a dynamic causal discovery module, a causality encoding module, and a prediction module, and is trained in an end-to-end manner.
arXiv Detail & Related papers (2023-09-30T20:59:42Z)
Bayesian Learning for Dynamic Inference [2.2843885788439793]
In some sequential estimation problems, the future values of the quantity to be estimated depend on the estimate of its current value. We formulate the Bayesian learning problem for dynamic inference, where the unknown quantity-generation model is assumed to be randomly drawn. We derive the optimal Bayesian learning rules, both offline and online, to minimize the inference loss.
arXiv Detail & Related papers (2022-12-30T19:16:23Z)
Deep Active Learning by Leveraging Training Dynamics [57.95155565319465]
We propose a theory-driven deep active learning method (dynamicAL) which selects samples to maximize training dynamics. We show that dynamicAL not only outperforms other baselines consistently but also scales well on large deep learning models.
arXiv Detail & Related papers (2021-10-16T16:51:05Z)
Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task. This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
Supervised Learning in the Presence of Concept Drift: A modelling framework [5.22609266390809]
We present a modelling framework for the investigation of supervised learning in non-stationary environments. We model two example types of learning systems: prototype-based Learning Vector Quantization (LVQ) for classification and shallow, layered neural networks for regression tasks.
arXiv Detail & Related papers (2020-05-21T09:13:58Z)
The large learning rate phase of deep learning: the catapult mechanism [50.23041928811575]
We present a class of neural networks with solvable training dynamics. We find good agreement between our model's predictions and training dynamics in realistic deep learning settings. We believe our results shed light on characteristics of models trained at different learning rates.
arXiv Detail & Related papers (2020-03-04T17:52:48Z)

This list is automatically generated from the titles and abstracts of the papers in this site.