Investigating the Edge of Stability Phenomenon in Reinforcement Learning
- URL: http://arxiv.org/abs/2307.04210v1
- Date: Sun, 9 Jul 2023 15:46:27 GMT
- Title: Investigating the Edge of Stability Phenomenon in Reinforcement Learning
- Authors: Rares Iordan, Marc Peter Deisenroth, Mihaela Rosca
- Abstract summary: We explore the edge of stability phenomenon in reinforcement learning (RL).
Despite significant differences to supervised learning, the edge of stability phenomenon can be present in off-policy deep RL.
Our results suggest that, while neural network structure can lead to optimisation dynamics that transfer between problem domains, certain aspects of deep RL optimisation can differentiate it from domains such as supervised learning.
- Score: 20.631461205889487
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent progress has been made in understanding optimisation dynamics in
neural networks trained with full-batch gradient descent with momentum, with the
uncovering of the edge of stability phenomenon in supervised learning. The edge
of stability phenomenon occurs as the leading eigenvalue of the Hessian reaches
the divergence threshold of the underlying optimisation algorithm for a
quadratic loss, after which it starts oscillating around the threshold, and the
loss starts to exhibit local instability but decreases over long time frames.
In this work, we explore the edge of stability phenomenon in reinforcement
learning (RL), specifically off-policy Q-learning algorithms across a variety
of data regimes, from offline to online RL. Our experiments reveal that,
despite significant differences to supervised learning, such as
non-stationarity of the data distribution and the use of bootstrapping, the
edge of stability phenomenon can be present in off-policy deep RL. Unlike
supervised learning, however, we observe strong differences depending on the
underlying loss, with DQN -- using a Huber loss -- showing a strong edge of
stability effect that we do not observe with C51 -- using a cross entropy loss.
Our results suggest that, while neural network structure can lead to
optimisation dynamics that transfer between problem domains, certain aspects of
deep RL optimisation can differentiate it from domains such as supervised
learning.
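The divergence threshold the abstract refers to can be checked directly on a quadratic. A minimal sketch (a toy example, not code from the paper): gradient descent on $f(x) = \tfrac{1}{2}\lambda x^2$ multiplies $x$ by $(1 - \eta\lambda)$ each step, so it diverges exactly when the curvature $\lambda$ exceeds $2/\eta$.

```python
def run_gd(lam, eta, x0=1.0, steps=50):
    """Full-batch gradient descent on f(x) = 0.5 * lam * x**2."""
    x = x0
    for _ in range(steps):
        x -= eta * lam * x  # gradient of the quadratic is lam * x
    return x

eta = 0.1                            # step size; divergence threshold is 2/eta = 20
below = run_gd(lam=19.0, eta=eta)    # curvature below 2/eta: iterates contract
above = run_gd(lam=21.0, eta=eta)    # curvature above 2/eta: iterates blow up
print(abs(below) < 1.0, abs(above) > 1.0)  # → True True
```

At the edge of stability, the leading Hessian eigenvalue of a neural network hovers around this same $2/\eta$ threshold rather than staying safely below it.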
Related papers
- Exploring the Stability Gap in Continual Learning: The Role of the Classification Head [0.6749750044497732]
The stability gap is a phenomenon where models initially lose performance on previously learned tasks before partially recovering during training.
We introduce the nearest-mean classifier (NMC) as a tool to attribute the influence of the backbone and the classification head on the stability gap.
Our experiments demonstrate that NMC not only improves final performance, but also significantly enhances training stability across various continual learning benchmarks.
arXiv Detail & Related papers (2024-11-06T15:45:01Z)
- Super Level Sets and Exponential Decay: A Synergistic Approach to Stable Neural Network Training [0.0]
We develop a dynamic learning rate algorithm that integrates exponential decay and advanced anti-overfitting strategies.
We prove that the superlevel sets of the loss function, as influenced by our adaptive learning rate, are always connected.
arXiv Detail & Related papers (2024-09-25T09:27:17Z)
- Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos [6.579523168465526]
In descent dynamics of neural networks, the top eigenvalue of the Hessian of the loss (sharpness) displays a variety of robust phenomena throughout training.
We demonstrate that a simple $2$-layer linear network (UV model) trained on a single training example exhibits all of the essential sharpness phenomenology observed in real-world scenarios.
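The UV model mentioned above is small enough to study exactly. A minimal numpy sketch (initialisation, step size, and target are illustrative choices, not values from the paper): train $L(u, v) = \tfrac{1}{2}(uv - y)^2$ with full-batch gradient descent and track the sharpness, the top eigenvalue of the exact $2\times 2$ Hessian, against the $2/\eta$ threshold.

```python
import numpy as np

def sharpness(u, v, y):
    # Top eigenvalue of the exact 2x2 Hessian of L(u, v) = 0.5*(u*v - y)**2
    H = np.array([[v * v, 2 * u * v - y],
                  [2 * u * v - y, u * u]])
    return float(np.linalg.eigvalsh(H)[-1])

def train_uv(u, v, y=2.0, eta=0.1, steps=300):
    """Full-batch GD on the UV model; records sharpness at every step."""
    history = []
    for _ in range(steps):
        r = u * v - y                          # residual
        u, v = u - eta * r * v, v - eta * r * u
        history.append(sharpness(u, v, y))
    return history

hist = train_uv(u=4.0, v=0.1)   # unbalanced initialisation
print(max(hist), 2 / 0.1)       # compare observed sharpness to the 2/eta threshold
```

Whether the sharpness rises to and oscillates around $2/\eta$ depends on the step size and initialisation; sweeping `eta` in this sketch is a quick way to see the transition.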
arXiv Detail & Related papers (2023-11-03T17:59:40Z)
- Implicit Stochastic Gradient Descent for Training Physics-informed Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have effectively been demonstrated in solving forward and inverse differential equation problems.
However, PINNs suffer training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ implicit gradient descent (ISGD) method to train PINNs for improving the stability of training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- On a continuous time model of gradient descent dynamics and instability in deep learning [12.20253214080485]
We propose the principal flow (PF) as a continuous time flow that approximates gradient descent dynamics.
The PF sheds light on the recently observed edge of stability phenomena in deep learning.
Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.
arXiv Detail & Related papers (2023-02-03T19:03:10Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- A PDE-based Explanation of Extreme Numerical Sensitivities and Edge of Stability in Training Neural Networks [12.355137704908042]
We show restrained numerical instabilities in current training practices of deep networks with stochastic gradient descent (SGD).
We do this by presenting a theoretical framework using numerical analysis of partial differential equations (PDE), and analyzing the gradient descent PDE of convolutional neural networks (CNNs).
We show this is a consequence of the non-linear PDE associated with the descent of the CNN, whose local linearization changes when over-driving the step size of the discretization resulting in a stabilizing effect.
arXiv Detail & Related papers (2022-06-04T14:54:05Z)
- Gradient Descent on Neural Networks Typically Occurs at the Edge of Stability [94.4070247697549]
Full-batch gradient descent on neural network training objectives operates in a regime we call the Edge of Stability.
In this regime, the maximum eigenvalue of the training loss Hessian hovers just above the numerical value $2/\text{(step size)}$, and the training loss behaves non-monotonically over short timescales, yet consistently decreases over long timescales.
arXiv Detail & Related papers (2021-02-26T22:08:19Z)
- Gradient Starvation: A Learning Proclivity in Neural Networks [97.02382916372594]
Gradient Starvation arises when cross-entropy loss is minimized by capturing only a subset of features relevant for the task.
This work provides a theoretical explanation for the emergence of such feature imbalance in neural networks.
arXiv Detail & Related papers (2020-11-18T18:52:08Z)
- Robust Inverse Reinforcement Learning under Transition Dynamics Mismatch [60.23815709215807]
We study the inverse reinforcement learning (IRL) problem under a transition dynamics mismatch between the expert and the learner.
We propose a robust MCE IRL algorithm, which is a principled approach to help with this mismatch.
arXiv Detail & Related papers (2020-07-02T14:57:13Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where time-dependent parameters of the main flow evolve according to a matrix flow on the group O(d).
This nested system of two flows provides stability and effectiveness of training and provably solves the gradient vanishing-explosion problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
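Several of the papers above track the leading Hessian eigenvalue (sharpness) during training. For losses whose Hessian is too large to form explicitly, it can be estimated by power iteration on Hessian-vector products; a minimal numpy sketch, with the helper name and central-difference approximation being my own illustrative choices:

```python
import numpy as np

def top_hessian_eig(grad, theta, iters=50, eps=1e-4, seed=0):
    """Estimate the leading Hessian eigenvalue of a loss at theta.

    grad: function returning the gradient of the loss at a parameter vector.
    Hessian-vector products are approximated by central differences of grad.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        hv = (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)
        lam = float(v @ hv)                    # Rayleigh quotient estimate
        v = hv / (np.linalg.norm(hv) + 1e-12)  # power-iteration update
    return lam

# Sanity check on a quadratic with known curvature: loss 0.5 * x^T A x
A = np.diag([3.0, 1.0, 0.5])
lam = top_hessian_eig(lambda x: A @ x, np.zeros(3))
print(lam)  # ≈ 3.0
```

Comparing this estimate to $2/\eta$ over the course of training is the standard diagnostic for the edge-of-stability regime discussed above.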
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.