Bifurcations and loss jumps in RNN training
- URL: http://arxiv.org/abs/2310.17561v1
- Date: Thu, 26 Oct 2023 16:49:44 GMT
- Title: Bifurcations and loss jumps in RNN training
- Authors: Lukas Eisenmann, Zahra Monfared, Niclas Alexander Göring, Daniel
  Durstewitz
- Abstract summary: We introduce a novel algorithm for detecting all fixed points and k-cycles in ReLU-based RNNs and their existence and stability regions.
Our algorithm provides exact results and returns fixed points and cycles up to high orders with surprisingly good scaling behavior.
- Score: 7.937801286897863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) are popular machine learning tools for
modeling and forecasting sequential data and for inferring dynamical systems
(DS) from observed time series. Concepts from DS theory (DST) have variously
been used to further our understanding of both how trained RNNs solve complex
tasks and the training process itself. Bifurcations are particularly important
phenomena in DS, including RNNs, that refer to topological (qualitative)
changes in a system's dynamical behavior as one or more of its parameters are
varied. Knowing the bifurcation structure of an RNN will thus allow one to deduce
many of its computational and dynamical properties, like its sensitivity to
parameter variations or its behavior during training. In particular,
bifurcations may account for sudden loss jumps observed in RNN training that
could severely impede the training process. Here we first mathematically prove
for a particular class of ReLU-based RNNs that certain bifurcations are indeed
associated with loss gradients tending toward infinity or zero. We then
introduce a novel heuristic algorithm for detecting all fixed points and
k-cycles in ReLU-based RNNs and their existence and stability regions, hence
bifurcation manifolds in parameter space. In contrast to previous numerical
algorithms for finding fixed points and common continuation methods, our
algorithm provides exact results and returns fixed points and cycles up to high
orders with surprisingly good scaling behavior. We exemplify the algorithm on
the analysis of the training process of RNNs, and find that the recently
introduced technique of generalized teacher forcing completely avoids certain
types of bifurcations in training. Thus, besides facilitating the DST analysis
of trained RNNs, our algorithm provides a powerful instrument for analyzing the
training process itself.
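To make the fixed-point and k-cycle detection idea concrete, here is a minimal brute-force sketch. It is not the paper's heuristic algorithm: it assumes the simple ReLU update z_{t+1} = relu(W z_t + b) (the paper treats a related class of piecewise-linear RNNs), enumerates all 2^N activation patterns, solves the affine fixed-point equation on each linear region, and keeps only sign-consistent solutions. All function and variable names here are illustrative.

```python
import itertools
import numpy as np

def fixed_points(W, b, tol=1e-9):
    """Enumerate fixed points of F(z) = relu(W z + b) by brute force over
    all 2^N ReLU activation patterns. On the region with pattern d
    (D = diag(d)), F is affine: F(z) = D W z + D b, so a candidate solves
    (I - D W) z = D b; it is kept only if the signs of W z + b actually
    reproduce the assumed pattern."""
    N = W.shape[0]
    I = np.eye(N)
    points = []
    for d in itertools.product([0.0, 1.0], repeat=N):
        D = np.diag(d)
        A = I - D @ W
        if abs(np.linalg.det(A)) < tol:            # degenerate region, skip
            continue
        z = np.linalg.solve(A, D @ b)
        pre = W @ z + b                             # pre-activation at the candidate
        if np.all((pre > tol) == (np.array(d) > 0.5)):
            points.append((z, d))
    return points

def k_cycles(W, b, k, tol=1e-9):
    """Enumerate k-cycles by brute force over sequences of k activation
    patterns. Composing the k affine branch maps gives z -> M z + c; a
    candidate cycle solves (I - M) z = c and must be sign-consistent with
    every pattern along the orbit."""
    N = W.shape[0]
    I = np.eye(N)
    cycles = []
    patterns = list(itertools.product([0.0, 1.0], repeat=N))
    for seq in itertools.product(patterns, repeat=k):
        M, c = I.copy(), np.zeros(N)
        for d in seq:                               # build the composed affine map
            D = np.diag(d)
            M, c = D @ W @ M, D @ (W @ c + b)
        A = I - M
        if abs(np.linalg.det(A)) < tol:
            continue
        z = np.linalg.solve(A, c)
        orbit, ok = [z], True
        for d in seq:                               # verify pattern consistency step by step
            pre = W @ orbit[-1] + b
            if not np.all((pre > tol) == (np.array(d) > 0.5)):
                ok = False
                break
            orbit.append(np.diag(d) @ pre)
        if ok:
            cycles.append(orbit[:k])
    return cycles

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(scale=1.5, size=(4, 4))
    b = rng.normal(size=4)
    print(len(fixed_points(W, b)), "fixed points,",
          len(k_cycles(W, b, 2)), "2-cycle solutions")
```

Note that this exhaustive search scales as 2^{Nk} in the number of units N and cycle order k, which is exactly the cost the paper's heuristic avoids; it also reports each genuine k-cycle once per cyclic shift of its pattern sequence and includes lower-order cycles as degenerate solutions.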
Related papers
- Implicit Stochastic Gradient Descent for Training Physics-informed
Neural Networks [51.92362217307946]
Physics-informed neural networks (PINNs) have been demonstrated to be effective at solving forward and inverse differential equation problems.
However, PINNs can be trapped in training failures when the target functions to be approximated exhibit high-frequency or multi-scale features.
In this paper, we propose to employ the implicit stochastic gradient descent (ISGD) method to train PINNs, improving the stability of the training process.
arXiv Detail & Related papers (2023-03-03T08:17:47Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Lyapunov-Guided Representation of Recurrent Neural Network Performance [9.449520199858952]
Recurrent Neural Networks (RNNs) are ubiquitous computing systems for sequence and time-series data.
We propose to treat RNNs as dynamical systems and to correlate hyperparameters with accuracy through Lyapunov spectral analysis; a minimal sketch of the underlying Lyapunov-spectrum estimation appears after this list.
Our studies of various RNN architectures show that AeLLE successfully correlates the RNN Lyapunov spectrum with accuracy.
arXiv Detail & Related papers (2022-04-11T05:38:38Z)
- Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
- What training reveals about neural network complexity [80.87515604428346]
This work explores the hypothesis that the complexity of the function a deep neural network (NN) is learning can be deduced by how fast its weights change during training.
Our results support the hypothesis that good training behavior can be a useful bias towards good generalization.
arXiv Detail & Related papers (2021-06-08T08:58:00Z)
- Skip-Connected Self-Recurrent Spiking Neural Networks with Joint Intrinsic Parameter and Synaptic Weight Training [14.992756670960008]
We propose a new type of recurrent spiking neural network (RSNN) called Skip-Connected Self-Recurrent SNNs (ScSr-SNNs).
ScSr-SNNs can boost performance by up to 2.55% compared with other types of RSNNs trained by state-of-the-art backpropagation (BP) methods.
arXiv Detail & Related papers (2020-10-23T22:27:13Z)
- Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
- Fast Learning of Graph Neural Networks with Guaranteed Generalizability: One-hidden-layer Case [93.37576644429578]
Graph neural networks (GNNs) have made great progress recently on learning from graph-structured data in practice.
We provide a theoretically-grounded generalizability analysis of GNNs with one hidden layer for both regression and binary classification problems.
arXiv Detail & Related papers (2020-06-25T00:45:52Z)
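As referenced in the Lyapunov-guided entry above, the following is a minimal sketch of estimating an RNN's Lyapunov spectrum with the standard QR (Benettin) method: push an orthonormal tangent frame through the step-wise Jacobians and average the log stretching factors on the diagonal of R. The tanh update, parameter names, and step counts are illustrative assumptions; the AeLLE representation in that paper builds further machinery on top of such spectra.

```python
import numpy as np

def lyapunov_spectrum(W, b, h0, T=5000, burn_in=500):
    """Estimate the Lyapunov spectrum of the autonomous RNN
    h_{t+1} = tanh(W h_t + b) via the QR (Benettin) method."""
    N = W.shape[0]
    h = h0.copy()
    Q = np.eye(N)
    sums = np.zeros(N)
    steps = 0
    for t in range(T):
        h = np.tanh(W @ h + b)
        J = np.diag(1.0 - h**2) @ W        # Jacobian of the update at the new state
        Q, R = np.linalg.qr(J @ Q)         # reorthonormalize the tangent frame
        if t >= burn_in:                   # discard the transient
            sums += np.log(np.abs(np.diag(R)) + 1e-300)
            steps += 1
    return np.sort(sums / steps)[::-1]     # exponents, largest first

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    N = 16
    W = rng.normal(scale=1.2 / np.sqrt(N), size=(N, N))
    print(lyapunov_spectrum(W, np.zeros(N), rng.normal(size=N))[:3])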
This list is automatically generated from the titles and abstracts of the papers on this site.