How to train RNNs on chaotic data?
- URL: http://arxiv.org/abs/2110.07238v1
- Date: Thu, 14 Oct 2021 09:07:42 GMT
- Title: How to train RNNs on chaotic data?
- Authors: Zahra Monfared, Jonas M. Mikhaeil and Daniel Durstewitz
- Abstract summary: Recurrent neural networks (RNNs) are widespread machine learning tools for modeling sequential and time series data.
They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training.
Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits.
- Score: 7.276372008305615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent neural networks (RNNs) are widespread machine learning tools for
modeling sequential and time series data. They are notoriously hard to train
because their loss gradients backpropagated in time tend to saturate or diverge
during training. This is known as the exploding and vanishing gradient problem.
Previous solutions to this issue either built on rather complicated,
purpose-engineered architectures with gated memory buffers, or - more recently
- imposed constraints that ensure convergence to a fixed point or restrict (the
eigenspectrum of) the recurrence matrix. Such constraints, however, impose
severe limitations on the expressivity of the RNN: essential intrinsic dynamics
such as multistability or chaos are ruled out. This is fundamentally at odds
with the chaotic nature of many, if not most, time series encountered in nature
and society. Here we offer a comprehensive theoretical treatment of this
problem by relating the loss gradients during RNN training to the Lyapunov
spectrum of RNN-generated orbits. We mathematically prove that RNNs producing
stable equilibrium or cyclic behavior have bounded gradients, whereas the
gradients of RNNs with chaotic dynamics always diverge. Based on these analyses
and insights, we offer an effective yet simple training technique for chaotic
data and guidance on how to choose relevant hyperparameters according to the
Lyapunov spectrum.
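The gradient-Lyapunov connection can be checked numerically. Below is a minimal sketch, assuming a vanilla tanh RNN with random Gaussian weights (the network size, gain, and orbit length are illustrative choices, not the paper's setup). It estimates the Lyapunov spectrum of an RNN-generated orbit with the standard QR (Benettin) method; the per-step Jacobians entering the product are the same factors that appear in gradients backpropagated through time, so a positive largest exponent signals exponentially diverging gradients.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
# gain > 1 often puts a random tanh network in the chaotic regime
W = rng.normal(scale=1.5 / np.sqrt(n), size=(n, n))

def step(h):
    return np.tanh(W @ h)

def jacobian(h_next):
    # d tanh(W h) / d h = diag(1 - tanh(W h)^2) W, written in terms of the next state
    return (1.0 - h_next ** 2)[:, None] * W

h = rng.normal(size=n)
for _ in range(500):  # discard the transient
    h = step(h)

# Benettin method: push an orthonormal frame through the Jacobian product
# and re-orthonormalize with QR at every step
Q = np.eye(n)
log_growth = np.zeros(n)
T = 2000
for _ in range(T):
    h = step(h)
    Q, R = np.linalg.qr(jacobian(h) @ Q)
    log_growth += np.log(np.abs(np.diag(R)))

lyap = np.sort(log_growth / T)[::-1]  # Lyapunov spectrum, largest first
print("largest Lyapunov exponent:", lyap[0])
```

If `lyap[0]` comes out positive, the orbit is chaotic and, by the paper's analysis, BPTT gradient norms grow at that exponential rate; shrinking the weight gain below 1 typically drives `lyap[0]` negative and keeps the gradients bounded.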
Related papers
- Learning Discretized Neural Networks under Ricci Flow [51.36292559262042]
We study Discretized Neural Networks (DNNs) composed of low-precision weights and activations.
DNNs suffer from either infinite or zero gradients due to the non-differentiable discrete function during training.
arXiv Detail & Related papers (2023-02-07T10:51:53Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- On Scrambling Phenomena for Randomly Initialized Recurrent Networks [16.36123688742091]
Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics.
Recent works have shed light on such phenomena by analyzing when exploding or vanishing gradients may occur.
We prove a qualitatively stronger phenomenon about RNNs than what exploding gradients seem to suggest.
arXiv Detail & Related papers (2022-10-11T07:28:28Z)
- Training High-Performance Low-Latency Spiking Neural Networks by Differentiation on Spike Representation [70.75043144299168]
Spiking Neural Network (SNN) is a promising energy-efficient AI model when implemented on neuromorphic hardware.
It is a challenge to efficiently train SNNs due to their non-differentiability.
We propose the Differentiation on Spike Representation (DSR) method, which could achieve high performance.
arXiv Detail & Related papers (2022-05-01T12:44:49Z)
- Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
- Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z)
- UnICORNN: A recurrent model for learning very long time dependencies [0.0]
We propose a novel RNN architecture based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations.
The resulting RNN is fast, invertible (in time), memory efficient and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem.
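The idea of a structure-preserving discretization can be sketched concretely. The cell below is a hedged illustration, not the exact UnICORNN equations: it applies a symplectic-Euler step to a second-order system y'' = -tanh(w*y + Vx + b) - alpha*y, and all weights, sizes, and the step size are illustrative assumptions. The resulting map can be inverted exactly, matching the invertibility-in-time claim.

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 16, 4                        # hidden units, input dimension (assumed)
w = rng.normal(scale=0.5, size=d)   # elementwise recurrent weights
V = rng.normal(scale=0.5, size=(d, m))
b = np.zeros(d)
alpha, dt = 1.0, 0.1

def cell(y, z, x):
    # symplectic Euler: update the velocity z first, then the position y
    z_new = z - dt * (np.tanh(w * y + V @ x + b) + alpha * y)
    y_new = y + dt * z_new
    return y_new, z_new

y, z = np.zeros(d), np.zeros(d)
xs = rng.normal(size=(200, m))
for x in xs:
    y, z = cell(y, z, x)

# the discretization is invertible in time: recover the previous state exactly
y_prev = y - dt * z
z_prev = z + dt * (np.tanh(w * y_prev + V @ xs[-1] + b) + alpha * y_prev)
```

Because each step can be undone from the current state and input alone, hidden states need not be stored during backpropagation, which is what makes such architectures memory efficient.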
arXiv Detail & Related papers (2021-03-09T15:19:59Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Implicit Bias of Linear RNNs [27.41989861342218]
Linear recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory.
This paper provides a rigorous explanation of this property in the special case of linear RNNs.
Using recently-developed kernel regime analysis, our main result shows that linear RNNs are functionally equivalent to a certain weighted 1D-convolutional network.
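The claimed equivalence is easy to check numerically on a small example. A hedged sketch, with all weights and sizes chosen arbitrarily (the particular kernel weighting derived in the paper's kernel-regime analysis is not reproduced here): unrolling a linear RNN h_t = A h_{t-1} + B x_t, y_t = c^T h_t gives y_t = sum_j (c^T A^j B) x_{t-j}, i.e. a causal 1D convolution with kernel k_j = c^T A^j B.

```python
import numpy as np

rng = np.random.default_rng(2)
n, T = 8, 20
A = rng.normal(scale=0.3, size=(n, n))   # recurrence matrix
B = rng.normal(size=n)                   # input weights (scalar input)
c = rng.normal(size=n)                   # linear readout

x = rng.normal(size=T)

# run the linear RNN from a zero initial state
h = np.zeros(n)
y_rnn = []
for t in range(T):
    h = A @ h + B * x[t]
    y_rnn.append(c @ h)
y_rnn = np.array(y_rnn)

# equivalent causal 1D convolution: y_t = sum_{j=0}^{t} (c^T A^j B) x_{t-j}
kernel = np.array([c @ np.linalg.matrix_power(A, j) @ B for j in range(T)])
y_conv = np.array([kernel[:t + 1] @ x[t::-1] for t in range(T)])

assert np.allclose(y_rnn, y_conv)  # agree up to floating-point error
```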
arXiv Detail & Related papers (2021-01-19T19:39:28Z)
- When and why PINNs fail to train: A neural tangent kernel perspective [2.1485350418225244]
We derive the Neural Tangent Kernel (NTK) of PINNs and prove that, under appropriate conditions, it converges to a deterministic kernel that stays constant during training in the infinite-width limit.
We find a remarkable discrepancy in the convergence rate of the different loss components contributing to the total training error.
We propose a novel gradient descent algorithm that utilizes the eigenvalues of the NTK to adaptively calibrate the convergence rate of the total training error.
arXiv Detail & Related papers (2020-07-28T23:44:56Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.