On Scrambling Phenomena for Randomly Initialized Recurrent Networks
- URL: http://arxiv.org/abs/2210.05212v1
- Date: Tue, 11 Oct 2022 07:28:28 GMT
- Title: On Scrambling Phenomena for Randomly Initialized Recurrent Networks
- Authors: Vaggos Chatziafratis, Ioannis Panageas, Clayton Sanford, Stelios
Andrew Stavroulakis
- Abstract summary: Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics.
Recent works have shed light on such phenomena by analyzing when exploding or vanishing gradients may occur.
We prove a qualitatively stronger phenomenon about RNNs than what exploding gradients seem to suggest.
- Score: 16.36123688742091
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent Neural Networks (RNNs) frequently exhibit complicated dynamics, and
their sensitivity to the initialization process often renders them notoriously
hard to train. Recent works have shed light on such phenomena by analyzing when
exploding or vanishing gradients may occur, either of which is detrimental for
training dynamics. In this paper, we point to a formal connection between RNNs
and chaotic dynamical systems and prove a qualitatively stronger phenomenon
about RNNs than what exploding gradients seem to suggest. Our main result
proves that under standard initialization (e.g., He, Xavier, etc.), RNNs will
exhibit Li-Yorke chaos with constant probability, independent of the network's
width. This explains the experimentally observed phenomenon of scrambling,
under which trajectories of nearby
points may appear to be arbitrarily close during some timesteps, yet will be
far away in future timesteps. In stark contrast to their feedforward
counterparts, we show that chaotic behavior in RNNs is preserved under small
perturbations and that their expressive power remains exponential in the number
of feedback iterations. Our technical arguments rely on viewing RNNs as random
walks under non-linear activations, and studying the existence of certain types
of higher-order fixed points called periodic points that lead to phase
transitions from order to chaos.
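A minimal simulation sketch (an assumed setup, not the authors' code) can make the scrambling statement above concrete: a bias-free vanilla tanh RNN with He-style random recurrent weights is iterated from two hidden states that differ by a tiny perturbation, and the distance between the two trajectories is tracked. The rapid divergence it prints is the sensitivity side of the chaotic regime; Li-Yorke scrambling is the stronger assertion that such pairs of trajectories also return arbitrarily close to each other at infinitely many later timesteps.

```python
# Hypothetical illustration of sensitivity under standard (He-style) initialization.
# The width, horizon, and perturbation size are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(0)
width = 256
# He-style initialization: i.i.d. Gaussian entries with variance 2 / width.
W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))

def step(h):
    """One feedback iteration of a bias-free vanilla RNN with tanh activation."""
    return np.tanh(W @ h)

h1 = rng.normal(size=width)
h2 = h1 + 1e-8 * rng.normal(size=width)   # a tiny perturbation of the same state

for t in range(1, 61):
    h1, h2 = step(h1), step(h2)
    if t % 10 == 0:
        print(f"t={t:3d}  ||h1 - h2|| = {np.linalg.norm(h1 - h2):.3e}")
```

The abstract's point is that the probability of landing in such a chaotic regime is bounded below by a constant that does not depend on the network's width.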
Related papers
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Dynamic Causal Explanation Based Diffusion-Variational Graph Neural Network for Spatio-temporal Forecasting [60.03169701753824]
We propose a novel Dynamic Diffusion-Variational Graph Neural Network (DVGNN) for spatio-temporal forecasting.
The proposed DVGNN model outperforms state-of-the-art approaches and achieves outstanding Root Mean Squared Error results.
arXiv Detail & Related papers (2023-05-16T11:38:19Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Momentum Diminishes the Effect of Spectral Bias in Physics-Informed Neural Networks [72.09574528342732]
Physics-informed neural network (PINN) algorithms have shown promising results in solving a wide range of problems involving partial differential equations (PDEs).
They often fail to converge to desirable solutions when the target function contains high-frequency features, due to a phenomenon known as spectral bias.
In the present work, we exploit neural tangent kernels (NTKs) to investigate the training dynamics of PINNs evolving under stochastic gradient descent with momentum (SGDM).
arXiv Detail & Related papers (2022-06-29T19:03:10Z)
- How to train RNNs on chaotic data? [7.276372008305615]
Recurrent neural networks (RNNs) are widespread machine learning tools for modeling sequential and time series data.
They are notoriously hard to train because their loss gradients backpropagated in time tend to saturate or diverge during training.
Here we offer a comprehensive theoretical treatment of this problem by relating the loss gradients during RNN training to the Lyapunov spectrum of RNN-generated orbits (a toy numerical sketch of this connection appears after this list).
arXiv Detail & Related papers (2021-10-14T09:07:42Z)
- Recurrent Neural Networks for Partially Observed Dynamical Systems [0.0]
Delay embedding allows us to account for unobserved state variables.
We provide an approach to delay embedding that permits explicit approximation of error.
We also derive the dependence of the first-order approximation error on the system size.
arXiv Detail & Related papers (2021-09-21T20:15:20Z)
- UnICORNN: A recurrent model for learning very long time dependencies [0.0]
We propose a novel RNN architecture based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations.
The resulting RNN is fast, invertible (in time), and memory efficient, and we derive rigorous bounds on the hidden-state gradients to prove mitigation of the exploding and vanishing gradient problem.
arXiv Detail & Related papers (2021-03-09T15:19:59Z)
- Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z)
- Implicit Bias of Linear RNNs [27.41989861342218]
Linear recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory.
This paper provides a rigorous explanation of this property in the special case of linear RNNs.
Using recently-developed kernel regime analysis, our main result shows that linear RNNs are functionally equivalent to a certain weighted 1D-convolutional network.
arXiv Detail & Related papers (2021-01-19T19:39:28Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
- How Chaotic Are Recurrent Neural Networks? [22.236891108918396]
Recurrent neural networks (RNNs) are non-linear dynamic systems.
We show that a vanilla or long short-term memory (LSTM) RNN does not exhibit chaotic behavior during training in real applications such as text generation.
arXiv Detail & Related papers (2020-04-28T21:14:38Z)
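As a companion to the entry "How to train RNNs on chaotic data?" above, here is a hedged sketch, not code from any of the listed papers, of a standard way to estimate the largest Lyapunov exponent of an RNN-generated orbit: evolve a tangent vector with the Jacobian of the recurrence at every step and average the log growth rates. A positive estimate indicates a chaotic orbit, which is precisely the regime in which gradients backpropagated through time diverge.

```python
# Hypothetical sketch: Benettin-style estimate of the largest Lyapunov exponent
# of an orbit generated by a random tanh RNN. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(1)
n = 64
W = rng.normal(0.0, 1.5 / np.sqrt(n), size=(n, n))   # gain > 1: typically chaotic

def step(h):
    return np.tanh(W @ h)

def jacobian(h):
    # d/dh tanh(W h) = diag(1 - tanh(W h)^2) @ W
    return (1.0 - np.tanh(W @ h) ** 2)[:, None] * W

h = rng.normal(size=n)
for _ in range(200):              # discard the transient part of the orbit
    h = step(h)

v = rng.normal(size=n)
v /= np.linalg.norm(v)
log_growth, T = 0.0, 2000
for _ in range(T):
    v = jacobian(h) @ v           # push a tangent vector through one step
    h = step(h)
    norm = np.linalg.norm(v)
    log_growth += np.log(norm)
    v /= norm                     # renormalize to avoid overflow/underflow

print(f"estimated largest Lyapunov exponent: {log_growth / T:.3f}")
```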
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.