On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools
- URL: http://arxiv.org/abs/2006.14123v1
- Date: Thu, 25 Jun 2020 00:53:19 GMT
- Title: On Lyapunov Exponents for RNNs: Understanding Information Propagation Using Dynamical Systems Tools
- Authors: Ryan Vogt, Maximilian Puelma Touzel, Eli Shlizerman, Guillaume Lajoie
- Abstract summary: Lyapunov Exponents (LEs) measure the rates of expansion and contraction of nonlinear system trajectories.
LEs have a bearing on the stability of RNN training dynamics because forward propagation of information is related to the backward propagation of error gradients.
As a tool to understand and exploit stability of training dynamics, the Lyapunov spectrum fills an existing gap between prescriptive mathematical approaches of limited scope and computationally expensive empirical approaches.
- Score: 6.199300239433395
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) have been successfully applied to a variety
of problems involving sequential data, but their optimization is sensitive to
parameter initialization, architecture, and optimizer hyperparameters.
Considering RNNs as dynamical systems, a natural way to capture stability,
i.e., the growth and decay over long iterates, is given by the Lyapunov Exponents
(LEs), which form the Lyapunov spectrum. The LEs have a bearing on stability of
RNN training dynamics because forward propagation of information is related to
the backward propagation of error gradients. LEs measure the asymptotic rates
of expansion and contraction of nonlinear system trajectories, and generalize
stability analysis to the time-varying attractors structuring the
non-autonomous dynamics of data-driven RNNs. As a tool to understand and
exploit stability of training dynamics, the Lyapunov spectrum fills an existing
gap between prescriptive mathematical approaches of limited scope and
computationally-expensive empirical approaches. To leverage this tool, we
implement an efficient way to compute LEs for RNNs during training, discuss the
aspects specific to standard RNN architectures driven by typical sequential
datasets, and show that the Lyapunov spectrum can serve as a robust readout of
training stability across hyperparameters. With this exposition-oriented
contribution, we hope to draw attention to this understudied, but theoretically
grounded tool for understanding training stability in RNNs.
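To make the tool concrete, the sketch below shows the standard QR-based estimator of the Lyapunov spectrum applied to a vanilla tanh RNN driven by an input sequence. It is a minimal illustration under assumed names, shapes, and initialization scales, not the authors' implementation, and it does not reproduce the efficient training-time variant described in the paper.

```python
import numpy as np

def rnn_lyapunov_spectrum(W, U, b, inputs, n_exponents=None, seed=0):
    """Estimate the Lyapunov spectrum of h_t = tanh(W h_{t-1} + U x_t + b)."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    k = n_exponents or n
    h = np.zeros(n)
    Q, _ = np.linalg.qr(rng.normal(size=(n, k)))    # orthonormal tangent basis
    log_growth = np.zeros(k)
    for x in inputs:
        h = np.tanh(W @ h + U @ x + b)
        J = (1.0 - h ** 2)[:, None] * W             # Jacobian dh_t / dh_{t-1}
        Q, R = np.linalg.qr(J @ Q)                  # re-orthonormalize, track stretching
        log_growth += np.log(np.abs(np.diag(R)) + 1e-300)
    return np.sort(log_growth / len(inputs))[::-1]  # exponents per step, largest first

# Toy usage: a randomly initialized RNN driven by white-noise inputs.
rng = np.random.default_rng(1)
n_hidden, n_in, T = 64, 8, 2000
W = rng.normal(0.0, 1.2 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
U = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
les = rnn_lyapunov_spectrum(W, U, np.zeros(n_hidden), rng.normal(size=(T, n_in)))
print("largest Lyapunov exponent:", les[0])
```

A positive largest exponent indicates expansion of nearby trajectories (and, by the forward/backward relation above, potential gradient explosion), while a strongly negative spectrum indicates contraction and vanishing gradients.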
Related papers
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Stability and Generalization Analysis of Gradient Methods for Shallow Neural Networks [59.142826407441106]
We study the generalization behavior of shallow neural networks (SNNs) by leveraging the concept of algorithmic stability.
We consider gradient descent (GD) and stochastic gradient descent (SGD) to train SNNs, for both of which we develop consistent excess risk bounds.
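As a purely illustrative sketch of the two training procedures considered here (full-batch GD versus minibatch SGD on a one-hidden-layer ReLU network), and not the paper's setting or its stability-based bounds, one might write:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))
y = np.sin(X.sum(axis=1, keepdims=True))            # toy regression target
W1 = rng.normal(0.0, 0.1, size=(10, 64))            # shallow net: one hidden layer
W2 = rng.normal(0.0, 0.1, size=(64, 1))

def loss_and_grads(Xb, yb, W1, W2):
    H = np.maximum(Xb @ W1, 0.0)                    # ReLU hidden layer
    err = H @ W2 - yb                               # squared-loss residual
    gW2 = H.T @ err / len(Xb)
    gW1 = Xb.T @ ((err @ W2.T) * (H > 0)) / len(Xb)
    return float(np.mean(err ** 2)), gW1, gW2

lr, batch = 0.1, 32
for use_sgd in (False, True):                       # GD first, then SGD
    V1, V2 = W1.copy(), W2.copy()
    for _ in range(200):
        idx = rng.choice(len(X), size=batch) if use_sgd else np.arange(len(X))
        _, gW1, gW2 = loss_and_grads(X[idx], y[idx], V1, V2)
        V1 -= lr * gW1
        V2 -= lr * gW2
    print("SGD" if use_sgd else "GD ",
          "final loss on full data:", loss_and_grads(X, y, V1, V2)[0])
```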
arXiv Detail & Related papers (2022-09-19T18:48:00Z)
- Tractable Dendritic RNNs for Reconstructing Nonlinear Dynamical Systems [7.045072177165241]
We augment a piecewise-linear recurrent neural network (PLRNN) with a linear spline basis expansion.
We show that this approach retains all the theoretically appealing properties of the simple PLRNN, yet boosts its capacity for approximating arbitrary nonlinear dynamical systems in comparatively low dimensions.
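As a rough illustration of the idea, not the paper's exact parameterization, the sketch below steps a piecewise-linear latent model whose nonlinearity is a weighted sum of shifted ReLUs, one simple instance of a linear spline basis expansion. All names, scales, and dimensions are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 16, 4                                     # latent dimension, number of basis functions

A = np.diag(rng.uniform(0.3, 0.6, size=n))       # linear (autoregressive) part
W = rng.normal(0.0, 0.1 / np.sqrt(n), size=(n, n))
c = rng.normal(0.0, 0.1, size=n)                 # constant drive
alpha = rng.normal(0.0, 0.3, size=B)             # spline basis weights
knots = np.linspace(-1.0, 1.0, B)                # thresholds of the shifted ReLUs

def spline_relu(z):
    # Linear spline basis expansion: weighted sum of ReLUs shifted by the knots.
    return sum(a * np.maximum(z - h, 0.0) for a, h in zip(alpha, knots))

def step(z):
    # Piecewise-linear latent update: linear part plus expanded nonlinearity.
    return A @ z + W @ spline_relu(z) + c

z = rng.normal(size=n)
for _ in range(500):
    z = step(z)
print("latent state norm after 500 steps:", np.linalg.norm(z))
```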
arXiv Detail & Related papers (2022-07-06T09:43:03Z)
- On the Intrinsic Structures of Spiking Neural Networks [66.57589494713515]
Recent years have seen a surge of interest in SNNs owing to their remarkable potential to handle time-dependent and event-driven data.
However, there has been a dearth of comprehensive studies examining the impact of intrinsic structures within spiking computations.
This work delves into the intrinsic structures of SNNs by elucidating their influence on the expressivity of SNNs.
arXiv Detail & Related papers (2022-06-21T09:42:30Z)
- Lyapunov-Guided Representation of Recurrent Neural Network Performance [9.449520199858952]
Recurrent Neural Networks (RNNs) are ubiquitous computing systems for sequence and time-series data.
We propose to treat RNNs as dynamical systems and to correlate hyperparameters with accuracy through Lyapunov spectral analysis.
Our studies of various RNN architectures show that AeLLE successfully correlates the RNN Lyapunov spectrum with accuracy.
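The sketch below illustrates the general idea of compressing per-model Lyapunov spectra into a low-dimensional code and checking how that code relates to accuracy. It uses synthetic spectra and a linear PCA encoder as a simple stand-in; it is not the AeLLE method from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data: one sorted Lyapunov spectrum per trained RNN, plus
# that RNN's validation accuracy. In practice these would come from training
# runs across hyperparameters.
n_models, n_les = 200, 32
spectra = np.sort(rng.normal(-0.1, 0.05, size=(n_models, n_les)), axis=1)[:, ::-1]
accuracy = 0.9 - 1.5 * spectra[:, 0] + 0.01 * rng.normal(size=n_models)

# Linear "autoencoder" via PCA: encode each spectrum into a 2-d latent code.
centered = spectra - spectra.mean(axis=0)
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
latent = centered @ Vt[:2].T

# Check how strongly the leading latent coordinate correlates with accuracy.
print("|corr(latent[:, 0], accuracy)| =",
      abs(np.corrcoef(latent[:, 0], accuracy)[0, 1]))
```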
arXiv Detail & Related papers (2022-04-11T05:38:38Z)
- Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
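As a generic illustration, and not the paper's reachability method, the sketch below builds a toy implicit layer z = tanh(W z + U x + b), solves it by fixed-point iteration, and propagates a naive interval bound through the same iteration. It assumes the layer is a contraction so the fixed point is well defined, and all names and scales are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 32, 8
# Weights kept small so the implicit layer is a contraction and bounds stay informative.
W = rng.normal(0.0, 0.15 / np.sqrt(n), size=(n, n))
U = rng.normal(0.0, 0.5 / np.sqrt(m), size=(n, m))
b = np.zeros(n)

def implicit_layer(x, iters=50):
    # Implicit "layer": solve z = tanh(W z + U x + b) by fixed-point iteration.
    z = np.zeros(n)
    for _ in range(iters):
        z = np.tanh(W @ z + U @ x + b)
    return z

def interval_matvec(M, lo, hi):
    # Interval arithmetic for M @ z with z in the box [lo, hi].
    Mp, Mm = np.maximum(M, 0.0), np.minimum(M, 0.0)
    return Mp @ lo + Mm @ hi, Mp @ hi + Mm @ lo

def implicit_interval_bounds(x_lo, x_hi, iters=50):
    # Enclose the fixed points for all x in [x_lo, x_hi]: start from [-1, 1]^n
    # (the range of tanh) and iterate the interval extension of the layer;
    # every iterate remains a valid enclosure.
    z_lo, z_hi = -np.ones(n), np.ones(n)
    for _ in range(iters):
        a_lo, a_hi = interval_matvec(W, z_lo, z_hi)
        u_lo, u_hi = interval_matvec(U, x_lo, x_hi)
        z_lo = np.tanh(a_lo + u_lo + b)   # tanh is monotone, so bounds map to bounds
        z_hi = np.tanh(a_hi + u_hi + b)
    return z_lo, z_hi

x, eps = rng.normal(size=m), 0.05
z = implicit_layer(x)
z_lo, z_hi = implicit_interval_bounds(x - eps, x + eps)
print("bounds valid:", bool(np.all(z_lo <= z) and np.all(z <= z_hi)))
print("mean bound width:", float(np.mean(z_hi - z_lo)))
```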
arXiv Detail & Related papers (2022-04-01T03:31:27Z)
- Extended critical regimes of deep neural networks [0.0]
We show that heavy-tailed weights enable the emergence of an extended critical regime without fine-tuning parameters.
In this extended critical regime, DNNs exhibit rich and complex propagation dynamics across layers.
We provide a theoretical guide for the design of efficient neural architectures.
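A minimal numpy sketch of the ingredient mentioned above, heavy-tailed weights: it passes a signal through a deep tanh network initialized with Gaussian versus Student-t weights and records per-layer activation norms. The depth, widths, and scales are arbitrary assumptions, and the sketch does not reproduce the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 30

def propagate(weight_sampler, scale):
    # Pass a random input through a deep tanh network and record layer norms.
    h = rng.normal(size=width)
    norms = []
    for _ in range(depth):
        W = scale * weight_sampler(size=(width, width))
        h = np.tanh(W @ h)
        norms.append(np.linalg.norm(h) / np.sqrt(width))
    return norms

# Gaussian weights vs. heavy-tailed (Student-t, 1.5 degrees of freedom) weights.
gauss = propagate(rng.normal, 1.0 / np.sqrt(width))
heavy = propagate(lambda size: rng.standard_t(1.5, size=size), 0.5 / np.sqrt(width))
print("per-layer activation norm (Gaussian):  ", np.round(gauss[::6], 3))
print("per-layer activation norm (heavy tail):", np.round(heavy[::6], 3))
```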
arXiv Detail & Related papers (2022-03-24T10:15:50Z)
- Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
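The fixed-point linearization approach referred to above (not the paper's SLDS co-training model) can be sketched in a few lines: find an approximate fixed point of the autonomous RNN update by gradient descent on ||f(h) - h||^2, then inspect the Jacobian there. Names and scales below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 32
W = rng.normal(0.0, 1.1 / np.sqrt(n), size=(n, n))
b = rng.normal(0.0, 0.1, size=n)

def find_fixed_point(h0, lr=0.05, steps=5000):
    # Minimize q(h) = 0.5 * ||f(h) - h||^2 for f(h) = tanh(W h + b).
    h = h0.copy()
    for _ in range(steps):
        a = np.tanh(W @ h + b)
        r = a - h                          # residual f(h) - h
        Jf = (1.0 - a ** 2)[:, None] * W   # Jacobian of f at h
        h -= lr * (Jf - np.eye(n)).T @ r   # gradient of q
    return h

h_star = find_fixed_point(rng.normal(size=n))
a = np.tanh(W @ h_star + b)
print("residual ||f(h*) - h*||:", np.linalg.norm(a - h_star))

# Linearize around the (approximate) fixed point and inspect the local dynamics.
J = (1.0 - a ** 2)[:, None] * W
print("spectral radius at fixed point:", np.max(np.abs(np.linalg.eigvals(J))))
```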
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
- Coupled Oscillatory Recurrent Neural Network (coRNN): An accurate and (gradient) stable architecture for learning long time dependencies [15.2292571922932]
We propose a novel architecture for recurrent neural networks.
Our proposed RNN is based on a time-discretization of a system of second-order ordinary differential equations.
Experiments show that the proposed RNN is comparable in performance to the state of the art on a variety of benchmarks.
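As a rough illustration of a time-discretized second-order system in the spirit of the description above (not the exact coRNN update rule, whose discretization and parameterization may differ), consider the following assumed-form sketch of a driven, damped oscillator network:

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden, n_in = 64, 4
W  = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
Wz = rng.normal(0.0, 1.0 / np.sqrt(n_hidden), size=(n_hidden, n_hidden))
V  = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_hidden, n_in))
bias = np.zeros(n_hidden)
dt, gamma, eps = 0.05, 1.0, 0.1       # step size, restoring force, damping

def oscillator_step(y, z, u):
    # Second-order ODE y'' = tanh(W y + Wz y' + V u + b) - gamma*y - eps*y',
    # written as a coupled first-order system (y, z = y') and discretized with
    # a simple semi-explicit Euler step.
    z_new = z + dt * (np.tanh(W @ y + Wz @ z + V @ u + bias) - gamma * y - eps * z)
    y_new = y + dt * z_new
    return y_new, z_new

y = np.zeros(n_hidden)
z = np.zeros(n_hidden)
for t in range(1000):
    u = np.array([np.sin(0.05 * t), np.cos(0.05 * t), 0.0, 0.0])  # toy input
    y, z = oscillator_step(y, z, u)
print("hidden-state norm after 1000 steps:", np.linalg.norm(y))
```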
arXiv Detail & Related papers (2020-10-02T12:35:04Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
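A minimal sketch of the elastic weight consolidation (EWC) penalty mentioned above, applied to a toy logistic-regression model rather than an RNN; the diagonal Fisher estimate, the penalty strength, and the two conflicting tasks are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w, X, y):
    # Gradient of the mean logistic loss for a linear model (stands in for a network).
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(X)

def train(w, X, y, ewc=None, lr=0.5, steps=500):
    for _ in range(steps):
        g = grad(w, X, y)
        if ewc is not None:
            lam, fisher, w_anchor = ewc
            g = g + lam * fisher * (w - w_anchor)  # EWC quadratic penalty
        w = w - lr * g
    return w

def acc(w, X, y):
    return float(np.mean(((X @ w) > 0) == (y > 0.5)))

# Task A, then a conflicting task B. EWC anchors weights important for task A.
XA, XB = rng.normal(size=(300, 6)), rng.normal(size=(300, 6))
yA = (XA[:, 0] > 0).astype(float)
yB = (XB[:, 0] < 0).astype(float)

wA = train(np.zeros(6), XA, yA)
pA = 1.0 / (1.0 + np.exp(-(XA @ wA)))
fisher = np.mean((XA * (pA - yA)[:, None]) ** 2, axis=0)   # diagonal Fisher estimate

wB_plain = train(wA.copy(), XB, yB)                        # naive sequential training
wB_ewc = train(wA.copy(), XB, yB, ewc=(500.0, fisher, wA)) # with EWC regularizer
print("task-A accuracy after task B:  plain =", acc(wB_plain, XA, yA),
      " EWC =", acc(wB_ewc, XA, yA))
```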
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
- Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z)