Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs
- URL: http://arxiv.org/abs/2002.00025v2
- Date: Mon, 15 Jun 2020 23:14:05 GMT
- Title: Gating creates slow modes and controls phase-space complexity in GRUs and LSTMs
- Authors: Tankut Can, Kamesh Krishnamurthy, David J. Schwab
- Abstract summary: We study how the addition of gates influences the dynamics and trainability of GRUs and LSTMs.
We show that the update gate in the GRU and the forget gate in the LSTM can lead to an accumulation of slow modes in the dynamics.
- Score: 5.672132510411465
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs) are powerful dynamical models for data with
complex temporal structure. However, training RNNs has traditionally proved
challenging due to exploding or vanishing of gradients. RNN models such as
LSTMs and GRUs (and their variants) significantly mitigate these issues
associated with training by introducing various types of gating units into the
architecture. While these gates empirically improve performance, how the
addition of gates influences the dynamics and trainability of GRUs and LSTMs is
not well understood. Here, we take the perspective of studying randomly
initialized LSTMs and GRUs as dynamical systems, and ask how the salient
dynamical properties are shaped by the gates. We leverage tools from random
matrix theory and mean-field theory to study the state-to-state Jacobians of
GRUs and LSTMs. We show that the update gate in the GRU and the forget gate in
the LSTM can lead to an accumulation of slow modes in the dynamics. Moreover,
the GRU update gate can poise the system at a marginally stable point. The
reset gate in the GRU and the output and input gates in the LSTM control the
spectral radius of the Jacobian, and the GRU reset gate also modulates the
complexity of the landscape of fixed-points. Furthermore, for the GRU we obtain
a phase diagram describing the statistical properties of fixed-points. We also
provide a preliminary comparison of training performance to the various
dynamical regimes realized by varying hyperparameters. Looking to the future,
we have introduced a powerful set of techniques which can be adapted to a broad
class of RNNs, to study the influence of various architectural choices on
dynamics, and potentially motivate the principled discovery of novel
architectures.
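As a concrete illustration of the analysis described above, the sketch below builds a randomly initialized GRU with zero input, linearizes one update step to obtain the state-to-state Jacobian, and inspects its eigenvalue spectrum. This is a minimal sketch, not the authors' code: the weight scales, gate biases, and the specific GRU convention (h_t = (1 - z_t) ⊙ h_{t-1} + z_t ⊙ h̃_t) are illustrative assumptions, and the paper's mean-field results are derived analytically rather than by numerical diagonalization.

```python
# Minimal sketch (not the authors' code): spectrum of the state-to-state
# Jacobian of a randomly initialized GRU, treated as an autonomous
# dynamical system (zero external input).
import jax
import jax.numpy as jnp

N = 400                                   # hidden-state dimension
kz, kr, kh, k0 = jax.random.split(jax.random.PRNGKey(0), 4)

# Random recurrent weights with variance g^2 / N (illustrative scaling).
g_z, g_r, g_h = 1.0, 1.0, 2.0
Uz = g_z * jax.random.normal(kz, (N, N)) / jnp.sqrt(N)
Ur = g_r * jax.random.normal(kr, (N, N)) / jnp.sqrt(N)
Uh = g_h * jax.random.normal(kh, (N, N)) / jnp.sqrt(N)
b_z, b_r = -2.0, 0.0    # negative b_z biases the update gate toward keeping h_{t-1}

def gru_step(h):
    """One GRU update with zero input (one common convention, assumed here)."""
    z = jax.nn.sigmoid(Uz @ h + b_z)      # update gate
    r = jax.nn.sigmoid(Ur @ h + b_r)      # reset gate
    h_tilde = jnp.tanh(Uh @ (r * h))      # candidate state
    return (1.0 - z) * h + z * h_tilde

# Relax toward a typical point of the dynamics, then linearize there.
h = 0.5 * jax.random.normal(k0, (N,))
for _ in range(200):
    h = gru_step(h)

J = jax.jacfwd(gru_step)(h)               # state-to-state Jacobian dh_{t+1}/dh_t
eigs = jnp.linalg.eigvals(J)
print("spectral radius:", float(jnp.abs(eigs).max()))
print("slow modes with |lambda| > 0.9:", int((jnp.abs(eigs) > 0.9).sum()))
```

With the update gate biased toward keeping the previous state (b_z well below zero), many eigenvalues of J pile up near 1, which is the slow-mode accumulation summarized above; varying the reset-gate parameters (g_r, b_r) instead changes the spectral radius, in line with the abstract.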
Related papers
- Deconstructing Recurrence, Attention, and Gating: Investigating the transferability of Transformers and Gated Recurrent Neural Networks in forecasting of dynamical systems [0.0]
We decompose the key architectural components of the most powerful neural architectures, namely gating and recurrence in RNNs, and attention mechanisms in transformers.
A key finding is that neural gating and attention improve the accuracy of all standard RNNs in most tasks, while adding a notion of recurrence to transformers is detrimental.
arXiv Detail & Related papers (2024-10-03T16:41:51Z)
- Universal In-Context Approximation By Prompting Fully Recurrent Models [86.61942787684272]
We show that RNNs, LSTMs, GRUs, Linear RNNs, and linear gated architectures can serve as universal in-context approximators.
We introduce a programming language called LSRL that compiles to fully recurrent architectures.
arXiv Detail & Related papers (2024-06-03T15:25:13Z)
- Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains.
Recent developments show that the resulting architectures become substantially more powerful if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states.
We show that when random linear recurrences are equipped with simple input-controlled transitions, the hidden state is provably a low-dimensional projection of a powerful mathematical object (a generic sketch of such an input-controlled recurrence appears after this related-papers list).
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
- Enhancing Automatic Modulation Recognition through Robust Global Feature Extraction [12.868218616042292]
Modulated signals exhibit long temporal dependencies.
Human experts analyze patterns in constellation diagrams to classify modulation schemes.
Classical convolutional networks excel at extracting local features but struggle to capture global relationships.
arXiv Detail & Related papers (2024-01-02T06:31:24Z)
- Disentangling Structured Components: Towards Adaptive, Interpretable and Scalable Time Series Forecasting [52.47493322446537]
We develop an adaptive, interpretable and scalable forecasting framework, which models each component of the spatial-temporal patterns individually.
SCNN works with a pre-defined generative process of multivariate time series (MTS), which arithmetically characterizes the latent structure of the spatial-temporal patterns.
Extensive experiments are conducted to demonstrate that SCNN can achieve superior performance over state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2023-05-22T13:39:44Z)
- ConCerNet: A Contrastive Learning Based Framework for Automated Conservation Law Discovery and Trustworthy Dynamical System Prediction [82.81767856234956]
This paper proposes a new learning framework named ConCerNet to improve the trustworthiness of DNN-based dynamics modeling.
We show that our method consistently outperforms the baseline neural networks in both coordinate error and conservation metrics.
arXiv Detail & Related papers (2023-02-11T21:07:30Z)
- Reverse engineering recurrent neural networks with Jacobian switching linear dynamical systems [24.0378100479104]
Recurrent neural networks (RNNs) are powerful models for processing time-series data.
The framework of reverse engineering a trained RNN by linearizing around its fixed points has provided insight, but the approach has significant challenges.
We present a new model that overcomes these limitations by co-training an RNN with a novel switching linear dynamical system (SLDS) formulation.
arXiv Detail & Related papers (2021-11-01T20:49:30Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- An Ode to an ODE [78.97367880223254]
We present a new paradigm for Neural ODE algorithms, called ODEtoODE, where the time-dependent parameters of the main flow evolve according to a matrix flow on the orthogonal group O(d).
This nested system of two flows provides stable and effective training and provably solves the vanishing/exploding gradient problem.
arXiv Detail & Related papers (2020-06-19T22:05:19Z)
- Refined Gate: A Simple and Effective Gating Mechanism for Recurrent Units [68.30422112784355]
We propose a new gating mechanism within general gated recurrent neural networks to handle this issue.
The proposed gates directly short-connect the extracted input features to the outputs of the vanilla gates.
We verify the proposed gating mechanism on three popular types of gated RNNs including LSTM, GRU and MGU.
arXiv Detail & Related papers (2020-02-26T07:51:38Z)
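The "Theoretical Foundations of Deep Selective State-Space Models" entry above refers to linear recurrences with input-controlled transitions. The following is a generic, hedged sketch of that mechanism only (not the construction analysed in that paper): the transition multiplier a(x_t) and write strength b(x_t) depend on the current input, unlike in a fixed linear RNN; the gating maps are illustrative choices.

```python
# Generic sketch of an input-controlled (selective) linear recurrence:
#   h_t = a(x_t) * h_{t-1} + b(x_t) * x_t
# a(.) and b(.) here are illustrative sigmoid gates, not taken from the paper.
import jax
import jax.numpy as jnp

def selective_scan(xs, Wa, Wb):
    def step(h, x):
        a = jax.nn.sigmoid(Wa @ x)        # input-controlled transition (decay)
        b = jax.nn.sigmoid(Wb @ x)        # input-controlled write strength
        h_new = a * h + b * x             # linear in h, multiplicative in the input
        return h_new, h_new
    h0 = jnp.zeros(xs.shape[1])
    _, hs = jax.lax.scan(step, h0, xs)
    return hs                             # hidden states for all T steps

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
T, d = 32, 8
xs = jax.random.normal(k1, (T, d))
Wa = jax.random.normal(k2, (d, d)) / jnp.sqrt(d)
Wb = jax.random.normal(k3, (d, d)) / jnp.sqrt(d)
print(selective_scan(xs, Wa, Wb).shape)   # (32, 8)
```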