Resurrecting Recurrent Neural Networks for Long Sequences
- URL: http://arxiv.org/abs/2303.06349v1
- Date: Sat, 11 Mar 2023 08:53:11 GMT
- Title: Resurrecting Recurrent Neural Networks for Long Sequences
- Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu and Soham De
- Abstract summary: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train.
Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks.
We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks.
- Score: 45.800920421868625
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recurrent Neural Networks (RNNs) offer fast inference on long sequences but
are hard to optimize and slow to train. Deep state-space models (SSMs) have
recently been shown to perform remarkably well on long sequence modeling tasks,
and have the added benefits of fast parallelizable training and RNN-like fast
inference. However, while SSMs are superficially similar to RNNs, there are
important differences that make it unclear where their performance boost over
RNNs comes from. In this paper, we show that careful design of deep RNNs using
standard signal propagation arguments can recover the impressive performance of
deep SSMs on long-range reasoning tasks, while also matching their training
speed. To achieve this, we analyze and ablate a series of changes to standard
RNNs including linearizing and diagonalizing the recurrence, using better
parameterizations and initializations, and ensuring proper normalization of the
forward pass. Our results provide new insights on the origins of the impressive
performance of deep SSMs, while also introducing an RNN block called the Linear
Recurrent Unit that matches both their performance on the Long Range Arena
benchmark and their computational efficiency.
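The abstract names the ingredients (a linear, diagonal recurrence, a stable parameterization and initialization, and normalization of the forward pass) without giving formulas. The following is a minimal NumPy sketch of one recurrent layer built from those ingredients, not the authors' implementation. The function names `init_lru` and `lru_forward`, the default ring radii `r_min` and `r_max`, and the scaling of `B` and `C` are illustrative assumptions. The sketch runs the recurrence step by step for readability and omits the skip connection and the nonlinear layers that would surround it in a deep model.

```python
import numpy as np


def init_lru(state_dim, input_dim, r_min=0.9, r_max=0.999,
             max_phase=2 * np.pi, seed=0):
    """Sample parameters for a diagonal complex linear recurrence.

    Assumed initialization: eigenvalue magnitudes lie in [r_min, r_max],
    so the recurrence is stable (|lambda| < 1) at the start of training.
    """
    rng = np.random.default_rng(seed)
    u1 = rng.uniform(size=state_dim)
    u2 = rng.uniform(size=state_dim)
    # Exponential parameterization: lambda = exp(-exp(nu_log) + i * theta).
    # exp(nu_log) is always positive, so |lambda| < 1 for any real nu_log.
    nu_log = np.log(-0.5 * np.log(u1 * (r_max**2 - r_min**2) + r_min**2))
    theta = max_phase * u2
    lam = np.exp(-np.exp(nu_log) + 1j * theta)
    # Normalization factor that keeps the hidden-state scale roughly
    # constant even when |lambda| is close to 1.
    gamma = np.sqrt(1.0 - np.abs(lam) ** 2)
    # Complex input and output projections (scaling is an assumption).
    B = (rng.normal(size=(state_dim, input_dim))
         + 1j * rng.normal(size=(state_dim, input_dim))) / np.sqrt(2 * input_dim)
    C = (rng.normal(size=(input_dim, state_dim))
         + 1j * rng.normal(size=(input_dim, state_dim))) / np.sqrt(state_dim)
    return lam, gamma, B, C


def lru_forward(u, lam, gamma, B, C):
    """Apply x_k = lam * x_{k-1} + gamma * (B u_k), y_k = Re(C x_k).

    u has shape (seq_len, input_dim); the result has the same shape.
    """
    x = np.zeros(lam.shape[0], dtype=np.complex128)
    outputs = []
    for u_k in u:
        # Diagonal recurrence: elementwise product, no matrix multiply.
        x = lam * x + gamma * (B @ u_k)
        outputs.append((C @ x).real)
    return np.stack(outputs)


if __name__ == "__main__":
    lam, gamma, B, C = init_lru(state_dim=8, input_dim=4)
    u = np.random.default_rng(1).normal(size=(50, 4))
    y = lru_forward(u, lam, gamma, B, C)
    print(y.shape, float(np.abs(lam).min()), float(np.abs(lam).max()))
```

Because the recurrence is linear and diagonal, each state coordinate evolves independently. That independence is what lets the Python loop be replaced by a parallel scan over the sequence, which is the source of the training speed the abstract refers to.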
Related papers
- PRF: Parallel Resonate and Fire Neuron for Long Sequence Learning in Spiking Neural Networks [6.545474731089018]
We address the efficiency and performance challenges of long sequence learning in Spiking Neural Networks (SNNs) simultaneously.
First, we propose a decoupled reset method for parallel spiking neuron training, reducing the typical Leaky Integrate-and-Fire (LIF) model's training time from $O(L^2)$ to $O(L \log L)$.
Secondly, to capture long-range dependencies, we propose a Parallel Resonate and Fire (PRF) neuron, which leverages an oscillating membrane potential driven by a resonate mechanism from a differentiable reset function in the complex domain.
arXiv Detail & Related papers (2024-10-04T15:51:56Z) - Were RNNs All We Needed? [53.393497486332]
We revisit traditional recurrent neural networks (RNNs) from over a decade ago.
We show that by removing their hidden state dependencies from their input, forget, and update gates, LSTMs and GRUs no longer need backpropagation through time (BPTT) and can be efficiently trained in parallel.
arXiv Detail & Related papers (2024-10-02T03:06:49Z) - Learning Long Sequences in Spiking Neural Networks [0.0]
Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient computations.
Recent interest in efficient alternatives to Transformers has given rise to state-of-the-art recurrent architectures named state space models (SSMs).
arXiv Detail & Related papers (2023-12-14T13:30:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - Comparative Analysis of Interval Reachability for Robust Implicit and Feedforward Neural Networks [64.23331120621118]
We use interval reachability analysis to obtain robustness guarantees for implicit neural networks (INNs).
INNs are a class of implicit learning models that use implicit equations as layers.
We show that our approach performs at least as well as, and generally better than, applying state-of-the-art interval bound propagation methods to INNs.
arXiv Detail & Related papers (2022-04-01T03:31:27Z) - UnICORNN: A recurrent model for learning very long time dependencies [0.0]
We propose a novel RNN architecture based on a structure preserving discretization of a Hamiltonian system of second-order ordinary differential equations.
The resulting RNN is fast, invertible (in time), memory efficient and we derive rigorous bounds on the hidden state gradients to prove the mitigation of the exploding and vanishing gradient problem.
arXiv Detail & Related papers (2021-03-09T15:19:59Z) - Deep Time Delay Neural Network for Speech Enhancement with Full Data Learning [60.20150317299749]
This paper proposes a deep time delay neural network (TDNN) for speech enhancement with full data learning.
To make full use of the training data, we propose a full data learning method for speech enhancement.
arXiv Detail & Related papers (2020-11-11T06:32:37Z) - SRDCNN: Strongly Regularized Deep Convolution Neural Network Architecture for Time-series Sensor Signal Classification Tasks [4.950427992960756]
We present SRDCNN, a Strongly Regularized Deep Convolution Neural Network (DCNN) based architecture for time series classification tasks.
The novelty of the proposed approach is that the network weights are regularized by both L1 and L2 norm penalties.
arXiv Detail & Related papers (2020-07-14T08:42:39Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - Achieving Online Regression Performance of LSTMs with Simple RNNs [0.0]
We introduce a first-order training algorithm with a linear time complexity in the number of parameters.
We show that when SRNNs are trained with our algorithm, they achieve regression performance very similar to that of LSTMs with two to three times less training time.
arXiv Detail & Related papers (2020-05-16T11:41:13Z)