RotRNN: Modelling Long Sequences with Rotations
- URL: http://arxiv.org/abs/2407.07239v2
- Date: Sun, 6 Oct 2024 08:44:42 GMT
- Title: RotRNN: Modelling Long Sequences with Rotations
- Authors: Kai Biegun, Rares Dolga, Jake Cunningham, David Barber
- Abstract summary: Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks.
We propose RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices.
We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation.
- Score: 7.037239398244858
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Linear recurrent neural networks, such as State Space Models (SSMs) and Linear Recurrent Units (LRUs), have recently shown state-of-the-art performance on long sequence modelling benchmarks. Despite their success, their empirical performance is not well understood and they come with a number of drawbacks, most notably their complex initialisation and normalisation schemes. In this work, we address some of these issues by proposing RotRNN -- a linear recurrent model which utilises the convenient properties of rotation matrices. We show that RotRNN provides a simple and efficient model with a robust normalisation procedure, and a practical implementation that remains faithful to its theoretical derivation. RotRNN also achieves performance competitive with state-of-the-art linear recurrent models on several long sequence modelling datasets.
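The "convenient properties of rotation matrices" mentioned in the abstract can be illustrated with a small sketch. It is not the paper's parameterisation: the 2x2 state, the scalar decay `gamma`, the identity input map, and all function names are assumptions made for illustration. It relies only on the standard fact that a rotation preserves vector norms, so the response to a single impulse decays exactly as `gamma**k`.

```python
import math


def rotation(theta):
    """Return the 2x2 rotation matrix for angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]


def linear_recurrence(xs, theta, gamma, h0=(0.0, 0.0)):
    """Run h_t = gamma * R(theta) h_{t-1} + x_t and return every state.

    Hypothetical toy model: the input map is the identity and there is
    no output projection or nonlinearity.
    """
    rot = rotation(theta)
    h = list(h0)
    states = []
    for x in xs:
        rotated = matvec(rot, h)
        h = [gamma * rotated[0] + x[0], gamma * rotated[1] + x[1]]
        states.append(h)
    return states


def norm(v):
    """Euclidean norm of a length-2 vector."""
    return math.hypot(v[0], v[1])


# A unit impulse at the first step, followed by zero input.
xs = [(1.0, 0.0)] + [(0.0, 0.0)] * 4
states = linear_recurrence(xs, theta=0.7, gamma=0.9)
norms = [norm(h) for h in states]
```

Because the rotation leaves the norm unchanged, the state norm after `k` further steps depends only on `gamma`, not on `theta`. This is one reason a rotation-based recurrence can be simple to keep stable and to normalise.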
Related papers
- Hierarchically Gated Recurrent Neural Network for Sequence Modeling [36.14544998133578]
We propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN).
Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and effectiveness of our proposed model.
arXiv Detail & Related papers (2023-11-08T16:50:05Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - Resurrecting Recurrent Neural Networks for Long Sequences [45.800920421868625]
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are hard to optimize and slow to train.
Deep state-space models (SSMs) have recently been shown to perform remarkably well on long sequence modeling tasks.
We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks.
arXiv Detail & Related papers (2023-03-11T08:53:11Z) - Improved Batching Strategy For Irregular Time-Series ODE [0.0]
We propose an improvement in the runtime of ODE-RNNs by using a more efficient batching strategy.
Our experiments show that the new models reduce the runtime of ODE-RNN significantly, by a factor of 2 to 49 depending on the irregularity of the data.
arXiv Detail & Related papers (2022-07-12T17:30:02Z) - A Comparative Study of Detecting Anomalies in Time Series Data Using LSTM and TCN Models [2.007262412327553]
This paper compares two prominent deep learning modeling techniques: the Recurrent Neural Network (RNN)-based Long Short-Term Memory (LSTM) and the Convolutional Neural Network (CNN)-based Temporal Convolutional Network (TCN).
arXiv Detail & Related papers (2021-12-17T02:46:55Z) - Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
arXiv Detail & Related papers (2021-06-25T22:08:51Z) - A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z) - Recurrent Graph Tensor Networks: A Low-Complexity Framework for Modelling High-Dimensional Multi-Way Sequence [24.594587557319837]
We develop a graph filter framework for approximating the modelling of hidden states in Recurrent Neural Networks (RNNs).
The proposed framework is validated through several multi-way sequence modelling tasks and benchmarked against traditional RNNs.
We show that the proposed RGTN is capable of not only out-performing standard RNNs, but also mitigating the Curse of Dimensionality associated with traditional RNNs.
arXiv Detail & Related papers (2020-09-18T10:13:36Z) - Lipschitz Recurrent Neural Networks [100.72827570987992]
We show that our Lipschitz recurrent unit is more robust with respect to input and parameter perturbations as compared to other continuous-time RNNs.
Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks.
arXiv Detail & Related papers (2020-06-22T08:44:52Z) - Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)