Combining Recurrent, Convolutional, and Continuous-time Models with
Linear State-Space Layers
- URL: http://arxiv.org/abs/2110.13985v1
- Date: Tue, 26 Oct 2021 19:44:53 GMT
- Title: Combining Recurrent, Convolutional, and Continuous-time Models with
Linear State-Space Layers
- Authors: Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra,
Christopher Ré
- Abstract summary: We introduce a simple sequence model inspired by control systems that generalizes recurrent, convolutional, and continuous-time models while addressing their shortcomings.
We show that LSSL models are closely related to the three aforementioned families of models and inherit their strengths.
For example, they generalize convolutions to continuous-time, explain common RNN heuristics, and share features of NDEs such as time-scale adaptation.
- Score: 21.09321438439848
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recurrent neural networks (RNNs), temporal convolutions, and neural
differential equations (NDEs) are popular families of deep learning models for
time-series data, each with unique strengths and tradeoffs in modeling power
and computational efficiency. We introduce a simple sequence model inspired by
control systems that generalizes these approaches while addressing their
shortcomings. The Linear State-Space Layer (LSSL) maps a sequence $u \mapsto y$
by simply simulating a linear continuous-time state-space representation
$\dot{x} = Ax + Bu, y = Cx + Du$. Theoretically, we show that LSSL models are
closely related to the three aforementioned families of models and inherit
their strengths. For example, they generalize convolutions to continuous-time,
explain common RNN heuristics, and share features of NDEs such as time-scale
adaptation. We then incorporate and generalize recent theory on continuous-time
memorization to introduce a trainable subset of structured matrices $A$ that
endow LSSLs with long-range memory. Empirically, stacking LSSL layers into a
simple deep neural network obtains state-of-the-art results across time series
benchmarks for long dependencies in sequential image classification, real-world
healthcare regression tasks, and speech. On a difficult speech classification
task with length-16000 sequences, LSSL outperforms prior approaches by 24
accuracy points, and even outperforms baselines that use hand-crafted features
on 100x shorter sequences.
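
For intuition, the following is a minimal NumPy sketch of the state-space map in the abstract, computed two equivalent ways: as a linear recurrence (the RNN view) and as a causal convolution (the CNN view). It is an illustration under stated assumptions, not the paper's implementation: the actual LSSL is a trainable deep-learning layer with structured A matrices and fast algorithms, and the helper names (discretize_bilinear, lssl_recurrent, lssl_convolutional) and the bilinear discretization with step dt are choices made here for the sketch.

```python
import numpy as np

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of  x' = Ax + Bu  into
    x_k = Abar @ x_{k-1} + Bbar * u_k."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2.0) * A)
    return inv @ (I + (dt / 2.0) * A), inv @ (dt * B)

def lssl_recurrent(A, B, C, D, u, dt=1.0):
    """Recurrent (RNN-like) view: step the hidden state through time."""
    Abar, Bbar = discretize_bilinear(A, B, dt)
    x = np.zeros(A.shape[0])
    y = np.zeros(len(u))
    for k, uk in enumerate(u):
        x = Abar @ x + Bbar * uk      # state update
        y[k] = C @ x + D * uk         # readout
    return y

def lssl_convolutional(A, B, C, D, u, dt=1.0):
    """Convolutional view: the same map u -> y as a causal convolution
    with kernel K_j = C @ Abar^j @ Bbar (zero initial state)."""
    Abar, Bbar = discretize_bilinear(A, B, dt)
    L = len(u)
    K = np.zeros(L)
    v = Bbar.copy()
    for j in range(L):
        K[j] = C @ v
        v = Abar @ v
    return np.convolve(u, K)[:L] + D * u

# Tiny check that the two views agree on a random (roughly stable) system.
rng = np.random.default_rng(0)
n, L = 4, 32
A = -np.eye(n) + 0.1 * rng.standard_normal((n, n))
B, C, D = rng.standard_normal(n), rng.standard_normal(n), 0.5
u = rng.standard_normal(L)
assert np.allclose(lssl_recurrent(A, B, C, D, u),
                   lssl_convolutional(A, B, C, D, u))
```

The recurrent form runs online with constant memory per step, while the equivalent convolution can be evaluated in parallel across the sequence; the structured A matrices mentioned in the abstract (building on HiPPO, listed among the related papers below) are what provide long-range memory.
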
Related papers
- Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens [14.424050371971354]
We introduce and study the bilinear sequence regression (BSR) as one of the most basic models for sequences of tokens.
We quantify the improvement that optimal learning brings with respect to vectorizing the sequence of tokens and learning via simple linear regression.
arXiv Detail & Related papers (2024-10-24T15:44:03Z)
- Hierarchically Gated Recurrent Neural Network for Sequence Modeling [36.14544998133578]
We propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN)
Experiments on language modeling, image classification, and long-range arena benchmarks showcase the efficiency and effectiveness of our proposed model.
arXiv Detail & Related papers (2023-11-08T16:50:05Z)
- DeLELSTM: Decomposition-based Linear Explainable LSTM to Capture Instantaneous and Long-term Effects in Time Series [26.378073712630467]
We propose a Decomposition-based Linear Explainable LSTM (DeLELSTM) to improve the interpretability of LSTM.
We demonstrate the effectiveness and interpretability of DeLELSTM on three empirical datasets.
arXiv Detail & Related papers (2023-08-26T07:45:41Z)
- How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- Classification of Long Sequential Data using Circular Dilated Convolutional Neural Networks [10.014879130837912]
We propose a symmetric multi-scale architecture called Circular Dilated Convolutional Neural Network (CDIL-CNN)
Our model gives classification logits in all positions, and we can apply a simple ensemble learning to achieve a better decision.
arXiv Detail & Related papers (2022-01-06T16:58:59Z)
- PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context.
We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)
- HiPPO: Recurrent Memory with Optimal Polynomial Projections [93.3537706398653]
We introduce a general framework (HiPPO) for the online compression of continuous signals and discrete time series by projection onto bases.
Given a measure that specifies the importance of each time step in the past, HiPPO produces an optimal solution to a natural online function approximation problem.
This formal framework yields a new memory update mechanism (HiPPO-LegS) that scales through time to remember all history, avoiding priors on the timescale.
A minimal sketch of this LegS-style update appears after this list.
arXiv Detail & Related papers (2020-08-17T23:39:33Z)
- Liquid Time-constant Networks [117.57116214802504]
We introduce a new class of time-continuous recurrent neural network models.
Instead of declaring a learning system's dynamics by implicit nonlinearities, we construct networks of linear first-order dynamical systems.
These neural networks exhibit stable and bounded behavior and yield superior expressivity within the family of neural ordinary differential equations.
arXiv Detail & Related papers (2020-06-08T09:53:35Z)
- Recognizing Long Grammatical Sequences Using Recurrent Networks Augmented With An External Differentiable Stack [73.48927855855219]
Recurrent neural networks (RNNs) are a widely used deep architecture for sequence modeling, generation, and prediction.
RNNs generalize poorly over very long sequences, which limits their applicability to many important temporal processing and time series forecasting problems.
One way to address these shortcomings is to couple an RNN with an external, differentiable memory structure, such as a stack.
In this paper, we improve the memory-augmented RNN with important architectural and state updating mechanisms.
arXiv Detail & Related papers (2020-04-04T14:19:15Z)
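
Following up on the HiPPO entry above, here is a rough sketch of the scaled-Legendre (LegS) memory update that the LSSL abstract refers to as "continuous-time memorization". The matrix entries, the 1/t time scaling, and the simple Euler-style step reflect my reading of the HiPPO line of work and should be treated as assumptions rather than the exact parameterization used in these papers; hippo_legs_matrices and hippo_legs_compress are illustrative names.

```python
import numpy as np

def hippo_legs_matrices(N):
    """HiPPO-LegS transition matrices as commonly stated (treat the exact
    constants as an assumption).  A is lower-triangular."""
    A = np.zeros((N, N))
    B = np.zeros(N)
    for n in range(N):
        B[n] = np.sqrt(2 * n + 1)
        for k in range(N):
            if n > k:
                A[n, k] = np.sqrt((2 * n + 1) * (2 * k + 1))
            elif n == k:
                A[n, k] = n + 1
    return A, B

def hippo_legs_compress(f, N=16):
    """Online compression of a discrete signal f[0..L-1] into N coefficients
    via a forward-Euler step of  dc/dt = -(1/t) A c + (1/t) B f(t)."""
    A, B = hippo_legs_matrices(N)
    I = np.eye(N)
    c = np.zeros(N)
    for t, ft in enumerate(f, start=1):        # t = 1, 2, ..., L
        c = (I - A / t) @ c + (B / t) * ft     # scaled-time update
    return c                                   # summary of the whole history

coeffs = hippo_legs_compress(np.sin(np.linspace(0, 8 * np.pi, 1000)), N=16)
print(coeffs[:4])
```

Intuitively, c stores coefficients of a polynomial approximation of the entire history seen so far, which is where the "remember all history, avoiding priors on the timescale" behavior comes from.
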