Diagonal State Spaces are as Effective as Structured State Spaces
- URL: http://arxiv.org/abs/2203.14343v1
- Date: Sun, 27 Mar 2022 16:30:33 GMT
- Title: Diagonal State Spaces are as Effective as Structured State Spaces
- Authors: Ankit Gupta
- Abstract summary: We show that our $\textit{Diagonal State Space}$ (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification on the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
In this work, we show that one can match the performance of S4 even without the low rank correction and thus assuming the state matrices to be diagonal.
- Score: 3.8276199743296906
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modeling long range dependencies in sequential data is a fundamental step
towards attaining human-level performance in many modalities such as text,
vision and audio. While attention-based models are a popular and effective
choice in modeling short-range interactions, their performance on tasks
requiring long range reasoning has been largely inadequate. In a breakthrough
result, Gu et al. (2022) proposed the $\textit{Structured State Space}$ (S4)
architecture delivering large gains over state-of-the-art models on several
long-range tasks across various modalities. The core proposition of S4 is the
parameterization of state matrices via a diagonal plus low rank structure,
allowing efficient computation. In this work, we show that one can match the
performance of S4 even without the low rank correction and thus assuming the
state matrices to be diagonal. Our $\textit{Diagonal State Space}$ (DSS) model
matches the performance of S4 on Long Range Arena tasks and on speech
classification on the Speech Commands dataset, while being conceptually simpler
and straightforward to implement.
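To make the diagonal parameterization concrete, below is a minimal NumPy sketch of how a diagonal state space layer can build its convolution kernel and apply it with an FFT. It follows a DSS-EXP-like form in which the kernel is a weighted sum of geometric sequences of the discretized diagonal entries; the names (`dss_kernel`, `lam`, `w`, `delta`) are illustrative placeholders, not the paper's actual implementation.

```python
# Minimal sketch (assumed, simplified form): with a diagonal state matrix
# A = diag(lam), the SSM's convolution kernel is a weighted sum of geometric
# sequences, so it can be built directly without S4's diagonal-plus-low-rank
# machinery. Not the paper's API; names are illustrative.
import numpy as np

def dss_kernel(lam, w, delta, length):
    """K[k] = sum_i w_i * (exp(lam_i * delta))**k  -- a DSS-EXP-like kernel."""
    lam_bar = np.exp(lam * delta)                    # (N,) discretized diagonal entries
    powers = lam_bar[:, None] ** np.arange(length)   # (N, L) Vandermonde-style matrix
    return (w[:, None] * powers).sum(axis=0).real    # (L,) real-valued convolution kernel

def apply_kernel(u, kernel):
    """Causal convolution y = kernel * u via FFT, padded to avoid circular wrap-around."""
    n = len(u) + len(kernel)
    y = np.fft.irfft(np.fft.rfft(u, n) * np.fft.rfft(kernel, n), n)
    return y[: len(u)]

# Tiny usage example: random stable poles (Re(lambda) < 0) and random weights.
rng = np.random.default_rng(0)
N, L = 16, 128
lam = -np.abs(rng.standard_normal(N)) + 1j * rng.standard_normal(N)
w = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)
y = apply_kernel(u, dss_kernel(lam, w, delta=0.01, length=L))
print(y.shape)  # (128,)
```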
Related papers
- State-Free Inference of State-Space Models: The Transfer Function Approach [132.83348321603205]
State-free inference does not incur any significant memory or computational cost with an increase in state size.
We achieve this using properties of the proposed frequency domain transfer function parametrization.
We report improved perplexity in language modeling over a long convolutional Hyena baseline.
arXiv Detail & Related papers (2024-05-10T00:06:02Z)
- Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
Diagonalizing the HiPPO framework, however, is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z)
- Efficient Long Sequence Modeling via State Space Augmented Transformer [92.74707853711374]
We propose SPADE, short for $\underline{\textbf{S}}$tate s$\underline{\textbf{P}}$ace $\underline{\textbf{A}}$ugmente$\underline{\textbf{D}}$ Transform$\underline{\textbf{E}}$r.
We augment an SSM into the bottom layer of SPADE, and we employ efficient local attention methods for the other layers.
Experimental results on the Long Range Arena benchmark and language modeling tasks demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2022-12-15T20:51:27Z)
- Simplifying and Understanding State Space Models with Diagonal Linear RNNs [56.33053691749856]
This work disposes of the discretization step, and proposes a model based on vanilla Diagonal Linear RNNs.
We empirically show that, despite being conceptually much simpler, $\mathrm{DLR}$ is as performant as previously-proposed SSMs (a recurrent-view sketch of this formulation appears at the end of this page).
We also characterize the expressivity of SSMs and attention-based models via a suite of $13$ synthetic sequence-to-sequence tasks.
arXiv Detail & Related papers (2022-12-01T18:53:06Z)
- What Makes Convolutional Models Great on Long Sequence Modeling? [30.50800981442449]
We focus on the structure of the convolution kernel and identify two critical but intuitive principles.
We propose a simple yet effective convolutional model called Structured Global Convolution (SGConv).
arXiv Detail & Related papers (2022-10-17T17:53:29Z)
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Long Range Language Modeling via Gated State Spaces [67.64091993846269]
We focus on autoregressive sequence modeling over English books, GitHub source code and ArXiv mathematics articles.
We propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4.
arXiv Detail & Related papers (2022-06-27T01:50:18Z)
- On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
arXiv Detail & Related papers (2022-06-23T17:58:39Z)
- Efficiently Modeling Long Sequences with Structured State Spaces [15.456254157293836]
We propose a new sequence model based on a new parameterization for the fundamental state space model.
S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
arXiv Detail & Related papers (2021-10-31T03:32:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
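For contrast with the FFT/kernel view sketched above, the diagonal linear RNN line of work (e.g. $\mathrm{DLR}$) runs the same kind of diagonal model as a plain recurrence. The sketch below is an illustrative, simplified version of that recurrent view; the names (`diagonal_linear_rnn`, `lam`, `b`, `c`) are placeholders and not any paper's API.

```python
# Assumed, simplified recurrent view of a diagonal linear state space model:
# the state update is elementwise because the state matrix is diagonal.
import numpy as np

def diagonal_linear_rnn(u, lam, b, c):
    """x_t = lam * x_{t-1} + b * u_t ;  y_t = Re(c . x_t)  -- elementwise diagonal update."""
    x = np.zeros_like(lam)
    ys = []
    for u_t in u:                       # O(L * N) sequential scan over the input
        x = lam * x + b * u_t           # diagonal state update: no dense matrix multiply
        ys.append((c * x).sum().real)   # linear readout of the state
    return np.array(ys)

# Usage: stable complex eigenvalues with |lam| < 1 and a random input sequence.
rng = np.random.default_rng(0)
N, L = 16, 128
lam = np.exp(-np.abs(rng.standard_normal(N)) + 1j * rng.standard_normal(N))
b = rng.standard_normal(N) + 1j * rng.standard_normal(N)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)
print(diagonal_linear_rnn(u, lam, b, c).shape)  # (128,)
```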