Simplified State Space Layers for Sequence Modeling
- URL: http://arxiv.org/abs/2208.04933v1
- Date: Tue, 9 Aug 2022 17:57:43 GMT
- Title: Simplified State Space Layers for Sequence Modeling
- Authors: Jimmy T.H. Smith, Andrew Warrington, and Scott W. Linderman
- Abstract summary: Recently, models using structured state space sequence (S4) layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs the S4 layer uses with one multi-input, multi-output (MIMO) SSM, yielding the simplified S5 layer.
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
- Score: 11.215817688691194
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficiently modeling long-range dependencies is an important goal in sequence
modeling. Recently, models using structured state space sequence (S4) layers
achieved state-of-the-art performance on many long-range tasks. The S4 layer
combines linear state space models (SSMs) with deep learning techniques and
leverages the HiPPO framework for online function approximation to achieve high
performance. However, this framework led to architectural constraints and
computational difficulties that make the S4 approach complicated to understand
and implement. We revisit the idea that closely following the HiPPO framework
is necessary for high performance. Specifically, we replace the bank of many
independent single-input, single-output (SISO) SSMs the S4 layer uses with one
multi-input, multi-output (MIMO) SSM with a reduced latent dimension. The
reduced latent dimension of the MIMO system allows for the use of efficient
parallel scans which simplify the computations required to apply the S5 layer
as a sequence-to-sequence transformation. In addition, we initialize the state
matrix of the S5 SSM with an approximation to the HiPPO-LegS matrix used by
S4's SSMs and show that this serves as an effective initialization for the MIMO
setting. S5 matches S4's performance on long-range tasks, including achieving
an average of 82.46% on the suite of Long Range Arena benchmarks compared to
S4's 80.48% and the best transformer variant's 61.41%.
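The abstract packs two concrete mechanisms into one paragraph: a reduced-dimension MIMO state matrix initialized from the normal part of the HiPPO-LegS matrix, and application of the discretized linear recurrence via a parallel scan. Below is a minimal JAX sketch of those two steps, not the authors' released implementation: the HiPPO-LegS formula and the zero-order-hold discretization follow the standard definitions, while the random B/C maps, the fixed step size dt, and the omitted feedthrough term are illustrative simplifications.
```python
import jax
import jax.numpy as jnp
import numpy as np


def hippo_legs_normal(N):
    # HiPPO-LegS matrix A_{nk} = -sqrt((2n+1)(2k+1)) for n > k, -(n+1) for n = k, 0 otherwise,
    # plus the rank-1 term P P^T (P_n = sqrt(n + 1/2)) that makes the result a normal matrix.
    P = np.sqrt(np.arange(N) + 0.5)
    S = P[:, None] * P[None, :]
    A = -(np.tril(2.0 * S, k=-1) + np.diag(np.arange(1, N + 1, dtype=np.float64)))
    return A + S  # -1/2 on the diagonal plus a skew-symmetric part


def init_diagonal_ssm(N, H, key):
    # Eigenvalues of the normal HiPPO-LegS approximation serve as the diagonal state matrix;
    # B and C are random here purely for illustration (the paper ties them to the eigenvectors).
    Lambda = jnp.asarray(np.linalg.eigvals(hippo_legs_normal(N)), dtype=jnp.complex64)
    k1, k2 = jax.random.split(key)
    B = jax.random.normal(k1, (N, H)).astype(jnp.complex64) / jnp.sqrt(H)
    C = jax.random.normal(k2, (H, N)).astype(jnp.complex64) / jnp.sqrt(N)
    return Lambda, B, C


def discretize_zoh(Lambda, B, dt):
    # Zero-order-hold discretization of the diagonal continuous-time system.
    Lambda_bar = jnp.exp(Lambda * dt)
    B_bar = ((Lambda_bar - 1.0) / Lambda)[:, None] * B
    return Lambda_bar, B_bar


def binary_op(elem_i, elem_j):
    # Associative composition of two affine maps x -> a*x + b (earlier map applied first).
    a_i, b_i = elem_i
    a_j, b_j = elem_j
    return a_j * a_i, a_j * b_i + b_j


def apply_ssm(Lambda_bar, B_bar, C, u):
    # u: (L, H) real inputs. Runs x_k = Lambda_bar * x_{k-1} + B_bar u_k with a parallel scan
    # and reads out y_k = Re(C x_k). The feedthrough term D u_k is omitted for brevity.
    Bu = u.astype(jnp.complex64) @ B_bar.T              # (L, N)
    A_elems = jnp.broadcast_to(Lambda_bar, Bu.shape)    # (L, N), one diagonal transition per step
    _, xs = jax.lax.associative_scan(binary_op, (A_elems, Bu))
    return jnp.real(xs @ C.T)                           # (L, H)


if __name__ == "__main__":
    N, H, L = 64, 8, 1024
    Lambda, B, C = init_diagonal_ssm(N, H, jax.random.PRNGKey(0))
    Lambda_bar, B_bar = discretize_zoh(Lambda, B, dt=1e-2)
    y = apply_ssm(Lambda_bar, B_bar, C, jnp.ones((L, H)))
    print(y.shape)  # (1024, 8)
```
Because the scan operator is associative, the sequence-to-sequence map can be evaluated in O(log L) parallel depth rather than with a step-by-step recurrence, which is the computational simplification the abstract refers to.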
Related papers
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z)
- Mamba-based Light Field Super-Resolution with Efficient Subspace Scanning [48.99361249764921]
Transformer-based methods have demonstrated impressive performance in 4D light field (LF) super-resolution.
However, their quadratic complexity hinders the efficient processing of high resolution 4D inputs.
We propose a Mamba-based Light Field Super-Resolution method, named MLFSR, by designing an efficient subspace scanning strategy.
arXiv Detail & Related papers (2024-06-23T11:28:08Z)
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilizes Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model [0.0]
A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences.
We propose exponential smoothing (ETS) to augment the MLP and reduce inductive bias.
Our models achieve comparable results to S4 on the LRA benchmark.
arXiv Detail & Related papers (2024-03-26T07:23:46Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z)
- A Neural State-Space Model Approach to Efficient Speech Separation [34.38911304755453]
We introduce S4M, a new efficient speech separation framework based on neural state-space models (SSM).
To extend the SSM technique into speech separation tasks, we first decompose the input mixture into multi-scale representations with different resolutions.
Experiments show that S4M performs comparably to other separation backbones in terms of SI-SDRi.
Our S4M-tiny model (1.8M parameters) even surpasses the attention-based Sepformer (26.0M parameters) in noisy conditions with only 9.2% of the multiply-accumulate operations (MACs).
arXiv Detail & Related papers (2023-05-26T13:47:11Z)
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections [22.421814045703147]
Linear time-invariant state space models (SSM) have been shown to be very promising in machine learning.
We introduce a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases.
These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
arXiv Detail & Related papers (2022-06-24T02:24:41Z)