Robustifying State-space Models for Long Sequences via Approximate
Diagonalization
- URL: http://arxiv.org/abs/2310.01698v1
- Date: Mon, 2 Oct 2023 23:36:13 GMT
- Title: Robustifying State-space Models for Long Sequences via Approximate
Diagonalization
- Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney and N.
Benjamin Erichson
- Abstract summary: State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
- Score: 47.321212977509454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-space models (SSMs) have recently emerged as a framework for learning
long-range sequence tasks. An example is the structured state-space sequence
(S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO
initialization framework. However, the complicated structure of the S4 layer
poses challenges; and, in an effort to address these challenges, models such as
S4D and S5 have considered a purely diagonal structure. This choice simplifies
the implementation, improves computational efficiency, and allows channel
communication. However, diagonalizing the HiPPO framework is itself an
ill-posed problem. In this paper, we propose a general solution for this and
related ill-posed diagonalization problems in machine learning. We introduce a
generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is
based on the pseudospectral theory of non-normal operators, and which may be
interpreted as the approximate diagonalization of the non-normal matrices
defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models.
Through theoretical analysis of the transfer functions of different
initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization
strongly converges to the HiPPO framework, while the S4D/S5 initialization only
achieves weak convergence. As a result, our new models show resilience to
Fourier-mode noise-perturbed inputs, a crucial property not achieved by the
S4D/S5 models. In addition to improved robustness, our S5-PTD model averages
87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD
methodology helps to improve the accuracy of deep learning models.
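To make the PTD idea concrete, here is a minimal NumPy sketch: build the (strongly non-normal) HiPPO-LegS matrix, add a small perturbation, and diagonalize the result. The HiPPO-LegS formula is standard, but the unstructured Gaussian perturbation and the size `eps` are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def hippo_legs(n):
    """HiPPO-LegS state matrix: strongly non-normal, and its eigenvector
    matrix becomes exponentially ill-conditioned as n grows."""
    q = np.sqrt(2.0 * np.arange(n) + 1.0)
    return -np.tril(np.outer(q, q), -1) - np.diag(np.arange(n) + 1.0)

def perturb_then_diagonalize(A, eps=1e-4, seed=0):
    """Perturb-then-diagonalize: add a perturbation E with
    ||E|| = eps * ||A||, then eigendecompose A + E. The computed
    eigenvalues lie in the eps-pseudospectrum of A, and the eigenvector
    basis is far better conditioned than that of A itself."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)
    E *= eps * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)
    lam, V = np.linalg.eig(A + E)
    return lam, V

A = hippo_legs(64)
lam, V = perturb_then_diagonalize(A)
print("cond(V) after PTD:", np.linalg.cond(V))
```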
Related papers
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme for LTI systems, called HOPE, which utilizes Markov parameters within Hankel operators (see the sketch below).
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
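For intuition on the Hankel-operator view (a rough illustration, not HOPE's actual parameterization): the Markov parameters h_k = C A^k B of a discrete LTI system are its impulse-response samples, and they fill the Hankel matrix H[i, j] = h[i + j]. The two-state SISO system below is hypothetical.

```python
import numpy as np
from scipy.linalg import hankel

def markov_parameters(A, B, C, k):
    """First k Markov parameters h_j = C A^j B of a discrete LTI system."""
    h, x = np.empty(k), B.copy()
    for j in range(k):
        h[j] = C @ x
        x = A @ x
    return h

def hankel_operator(h):
    """Hankel matrix H[i, j] = h[i + j] from 2n - 1 Markov parameters."""
    n = (len(h) + 1) // 2
    return hankel(h[:n], h[n - 1:])

# hypothetical stable two-state SISO system
A = np.array([[0.9, 0.5], [0.0, 0.8]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 0.0])
H = hankel_operator(markov_parameters(A, B, C, 15))
```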
- Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation [0.0]
We propose balanced truncation, a prevalent model-reduction technique in control theory, applied specifically to the DSS layers of a pre-trained S4 model, as a novel model-compression method (see the sketch below).
Numerical experiments demonstrate that our models combined with balanced truncation surpass conventionally trained models with Skew-HiPPO initialization.
arXiv Detail & Related papers (2024-02-25T05:22:45Z)
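For reference, here is a textbook square-root balanced-truncation sketch for a stable continuous-time LTI system; this is the general technique the paper applies to DSS layers, and the paper's exact procedure may differ.

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, svd

def balanced_truncation(A, B, C, r):
    """Reduce a stable LTI system x' = Ax + Bu, y = Cx to order r."""
    # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Square-root method: SVD of the Cholesky-factor product
    Lp = cholesky(P, lower=True)   # requires P positive definite
    Lq = cholesky(Q, lower=True)
    U, s, Vt = svd(Lq.T @ Lp)
    S = np.diag(s[:r] ** -0.5)
    T = Lp @ Vt[:r].T @ S          # balanced coords -> original coords
    Ti = S @ U[:, :r].T @ Lq.T     # left inverse of T
    return Ti @ A @ T, Ti @ B, C @ T

# illustrative use on a random stable system (not a trained S4 model)
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) - 8.0 * np.eye(8)  # shifted => stable
B = rng.standard_normal((8, 1))
C = rng.standard_normal((1, 8))
Ar, Br, Cr = balanced_truncation(A, B, C, r=2)
```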
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment, while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4 (see the sketch below).
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
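The convolutional representation referred to here is the standard S4-family trick (LS4's latent-variable machinery is beyond this sketch): unroll the impulse response K[k] = C Ad^k Bd once, then apply the whole sequence map as a single FFT convolution.

```python
import numpy as np

def ssm_kernel(Ad, Bd, C, L):
    """Impulse response K[k] = C @ Ad^k @ Bd of a discrete SSM, unrolled."""
    K, x = np.empty(L), Bd.copy()
    for k in range(L):
        K[k] = C @ x
        x = Ad @ x
    return K

def ssm_convolve(K, u):
    """y = K * u as one FFT-based linear convolution over the sequence."""
    n = len(u) + len(K)
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)
    return y[:len(u)]
```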
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs used by the S4 layer with one multi-input, multi-output (MIMO) SSM (see the sketch below).
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
arXiv Detail & Related papers (2022-08-09T17:57:43Z)
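A minimal sketch of the SISO-to-MIMO change, written as a plain sequential recurrence for clarity; S5 itself uses a diagonalized system and a parallel scan. All shapes below are illustrative.

```python
import numpy as np

def mimo_ssm(Ad, Bd, C, u):
    """One MIMO SSM over u of shape (L, m), replacing a bank of m SISO SSMs.

    Ad: (n, n) state transition, Bd: (n, m) input map, C: (m, n) readout.
    """
    L, m = u.shape
    x = np.zeros(Ad.shape[0])
    y = np.empty((L, m))
    for t in range(L):
        x = Ad @ x + Bd @ u[t]   # all channels update one shared state
        y[t] = C @ x             # all outputs read the shared state
    return y
```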
- Long Range Language Modeling via Gated State Spaces [67.64091993846269]
We focus on autoregressive sequence modeling over English books, GitHub source code and arXiv mathematics articles.
We propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4.
arXiv Detail & Related papers (2022-06-27T01:50:18Z)
- How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections [22.421814045703147]
Linear time-invariant state space models (SSMs) have been shown to be very promising in machine learning.
We introduce a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases.
These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
arXiv Detail & Related papers (2022-06-24T02:24:41Z)
- On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension (see the sketch below).
arXiv Detail & Related papers (2022-06-23T17:58:39Z)
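A sketch of a diagonal SSM kernel with the S4D-Lin initialization named in that paper; the zero-order-hold discretization and the factor of 2 (keeping one eigenvalue per conjugate pair) follow common S4D conventions and are assumptions of this sketch.

```python
import numpy as np

def s4d_lin_init(n):
    """S4D-Lin initialization: eigenvalues -1/2 + i*pi*k, k = 0..n-1."""
    return -0.5 + 1j * np.pi * np.arange(n)

def diag_ssm_kernel(lam, B, C, dt, L):
    """Convolution kernel of a diagonal SSM under zero-order hold:
    K[j] = 2 * Re( sum_n C_n * Bd_n * exp(lam_n * dt)^j )."""
    lam_d = np.exp(lam * dt)                     # discretized eigenvalues
    Bd = (lam_d - 1.0) / lam * B                 # elementwise ZOH input map
    V = lam_d[:, None] ** np.arange(L)[None, :]  # Vandermonde lam_d^j
    return 2.0 * ((C * Bd) @ V).real

lam = s4d_lin_init(32)
K = diag_ssm_kernel(lam, np.ones(32), np.ones(32) + 0j, dt=1e-2, L=256)
```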
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.