Robustifying State-space Models for Long Sequences via Approximate
Diagonalization
- URL: http://arxiv.org/abs/2310.01698v1
- Date: Mon, 2 Oct 2023 23:36:13 GMT
- Title: Robustifying State-space Models for Long Sequences via Approximate
Diagonalization
- Authors: Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney and N.
Benjamin Erichson
- Abstract summary: State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
- Score: 47.321212977509454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: State-space models (SSMs) have recently emerged as a framework for learning
long-range sequence tasks. An example is the structured state-space sequence
(S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO
initialization framework. However, the complicated structure of the S4 layer
poses challenges; and, in an effort to address these challenges, models such as
S4D and S5 have considered a purely diagonal structure. This choice simplifies
the implementation, improves computational efficiency, and allows channel
communication. However, diagonalizing the HiPPO framework is itself an
ill-posed problem. In this paper, we propose a general solution for this and
related ill-posed diagonalization problems in machine learning. We introduce a
generic, backward-stable "perturb-then-diagonalize" (PTD) methodology, which is
based on the pseudospectral theory of non-normal operators, and which may be
interpreted as the approximate diagonalization of the non-normal matrices
defining SSMs. Based on this, we introduce the S4-PTD and S5-PTD models.
Through theoretical analysis of the transfer functions of different
initialization schemes, we demonstrate that the S4-PTD/S5-PTD initialization
strongly converges to the HiPPO framework, while the S4D/S5 initialization only
achieves weak convergence. As a result, our new models show resilience to
Fourier-mode noise-perturbed inputs, a crucial property not achieved by the
S4D/S5 models. In addition to improved robustness, our S5-PTD model averages
87.6% accuracy on the Long-Range Arena benchmark, demonstrating that the PTD
methodology helps to improve the accuracy of deep learning models.
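To make the PTD idea concrete, here is a minimal NumPy sketch: build the (strongly non-normal) HiPPO-LegS matrix, add a small perturbation, and diagonalize the result. The HiPPO-LegS formula is standard, but the unstructured Gaussian perturbation and the size `eps` are illustrative assumptions, not necessarily the paper's exact construction.

```python
import numpy as np

def hippo_legs(n):
    """HiPPO-LegS state matrix: strongly non-normal, and its eigenvector
    matrix becomes exponentially ill-conditioned as n grows."""
    q = np.sqrt(2.0 * np.arange(n) + 1.0)
    return -np.tril(np.outer(q, q), -1) - np.diag(np.arange(n) + 1.0)

def perturb_then_diagonalize(A, eps=1e-4, seed=0):
    """Perturb-then-diagonalize: add a perturbation E with
    ||E|| = eps * ||A||, then eigendecompose A + E. The computed
    eigenvalues lie in the eps-pseudospectrum of A, and the eigenvector
    basis is far better conditioned than that of A itself."""
    rng = np.random.default_rng(seed)
    E = rng.standard_normal(A.shape)
    E *= eps * np.linalg.norm(A, 2) / np.linalg.norm(E, 2)
    lam, V = np.linalg.eig(A + E)
    return lam, V

A = hippo_legs(64)
lam, V = perturb_then_diagonalize(A)
print("cond(V) after PTD:", np.linalg.cond(V))
```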
Related papers
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme for LTI systems, called HOPE, which utilizes Markov parameters within Hankel operators (see the sketch below).
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
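For intuition on the Hankel-operator view (a rough illustration, not HOPE's actual parameterization): the Markov parameters h_k = C A^k B of a discrete LTI system are its impulse-response samples, and they fill the Hankel matrix H[i, j] = h[i + j]. The two-state SISO system below is hypothetical.

```python
import numpy as np
from scipy.linalg import hankel

def markov_parameters(A, B, C, k):
    """First k Markov parameters h_j = C A^j B of a discrete LTI system."""
    h, x = np.empty(k), B.copy()
    for j in range(k):
        h[j] = C @ x
        x = A @ x
    return h

def hankel_operator(h):
    """Hankel matrix H[i, j] = h[i + j] from 2n - 1 Markov parameters."""
    n = (len(h) + 1) // 2
    return hankel(h[:n], h[n - 1:])

# hypothetical stable two-state SISO system
A = np.array([[0.9, 0.5], [0.0, 0.8]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 0.0])
H = hankel_operator(markov_parameters(A, B, C, 15))
```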
- Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation [0.0]
We propose balanced truncation, a prevalent model-reduction technique in control theory, applied specifically to the DSS layers of a pre-trained S4 model, as a novel model-compression method (see the sketch below).
Numerical experiments demonstrate that our models combined with balanced truncation surpass conventionally trained models with Skew-HiPPO initialization.
arXiv Detail & Related papers (2024-02-25T05:22:45Z)
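For reference, here is a textbook square-root balanced-truncation sketch for a stable continuous-time LTI system; this is the general technique the paper applies to DSS layers, and the paper's exact procedure may differ.

```python
import numpy as np
from scipy.linalg import cholesky, solve_continuous_lyapunov, svd

def balanced_truncation(A, B, C, r):
    """Reduce a stable LTI system x' = Ax + Bu, y = Cx to order r."""
    # Gramians: A P + P A^T + B B^T = 0 and A^T Q + Q A + C^T C = 0
    P = solve_continuous_lyapunov(A, -B @ B.T)
    Q = solve_continuous_lyapunov(A.T, -C.T @ C)
    # Square-root method: SVD of the Cholesky-factor product
    Lp = cholesky(P, lower=True)   # requires P positive definite
    Lq = cholesky(Q, lower=True)
    U, s, Vt = svd(Lq.T @ Lp)
    S = np.diag(s[:r] ** -0.5)
    T = Lp @ Vt[:r].T @ S          # balanced coords -> original coords
    Ti = S @ U[:, :r].T @ Lq.T     # left inverse of T
    return Ti @ A @ T, Ti @ B, C @ T

# illustrative use on a random stable system (not a trained S4 model)
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)) - 8.0 * np.eye(8)  # shifted => stable
B = rng.standard_normal((8, 1))
C = rng.standard_normal((1, 8))
Ar, Br, Cr = balanced_truncation(A, B, C, r=2)
```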
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment, while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4 (see the sketch below).
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
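The convolutional representation referred to here is the standard S4-family trick (LS4's latent-variable machinery is beyond this sketch): unroll the impulse response K[k] = C Ad^k Bd once, then apply the whole sequence map as a single FFT convolution.

```python
import numpy as np

def ssm_kernel(Ad, Bd, C, L):
    """Impulse response K[k] = C @ Ad^k @ Bd of a discrete SSM, unrolled."""
    K, x = np.empty(L), Bd.copy()
    for k in range(L):
        K[k] = C @ x
        x = Ad @ x
    return K

def ssm_convolve(K, u):
    """y = K * u as one FFT-based linear convolution over the sequence."""
    n = len(u) + len(K)
    y = np.fft.irfft(np.fft.rfft(K, n) * np.fft.rfft(u, n), n)
    return y[:len(u)]
```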
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs used by the S4 layer with one multi-input, multi-output (MIMO) SSM (see the sketch below).
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
arXiv Detail & Related papers (2022-08-09T17:57:43Z)
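A minimal sketch of the SISO-to-MIMO change, written as a plain sequential recurrence for clarity; S5 itself uses a diagonalized system and a parallel scan. All shapes below are illustrative.

```python
import numpy as np

def mimo_ssm(Ad, Bd, C, u):
    """One MIMO SSM over u of shape (L, m), replacing a bank of m SISO SSMs.

    Ad: (n, n) state transition, Bd: (n, m) input map, C: (m, n) readout.
    """
    L, m = u.shape
    x = np.zeros(Ad.shape[0])
    y = np.empty((L, m))
    for t in range(L):
        x = Ad @ x + Bd @ u[t]   # all channels update one shared state
        y[t] = C @ x             # all outputs read the shared state
    return y
```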
- Long Range Language Modeling via Gated State Spaces [67.64091993846269]
We focus on autoregressive sequence modeling over English books, GitHub source code and arXiv mathematics articles.
We propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4.
arXiv Detail & Related papers (2022-06-27T01:50:18Z)
- How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections [22.421814045703147]
Linear time-invariant state space models (SSMs) have been shown to be very promising in machine learning.
We introduce a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases.
These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
arXiv Detail & Related papers (2022-06-24T02:24:41Z)
- On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension (see the sketch below).
arXiv Detail & Related papers (2022-06-23T17:58:39Z)
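A sketch of a diagonal SSM kernel with the S4D-Lin initialization named in that paper; the zero-order-hold discretization and the factor of 2 (keeping one eigenvalue per conjugate pair) follow common S4D conventions and are assumptions of this sketch.

```python
import numpy as np

def s4d_lin_init(n):
    """S4D-Lin initialization: eigenvalues -1/2 + i*pi*k, k = 0..n-1."""
    return -0.5 + 1j * np.pi * np.arange(n)

def diag_ssm_kernel(lam, B, C, dt, L):
    """Convolution kernel of a diagonal SSM under zero-order hold:
    K[j] = 2 * Re( sum_n C_n * Bd_n * exp(lam_n * dt)^j )."""
    lam_d = np.exp(lam * dt)                     # discretized eigenvalues
    Bd = (lam_d - 1.0) / lam * B                 # elementwise ZOH input map
    V = lam_d[:, None] ** np.arange(L)[None, :]  # Vandermonde lam_d^j
    return 2.0 * ((C * Bd) @ V).real

lam = s4d_lin_init(32)
K = diag_ssm_kernel(lam, np.ones(32), np.ones(32) + 0j, dt=1e-2, L=256)
```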
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.