How to Train Your HiPPO: State Space Models with Generalized Orthogonal
Basis Projections
- URL: http://arxiv.org/abs/2206.12037v1
- Date: Fri, 24 Jun 2022 02:24:41 GMT
- Title: How to Train Your HiPPO: State Space Models with Generalized Orthogonal
Basis Projections
- Authors: Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré
- Abstract summary: Linear time-invariant state space models (SSM) have been shown to be very promising in machine learning.
We introduce a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases.
These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
- Score: 22.421814045703147
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Linear time-invariant state space models (SSM) are a classical model from
engineering and statistics that has recently been shown to be very promising
in machine learning through the Structured State Space sequence model (S4). A
core component of S4 involves initializing the SSM state matrix to a particular
matrix called a HiPPO matrix, which was empirically important for S4's ability
to handle long sequences. However, the specific matrix that S4 uses was
actually derived in previous work for a particular time-varying dynamical
system, and the use of this matrix as a time-invariant SSM had no known
mathematical interpretation. Consequently, the theoretical mechanism by which
S4 models long-range dependencies actually remains unexplained. We derive a
more general and intuitive formulation of the HiPPO framework, which provides a
simple mathematical interpretation of S4 as a decomposition onto
exponentially-warped Legendre polynomials, explaining its ability to capture
long-range dependencies. Our generalization introduces a theoretically rich class of
SSMs that also lets us derive more intuitive S4 variants for other bases such
as the Fourier basis, and explains other aspects of training S4, such as how to
initialize the important timescale parameter. These insights improve S4's
performance to 86% on the Long Range Arena benchmark, with 96% on the most
difficult Path-X task.
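As a concrete illustration of the objects in the abstract (a minimal sketch, not the paper's own code), the snippet below builds the HiPPO-LegS state matrix commonly used to initialize S4's state matrix, and samples the timescale (step size) parameter log-uniformly. The function names, the state size, and the (dt_min, dt_max) range are illustrative assumptions.

```python
import numpy as np

def make_hippo_legs(N):
    """HiPPO-LegS state matrix A (N x N), used to initialize S4's SSM.

    A[n, k] = -sqrt((2n+1)(2k+1)) for n > k, -(n+1) for n = k, 0 for n < k.
    """
    p = np.sqrt(1 + 2 * np.arange(N))        # sqrt(2n + 1)
    A = p[:, None] * p[None, :]              # sqrt((2n+1)(2k+1)) for every (n, k)
    A = np.tril(A) - np.diag(np.arange(N))   # keep the lower triangle; diagonal becomes n + 1
    return -A

def init_timescale(n_copies, dt_min=1e-3, dt_max=1e-1, rng=None):
    """Draw the step size dt log-uniformly in [dt_min, dt_max] (one value per SSM copy)."""
    rng = np.random.default_rng() if rng is None else rng
    log_dt = rng.uniform(np.log(dt_min), np.log(dt_max), size=n_copies)
    return np.exp(log_dt)

A = make_hippo_legs(64)       # state dimension chosen for illustration
dt = init_timescale(128)      # e.g. one timescale per feature channel
```

The log-uniform range above is a common default; how to choose and initialize this timescale parameter is one of the training aspects the abstract says the paper explains.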
Related papers
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z) - HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z) - Incorporating Exponential Smoothing into MLP: A Simple but Effective Sequence Model [0.0]
A recently developed model, the Structured State Space (S4), demonstrated significant effectiveness in modeling long-range sequences.
We propose exponential smoothing (ETS) to augment the MLP and reduce its inductive bias.
Our models achieve comparable results to S4 on the LRA benchmark.
arXiv Detail & Related papers (2024-03-26T07:23:46Z) - Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z) - Robustifying State-space Models for Long Sequences via Approximate
Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z) - Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z) - Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z) - Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs the S4 layer uses with one multi-input, multi-output (MIMO) SSM.
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
arXiv Detail & Related papers (2022-08-09T17:57:43Z) - On the Parameterization and Initialization of Diagonal State Space
Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
arXiv Detail & Related papers (2022-06-23T17:58:39Z)