W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling
- URL: http://arxiv.org/abs/2506.07920v1
- Date: Mon, 09 Jun 2025 16:33:29 GMT
- Title: W4S4: WaLRUS Meets S4 for Long-Range Sequence Modeling
- Authors: Hossein Babaei, Mel White, Richard G. Baraniuk
- Abstract summary: State Space Models (SSMs) have emerged as powerful components for sequence modeling. We introduce a new variant, W4S4 (WaLRUS for S4), a class of SSMs constructed from redundant wavelet frames. We show that WaLRUS retains information over long horizons significantly better than HiPPO-based SSMs.
- Score: 23.453158933852357
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: State Space Models (SSMs) have emerged as powerful components for sequence modeling, enabling efficient handling of long-range dependencies via linear recurrence and convolutional computation. However, their effectiveness depends heavily on the choice and initialization of the state matrix. In this work, we build on the SaFARi framework and existing WaLRUS SSMs to introduce a new variant, W4S4 (WaLRUS for S4), a class of SSMs constructed from redundant wavelet frames. WaLRUS admits a stable diagonalization and supports fast kernel computation without requiring low-rank approximations, making it both theoretically grounded and computationally efficient. We show that WaLRUS retains information over long horizons significantly better than HiPPO-based SSMs, both in isolation and when integrated into deep architectures such as S4. Our experiments demonstrate consistent improvements across delay reconstruction tasks, classification benchmarks, and long-range sequence modeling, confirming that the high-quality, structured initialization enabled by wavelet-based state dynamics offers substantial advantages over existing alternatives. WaLRUS provides a scalable and versatile foundation for the next generation of deep SSM-based models.
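The abstract's central mechanism, that an SSM can be run either as a linear recurrence or as a convolution with a precomputed kernel, can be illustrated concretely. The sketch below is ours and not the paper's implementation (all names are illustrative); it assumes a diagonal state matrix, so the kernel K[l] = C A^l B reduces to a sum over eigenvalue powers:

```python
import numpy as np

def ssm_kernel(lam, b, c, L):
    """Length-L convolution kernel K[l] = sum_n c_n * lam_n**l * b_n
    for a diagonal state matrix with eigenvalues lam (shape (N,))."""
    powers = lam[None, :] ** np.arange(L)[:, None]   # (L, N) Vandermonde matrix
    return (powers * (b * c)[None, :]).sum(axis=1).real

def ssm_recurrence(lam, b, c, u):
    """Step the linear recurrence x_k = lam * x_{k-1} + b * u_k, y_k = c . x_k."""
    x = np.zeros_like(lam)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = lam * x + b * u_k
        y[k] = (c * x).sum().real
    return y

# The two views agree: the recurrence output equals causal convolution with K.
rng = np.random.default_rng(0)
N, L = 8, 64
lam = 0.9 * np.exp(2j * np.pi * rng.random(N))       # stable complex eigenvalues
b = rng.standard_normal(N) + 1j * rng.standard_normal(N)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
u = rng.standard_normal(L)

y_rec = ssm_recurrence(lam, b, c, u)
y_conv = np.convolve(u, ssm_kernel(lam, b, c, L))[:L]
assert np.allclose(y_rec, y_conv)
```

The convolutional view is what makes training parallelizable over the sequence length; the recurrent view gives O(1)-per-step inference. The quality of `lam` (HiPPO-derived in S4, wavelet-derived in WaLRUS) is exactly the initialization question the abstract refers to.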
Related papers
- WaLRUS: Wavelets for Long-range Representation Using SSMs [22.697360024988484]
State-Space Models (SSMs) have proven to be powerful tools for modeling long-range dependencies in sequential data. We introduce WaLRUS, a new implementation of SaFARi built from Daubechies wavelets.
arXiv Detail & Related papers (2025-05-17T22:41:24Z)
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
arXiv Detail & Related papers (2024-10-17T22:35:50Z)
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- Convolutional State Space Models for Long-Range Spatiotemporal Modeling [65.0993000439043]
ConvS5 is an efficient variant for long-range spatiotemporal modeling.
It significantly outperforms Transformers and ConvLSTM on a long-horizon Moving-MNIST experiment while training 3X faster than ConvLSTM and generating samples 400X faster than Transformers.
arXiv Detail & Related papers (2023-10-30T16:11:06Z)
- Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
However, diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z)
- Structured State Space Models for In-Context Reinforcement Learning [30.189834820419446]
Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks.
We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel.
We show that our modified architecture runs faster than Transformers as sequence length grows and performs better than RNNs on a simple memory-based task.
arXiv Detail & Related papers (2023-03-07T15:32:18Z)
- Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z)
- Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Commands recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter count compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z)
- Simplified State Space Layers for Sequence Modeling [11.215817688691194]
Recently, models using structured state space sequence layers achieved state-of-the-art performance on many long-range tasks.
We revisit the idea that closely following the HiPPO framework is necessary for high performance.
We replace the bank of many independent single-input, single-output (SISO) SSMs the S4 layer uses with one multi-input, multi-output (MIMO) SSM.
S5 matches S4's performance on long-range tasks, including achieving an average of 82.46% on the suite of Long Range Arena benchmarks.
arXiv Detail & Related papers (2022-08-09T17:57:43Z)
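The S5 summary above, replacing S4's bank of many independent single-input, single-output (SISO) SSMs with one multi-input, multi-output (MIMO) SSM, can be made concrete: a bank of H independent SISO systems is exactly the block-diagonal special case of one MIMO system. A minimal sketch (our illustrative names, not the S5 codebase):

```python
import numpy as np

def siso_bank_step(As, bs, cs, xs, u):
    """One step of H independent SISO SSMs (S4-style bank): channel h has its
    own (N, N) state matrix As[h] and state xs[h], driven only by input u[h]."""
    y = np.empty(len(As))
    for h, (A, b, c) in enumerate(zip(As, bs, cs)):
        xs[h] = A @ xs[h] + b * u[h]
        y[h] = c @ xs[h]
    return y

def mimo_step(A, B, C, x, u):
    """One step of a single MIMO SSM (S5-style): x' = A x + B u, y = C x."""
    x[:] = A @ x + B @ u
    return C @ x

# Build the block-diagonal MIMO system equivalent to a given SISO bank.
rng = np.random.default_rng(0)
H, N = 3, 4
As = [0.1 * rng.standard_normal((N, N)) for _ in range(H)]
bs = [rng.standard_normal(N) for _ in range(H)]
cs = [rng.standard_normal(N) for _ in range(H)]

A = np.zeros((H * N, H * N))      # block-diagonal state matrix
B = np.zeros((H * N, H))
C = np.zeros((H, H * N))
for h in range(H):
    A[h*N:(h+1)*N, h*N:(h+1)*N] = As[h]
    B[h*N:(h+1)*N, h] = bs[h]
    C[h, h*N:(h+1)*N] = cs[h]

xs = [np.zeros(N) for _ in range(H)]
x = np.zeros(H * N)
for _ in range(5):                # both views produce identical outputs
    u = rng.standard_normal(H)
    y_bank = siso_bank_step(As, bs, cs, xs, u)
    y_mimo = mimo_step(A, B, C, x, u)
    assert np.allclose(y_bank, y_mimo)
```

S5's contribution runs in the other direction: rather than this block-diagonal special case, it parameterizes one dense MIMO system directly, which mixes channels through the state and, per the summary above, matches S4 on the Long Range Arena suite.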
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.