Analysis of Long Range Dependency Understanding in State Space Models
- URL: http://arxiv.org/abs/2601.13048v1
- Date: Mon, 19 Jan 2026 13:39:42 GMT
- Title: Analysis of Long Range Dependency Understanding in State Space Models
- Authors: Srividya Ravikumar, Abhinav Anand, Shweta Verma, Mira Mezini
- Abstract summary: We present the first systematic kernel interpretability study of the diagonalized state-space model (S4D) trained on a real-world task. We show that the long-range modeling capability of S4D varies significantly under different model architectures, affecting model performance.
- Score: 5.1981024469718315
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Although state-space models (SSMs) have demonstrated strong performance on long-sequence benchmarks, most research has emphasized predictive accuracy rather than interpretability. In this work, we present the first systematic kernel interpretability study of the diagonalized state-space model (S4D) trained on a real-world task (vulnerability detection in source code). Through time- and frequency-domain analysis of the S4D kernel, we show that the long-range modeling capability of S4D varies significantly under different model architectures, affecting model performance. For instance, we show that, depending on the architecture, the S4D kernel can behave as a low-pass, band-pass, or high-pass filter. The insights from our analysis can guide future work in designing better S4D-based models.
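The kind of time- and frequency-domain kernel analysis the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: it assumes the commonly used S4D formulation with zero-order-hold discretization and a kernel formed as twice the real part of a sum over complex states, and the function names (`s4d_kernel`, `filter_type`) and the peak-location thresholds are hypothetical choices made only for this illustration.

```python
import numpy as np

def s4d_kernel(A, B, C, dt, length):
    """Convolution kernel of a diagonal SSM (zero-order-hold discretization).

    A, B, C: complex vectors of size N (diagonal state matrix, input and
    output projections). dt: step size. length: kernel length L.
    """
    A_bar = np.exp(dt * A)                      # discretized diagonal state matrix
    B_bar = (A_bar - 1.0) / A * B               # ZOH-discretized input projection
    powers = A_bar[:, None] ** np.arange(length)[None, :]  # shape (N, L)
    # Assumption: each state stands for a conjugate pair, hence 2 * real part.
    return 2.0 * np.real((C * B_bar) @ powers)

def filter_type(kernel, low=0.25, high=0.75):
    """Crude label from where the magnitude spectrum peaks.

    The thresholds are illustrative assumptions, not taken from the paper.
    """
    mag = np.abs(np.fft.rfft(kernel))
    peak = np.argmax(mag) / (len(mag) - 1)      # 0 = DC, 1 = Nyquist
    if peak < low:
        return "low-pass"
    if peak > high:
        return "high-pass"
    return "band-pass"

if __name__ == "__main__":
    dt, length = 0.1, 256
    ones = np.ones(1, dtype=complex)
    # The imaginary part of A sets the oscillation frequency of the kernel.
    for name, omega in [("slow", 0.0), ("mid", np.pi / 2), ("fast", np.pi)]:
        A = np.array([-0.5 + 1j * omega / dt])
        k = s4d_kernel(A, ones, ones, dt, length)
        print(name, filter_type(k))
```

In this toy setting a single state with a purely real eigenvalue gives a decaying exponential kernel whose spectrum peaks at DC, while pushing the imaginary part toward the Nyquist frequency produces an alternating-sign kernel that emphasizes high frequencies. A trained S4D layer mixes many such states, so its spectrum would need to be inspected directly rather than summarized by a single peak.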
Related papers
- MLLM-4D: Towards Visual-based Spatial-Temporal Intelligence [50.11889361459544]
Humans are born with vision-based 4D spatial-temporal intelligence. Despite its importance, this capability remains a significant bottleneck for current multimodal large language models (MLLMs).
arXiv Detail & Related papers (2026-02-28T07:23:36Z) - Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis [36.228108480107906]
We propose a framework for the statistical analysis of genus-zero 4D surfaces that deform and evolve over time. We introduce Dynamic Spherical Neural Surfaces (D-SNS), an efficient continuous spatiotemporal representation for genus-0 4D surfaces. We demonstrate the efficiency of the framework on 4D human and face datasets.
arXiv Detail & Related papers (2025-03-05T03:02:59Z) - Model Compression Method for S4 with Diagonal State Space Layers using Balanced Truncation [0.0]
We propose using balanced truncation, a prevalent model reduction technique in control theory, applied specifically to DSS layers in a pre-trained S4 model, as a novel model compression method.
Numerical experiments demonstrate that our trained models combined with balanced truncation surpass conventionally trained models with Skew-HiPPO.
arXiv Detail & Related papers (2024-02-25T05:22:45Z) - Never Train from Scratch: Fair Comparison of Long-Sequence Models Requires Data-Driven Priors [44.5740422079]
We show that pretraining with standard denoising objectives leads to dramatic gains across multiple architectures.
In stark contrast to prior works, we find vanilla Transformers to match the performance of S4 on Long Range Arena when properly pretrained.
arXiv Detail & Related papers (2023-10-04T17:17:06Z) - Robustifying State-space Models for Long Sequences via Approximate Diagonalization [47.321212977509454]
State-space models (SSMs) have emerged as a framework for learning long-range sequence tasks.
Diagonalizing the HiPPO framework is itself an ill-posed problem.
We introduce a generic, backward-stable "perturb-then-diagonalize" (PTD) methodology.
arXiv Detail & Related papers (2023-10-02T23:36:13Z) - Structured State Space Models for In-Context Reinforcement Learning [30.189834820419446]
Structured state space sequence (S4) models have recently achieved state-of-the-art performance on long-range sequence modeling tasks.
We propose a modification to a variant of S4 that enables us to initialise and reset the hidden state in parallel.
We show that our modified architecture runs faster than Transformers in sequence length and performs better than RNNs on a simple memory-based task.
arXiv Detail & Related papers (2023-03-07T15:32:18Z) - Deep Latent State Space Models for Time-Series Generation [68.45746489575032]
We propose LS4, a generative model for sequences with latent variables evolving according to a state space ODE.
Inspired by recent deep state space models (S4), we achieve speedups by leveraging a convolutional representation of LS4.
We show that LS4 significantly outperforms previous continuous-time generative models in terms of marginal distribution, classification, and prediction scores on real-world datasets.
arXiv Detail & Related papers (2022-12-24T15:17:42Z) - Liquid Structural State-Space Models [106.74783377913433]
Liquid-S4 achieves an average performance of 87.32% on the Long-Range Arena benchmark.
On the full raw Speech Command recognition dataset, Liquid-S4 achieves 96.78% accuracy with a 30% reduction in parameter counts compared to S4.
arXiv Detail & Related papers (2022-09-26T18:37:13Z) - Long Range Language Modeling via Gated State Spaces [67.64091993846269]
We focus on autoregressive sequence modeling over English books, Github source code and ArXiv mathematics articles.
We propose a new layer named Gated State Space (GSS) and show that it trains significantly faster than the diagonal version of S4.
arXiv Detail & Related papers (2022-06-27T01:50:18Z) - On the Parameterization and Initialization of Diagonal State Space Models [35.68370606343843]
We show how to parameterize and initialize diagonal state space models.
We show that the diagonal restriction of S4's matrix surprisingly recovers the same kernel in the limit of infinite state dimension.
arXiv Detail & Related papers (2022-06-23T17:58:39Z) - Diagonal State Spaces are as Effective as Structured State Spaces [3.8276199743296906]
We show that our Diagonal State Space (DSS) model matches the performance of S4 on Long Range Arena tasks and on speech classification on the Speech Commands dataset, while being conceptually simpler and straightforward to implement.
In this work, we show that one can match the performance of S4 even without the low rank correction and thus assuming the state matrices to be diagonal.
arXiv Detail & Related papers (2022-03-27T16:30:33Z) - A Large-Scale Study on Unsupervised Spatiotemporal Representation Learning [60.720251418816815]
We present a large-scale study on unsupervised representation learning from videos.
Our objective encourages temporally-persistent features in the same video.
We find that encouraging long-spanned persistency can be effective even if the timespan is 60 seconds.
arXiv Detail & Related papers (2021-04-29T17:59:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.