Towards Understanding What State Space Models Learn About Code
- URL: http://arxiv.org/abs/2602.06774v1
- Date: Fri, 06 Feb 2026 15:29:46 GMT
- Title: Towards Understanding What State Space Models Learn About Code
- Authors: Jiali Wu, Abhinav Anand, Shweta Verma, Mira Mezini
- Abstract summary: State Space Models (SSMs) have emerged as an efficient alternative to the transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions. We present the first systematic analysis of what SSM-based code models actually learn and perform the first comparative analysis of SSM and Transformer-based code models.
- Score: 5.605881212882263
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: State Space Models (SSMs) have emerged as an efficient alternative to the transformer architecture. Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions. However, their internal mechanisms remain a black box. We present the first systematic analysis of what SSM-based code models actually learn and perform the first comparative analysis of SSM and Transformer-based code models. Our analysis reveals that SSMs outperform Transformers at capturing code syntax and semantics during pretraining but forget certain syntactic and semantic relations during fine-tuning, especially when the downstream task emphasizes short-range dependencies. To diagnose this, we introduce SSM-Interpret, a frequency-domain framework that exposes a spectral shift toward short-range dependencies during fine-tuning. Guided by these findings, we propose architectural modifications that significantly improve the performance of SSM-based code models, validating that our analysis directly enables better models.
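The abstract does not spell out how SSM-Interpret computes its spectral diagnostic, but the general idea of a frequency-domain view of an SSM can be sketched: materialize the convolution kernel of a (diagonal) SSM, take its Fourier transform, and measure how much spectral energy sits at high frequencies, which correspond to short-range dependencies. The diagonal parameterization, function names, and energy-ratio statistic below are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a frequency-domain diagnostic for a diagonal linear SSM.
# The parameterization (A, B, C) and the energy-ratio statistic are
# illustrative assumptions, not the SSM-Interpret framework itself.
import numpy as np

def ssm_kernel(A, B, C, length):
    """Materialize the convolution kernel k[t] = C . (A**t * B) of a diagonal SSM."""
    t = np.arange(length)[:, None]          # (L, 1)
    powers = A[None, :] ** t                # (L, N): per-state geometric decays
    return (powers * B[None, :]) @ C        # (L,): scalar kernel value per lag

def high_frequency_energy_ratio(kernel, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` (as a fraction of Nyquist).
    A shift of mass toward high frequencies indicates a bias to short-range
    dependencies."""
    spectrum = np.abs(np.fft.rfft(kernel)) ** 2
    freqs = np.fft.rfftfreq(len(kernel))    # 0 .. 0.5 cycles per step
    return spectrum[freqs > cutoff * 0.5].sum() / spectrum.sum()

# Toy example: slowly decaying states keep most energy at low frequencies.
rng = np.random.default_rng(0)
A = np.exp(-np.abs(rng.normal(0.05, 0.02, size=64)))   # decay rates close to 1
B, C = rng.normal(size=64), rng.normal(size=64)
print(high_frequency_energy_ratio(ssm_kernel(A, B, C, length=512)))
```

Under a probe of this kind, the ratio moving upward after fine-tuning would be consistent with the spectral shift toward short-range dependencies that the abstract reports.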
Related papers
- X-VMamba: Explainable Vision Mamba [0.0]
State Space Models (SSMs) have emerged as powerful alternatives to Transformers for sequence modeling. We introduce a controllability-based interpretability framework that quantifies how different parts of the input sequence (tokens or patches) influence the internal state dynamics of SSMs. Our framework establishes controllability analysis as a unified, foundational interpretability paradigm for SSMs across all domains.
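As a rough illustration of a controllability-style score (the exact X-VMamba formulation is not reproduced here), one can ask, for each token, how much energy its input injection can still contribute to the final state of a diagonal, input-dependent SSM. The shapes and the suffix-product formulation below are assumptions made only for this sketch.

```python
# Minimal sketch of a controllability-style token-influence score for a
# diagonal, input-dependent (selective) SSM; an illustrative assumption
# inspired by the abstract, not the X-VMamba method.
import numpy as np

def token_influence_scores(A_t, B_t):
    """A_t: (L, N) diagonal transition per step; B_t: (L, N) input injection per step.
    Score of token t = squared norm of Phi(L, t+1) * B_t, where Phi is the product
    of the transitions applied after step t (how much of the injection survives)."""
    L, N = A_t.shape
    phi = np.ones((L, N))                   # Phi[t] = A_{L-1} * ... * A_{t+1}
    for t in range(L - 2, -1, -1):
        phi[t] = phi[t + 1] * A_t[t + 1]
    contrib = phi * B_t                     # each token's contribution to the final state
    return (contrib ** 2).sum(axis=1)

rng = np.random.default_rng(1)
L, N = 16, 8
A_t = rng.uniform(0.8, 0.99, size=(L, N))   # stable, near-1 decays
B_t = rng.normal(size=(L, N))
print(token_influence_scores(A_t, B_t).round(3))
```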
arXiv Detail & Related papers (2025-11-16T17:18:12Z)
- A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures [27.45316137669387]
State Space Models (SSMs) have emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization in deeper layers.
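One simple way to probe the homogenization-versus-uniqueness pattern described above is to track the mean pairwise cosine similarity of token representations at each layer; the metric choice below is an assumption for illustration, not necessarily the paper's exact analysis.

```python
# Minimal sketch of a layer-wise token-homogenization probe: mean pairwise
# cosine similarity of token representations (values near 1 = homogenized).
import numpy as np

def mean_pairwise_cosine(H):
    """H: (T, D) token representations at one layer."""
    Hn = H / (np.linalg.norm(H, axis=1, keepdims=True) + 1e-8)
    sims = Hn @ Hn.T
    T = H.shape[0]
    return (sims.sum() - np.trace(sims)) / (T * (T - 1))   # mean off-diagonal similarity

def layerwise_homogenization(hidden_states):
    """hidden_states: list of (T, D) arrays, one per layer."""
    return [mean_pairwise_cosine(H) for H in hidden_states]

# Toy example: a growing shared offset makes tokens increasingly similar.
rng = np.random.default_rng(2)
layers = [rng.normal(size=(32, 64)) + 0.3 * i for i in range(6)]
print(np.round(layerwise_homogenization(layers), 3))
```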
arXiv Detail & Related papers (2025-10-08T04:46:11Z)
- Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling [19.10832920407789]
We introduce a new perspective by embedding the key principles of modern SSMs directly into the Message-Passing Neural Network framework. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing.
arXiv Detail & Related papers (2025-05-24T14:53:07Z)
- On the locality bias and results in the Long Range Arena [49.15148871877941]
The Long Range Arena benchmark was designed to evaluate the performance of Transformer improvements. A new series of architectures such as State Space Models (SSMs) gained some traction, greatly outperforming Transformers in the LRA. We show that while the LRA is a benchmark for long-range dependency modeling, in reality most of the performance comes from short-range dependencies.
arXiv Detail & Related papers (2025-01-24T15:34:50Z)
- On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages [56.22289522687125]
Selective state-space models (SSMs) are an emerging alternative to the Transformer. We analyze their expressiveness and length generalization performance on regular language tasks. We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization.
arXiv Detail & Related papers (2024-12-26T20:53:04Z)
- Deep Learning-based Approaches for State Space Models: A Selective Review [15.295157876811066]
State-space models (SSMs) offer a powerful framework for dynamical system analysis. This paper provides a selective review of recent advancements in deep neural network-based approaches for SSMs.
arXiv Detail & Related papers (2024-12-15T15:04:35Z)
- Provable Benefits of Complex Parameterizations for Structured State Space Models [51.90574950170374]
Structured state space models (SSMs) are linear dynamical systems adhering to a specified structure.
In contrast to typical neural network modules, whose parameterizations are real, SSMs often use complex parameterizations.
This paper takes a step towards explaining the benefits of complex parameterizations for SSMs by establishing formal gaps between real and complex diagonal SSMs.
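The gap between real and complex diagonal SSMs can be made concrete: with real eigenvalues in (0, 1) the kernel of a diagonal SSM decays monotonically, whereas a conjugate pair of complex eigenvalues produces a damped oscillation. The toy eigenvalues below are assumptions chosen only to make that contrast visible; they do not reproduce the paper's formal results.

```python
# Minimal sketch contrasting real and complex diagonal SSM parameterizations.
import numpy as np

def diagonal_ssm_kernel(lam, B, C, length):
    """k[t] = Re(C . (lam**t * B)) for a diagonal SSM with eigenvalues `lam`."""
    t = np.arange(length)[:, None]
    return np.real((lam[None, :] ** t * B[None, :]) @ C)

L = 64
# Real diagonal SSM: eigenvalues in (0, 1) -> monotonically decaying kernel.
k_real = diagonal_ssm_kernel(np.array([0.9, 0.7]), np.ones(2), np.ones(2), L)
# Complex diagonal SSM: a conjugate pair 0.95 * exp(+/- 0.4i) -> damped oscillation.
lam_c = 0.95 * np.exp(1j * np.array([0.4, -0.4]))
k_complex = diagonal_ssm_kernel(lam_c, np.ones(2, dtype=complex), np.ones(2, dtype=complex), L)
print(k_real[:8].round(3))
print(k_complex[:8].round(3))
```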
arXiv Detail & Related papers (2024-10-17T22:35:50Z)
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
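The abstract's framing of the state update as the closed-form solution to an online associative-recall problem can be sketched, very loosely, as an implicit online regression step on a key-value memory matrix. The recurrence below is an assumption in that spirit and is not Longhorn's actual update rule.

```python
# Minimal sketch of a closed-form online associative-memory update (state S
# refined one key-value pair at a time); an illustrative assumption, not the
# Longhorn recurrence.
import numpy as np

def online_recall_update(S, k, v):
    """Closed form of argmin_S' ||S' - S||_F^2 + ||S' k - v||^2:
    a rank-1 correction of the memory toward the new key-value pair."""
    err = v - S @ k
    return S + np.outer(err, k) / (1.0 + k @ k)

rng = np.random.default_rng(3)
d_k, d_v, T = 8, 4, 32
S = np.zeros((d_v, d_k))
keys, vals = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v))
for k, v in zip(keys, vals):
    S = online_recall_update(S, k, v)
print(np.linalg.norm(S @ keys[-1] - vals[-1]))   # recall error on the most recent pair
```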
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
- The Expressive Capacity of State Space Models: A Formal Language Perspective [0.8948475969696075]
Recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers.
We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs.
arXiv Detail & Related papers (2024-05-27T17:46:57Z)
- HOPE for a Robust Parameterization of Long-memory State Space Models [51.66430224089725]
State-space models (SSMs) that utilize linear, time-invariant (LTI) systems are known for their effectiveness in learning long sequences.
We develop a new parameterization scheme, called HOPE, for LTI systems that utilize Markov parameters within Hankel operators.
Our new parameterization endows the SSM with non-decaying memory within a fixed time window, which is empirically corroborated by a sequential CIFAR-10 task with padded noise.
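As a rough sketch of the ingredients the abstract names, the Markov parameters of an LTI system (its impulse response h_t = C A^t B) can be arranged into a Hankel matrix whose rank is bounded by the state dimension. The construction below is only illustrative and is not the HOPE parameterization itself.

```python
# Minimal sketch: Markov parameters of an LTI SSM and the Hankel matrix they
# induce; illustrative only, not the HOPE scheme.
import numpy as np

def markov_parameters(A, B, C, n):
    """h_t = C A^t B for t = 0..n-1: the impulse response of the LTI system."""
    h, x = [], B.copy()
    for _ in range(n):
        h.append(C @ x)
        x = A @ x
    return np.array(h)

def hankel_from_markov(h):
    """Hankel operator H[i, j] = h[i + j] built from the Markov parameters."""
    n = (len(h) + 1) // 2
    return np.array([[h[i + j] for j in range(n)] for i in range(n)])

rng = np.random.default_rng(4)
N = 6                                        # state dimension
A = 0.9 * np.eye(N) + 0.05 * rng.normal(size=(N, N))
B, C = rng.normal(size=N), rng.normal(size=N)
H = hankel_from_markov(markov_parameters(A, B, C, 2 * N - 1))
print(H.shape, np.linalg.matrix_rank(H))     # rank is at most the state dimension N
```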
arXiv Detail & Related papers (2024-05-22T20:20:14Z)
- State Space Model for New-Generation Network Alternative to Transformers: A Survey [52.812260379420394]
In the post-deep learning era, the Transformer architecture has demonstrated its powerful performance across pre-trained big models and various downstream tasks.
To further reduce the complexity of attention models, numerous efforts have been made to design more efficient methods.
Among them, the State Space Model (SSM), as a possible replacement for the self-attention based Transformer model, has drawn more and more attention in recent years.
arXiv Detail & Related papers (2024-04-15T07:24:45Z)
- Guiding the PLMs with Semantic Anchors as Intermediate Supervision: Towards Interpretable Semantic Parsing [57.11806632758607]
We propose to combine current pretrained language models with a hierarchical decoder network.
By taking the first-principle structures as the semantic anchors, we propose two novel intermediate supervision tasks.
We conduct intensive experiments on several semantic parsing benchmarks and demonstrate that our approach can consistently outperform the baselines.
arXiv Detail & Related papers (2022-10-04T07:27:29Z)