Related papers: A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures

URL: http://arxiv.org/abs/2510.06640v1
Date: Wed, 08 Oct 2025 04:46:11 GMT
Title: A Comparative Analysis of Contextual Representation Flow in State-Space and Transformer Architectures
Authors: Nhat M. Hoang, Do Xuan Long, Cong-Duy Nguyen, Min-Yen Kan, Luu Anh Tuan,
Abstract summary: State Space Models (SSMs) have emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing.<n>We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs.<n>We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper.
Score: 27.45316137669387
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: State Space Models (SSMs) have recently emerged as efficient alternatives to Transformer-Based Models (TBMs) for long-sequence processing, offering linear scaling and lower memory use. Yet, how contextual information flows across layers and tokens in these architectures remains understudied. We present the first unified, token- and layer-level analysis of representation propagation in SSMs and TBMs. Using centered kernel alignment, stability metrics, and probing, we characterize how representations evolve within and across layers. We find a key divergence: TBMs rapidly homogenize token representations, with diversity reemerging only in later layers, while SSMs preserve token uniqueness early but converge to homogenization deeper. Theoretical analysis and parameter randomization further reveal that oversmoothing in TBMs stems from architectural design, whereas in SSMs it arises mainly from training dynamics. These insights clarify the inductive biases of both architectures and inform future model and training designs for long-context reasoning.

Related papers

Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models [77.98801218316505]
Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning.<n>We investigate the internal processing of LLMs during in-context concept inference.
arXiv Detail & Related papers (2026-02-08T03:14:39Z)
Towards Understanding What State Space Models Learn About Code [5.605881212882263]
State Space Models (SSMs) have emerged as an efficient alternative to the transformer architecture.<n>Recent studies show that SSMs can match or surpass Transformers on code understanding tasks, such as code retrieval, when trained under similar conditions.<n>We present the first systematic analysis of what SSM-based code models actually learn and perform the first comparative analysis of SSM and Transformer-based code models.
arXiv Detail & Related papers (2026-02-06T15:29:46Z)
On the Relation of State Space Models and Hidden Markov Models [0.07646713951724009]
State Space Models (SSMs) and Hidden Markov Models (HMMs) are foundational frameworks for modeling sequential data with latent variables.<n>Recent deterministic state space models have re-emerged in natural language processing through architectures such as S4 and Mamba.
arXiv Detail & Related papers (2026-01-19T19:51:05Z)
X-VMamba: Explainable Vision Mamba [0.0]
State Space Models (SSMs) have emerged as powerful alternatives to Transformers for sequence modeling.<n>We introduce a controllability-based interpretability framework that quantifies how different parts of the input sequence (tokens or patches) influence the internal state dynamics of SSMs.<n>Our framework establishes controllability analysis as a unified, foundational interpretability paradigm for SSMs across all domains.
arXiv Detail & Related papers (2025-11-16T17:18:12Z)
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities [76.46448367752944]
multimodal large language models (MLLMs) unify visual understanding and image generation within a single framework.<n>Most existing MLLMs rely on autore (AR) architectures, which impose inherent limitations on future development.<n>We introduce FUDOKI, a unified multimodal model purely based on discrete flow matching.
arXiv Detail & Related papers (2025-05-26T15:46:53Z)
Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling [19.10832920407789]
We introduce a new perspective by embedding the key principles of modern SSM directly into the Message-Passing Neural Network framework.<n>Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing.
arXiv Detail & Related papers (2025-05-24T14:53:07Z)
Will Pre-Training Ever End? A First Step Toward Next-Generation Foundation MLLMs via Self-Improving Systematic Cognition [89.50068130832635]
Self-Improving cognition (SIcog) is a self-learning framework for constructing next-generation foundation MLLMs by multimodal knowledge.<n>We propose Chain-of-Description for step-by-step visual understanding and integrate structured Chain-of-Thought (CoT) reasoning to support in-depth multimodal reasoning.<n>Experiments demonstrate SIcog's effectiveness in developing MLLMs with enhanced multimodal cognition.
arXiv Detail & Related papers (2025-03-16T00:25:13Z)
Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing [56.66469232740998]
We show that Structured State Space Models (SSMs) are inherently limited by strong recency bias.<n>This bias impairs the models' ability to recall distant information and introduces robustness issues.<n>We propose to polarize two channels of the state transition matrices in SSMs, setting them to zero and one, respectively, simultaneously addressing recency bias and over-smoothing.
arXiv Detail & Related papers (2024-12-31T22:06:39Z)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks [50.29356570858905]
We introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation.<n>We provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated.<n>This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
arXiv Detail & Related papers (2024-05-24T17:19:57Z)
Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains.<n>Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states.<n>We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
Disentangling Structured Components: Towards Adaptive, Interpretable and Scalable Time Series Forecasting [52.47493322446537]
We develop a adaptive, interpretable and scalable forecasting framework, which seeks to individually model each component of the spatial-temporal patterns. SCNN works with a pre-defined generative process of MTS, which arithmetically characterizes the latent structure of the spatial-temporal patterns. Extensive experiments are conducted to demonstrate that SCNN can achieve superior performance over state-of-the-art models on three real-world datasets.
arXiv Detail & Related papers (2023-05-22T13:39:44Z)
PredRNN: A Recurrent Neural Network for Spatiotemporal Predictive Learning [109.84770951839289]
We present PredRNN, a new recurrent network for learning visual dynamics from historical context. We show that our approach obtains highly competitive results on three standard datasets.
arXiv Detail & Related papers (2021-03-17T08:28:30Z)

This list is automatically generated from the titles and abstracts of the papers in this site.