Task-Level Insights from Eigenvalues across Sequence Models
- URL: http://arxiv.org/abs/2510.09379v1
- Date: Fri, 10 Oct 2025 13:35:21 GMT
- Title: Task-Level Insights from Eigenvalues across Sequence Models
- Authors: Rahel Rickenbach, Jelena Trisovic, Alexandre Didier, Jerome Sieber, Melanie N. Zeilinger,
- Abstract summary: We show that eigenvalues influence essential aspects of memory and long-range dependency modeling. We then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
- Score: 41.79939327722031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Although softmax attention drives state-of-the-art performance for sequence models, its quadratic complexity limits scalability, motivating linear alternatives such as state space models (SSMs). While these alternatives improve efficiency, their fundamental differences in information processing remain poorly understood. In this work, we leverage the recently proposed dynamical systems framework to represent softmax, norm and linear attention as dynamical systems, enabling a structured comparison with SSMs by analyzing their respective eigenvalue spectra. Since eigenvalues capture essential aspects of dynamical system behavior, we conduct an extensive empirical analysis across diverse sequence models and benchmarks. We first show that eigenvalues influence essential aspects of memory and long-range dependency modeling, revealing spectral signatures that align with task requirements. Building on these insights, we then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance. This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
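The abstract's core idea, that the eigenvalues of a model's underlying dynamical system govern memory decay, can be illustrated with a minimal sketch (not the paper's method): for a linear recurrence h_{t+1} = A h_t, eigenvalue magnitudes close to 1 indicate slowly decaying modes and hence long-range memory, while small magnitudes indicate rapid forgetting. The matrix sizes and scaling below are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: eigenvalues of a linear recurrence h_{t+1} = A h_t
# determine how fast information decays. |lambda| near 1 -> long memory;
# |lambda| << 1 -> rapid forgetting.

rng = np.random.default_rng(0)

# A random state matrix, rescaled so its spectral radius is exactly 0.9.
A = rng.standard_normal((8, 8))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))

eigvals = np.linalg.eigvals(A)
radius = np.max(np.abs(eigvals))

# After t steps, a mode with eigenvalue lambda is damped by |lambda|**t,
# so the slowest-decaying mode bounds how long dependencies can persist.
t = 50
slowest_damping = radius ** t
print(f"spectral radius: {radius:.3f}")
print(f"damping of slowest mode after {t} steps: {slowest_damping:.2e}")
```

Comparing such spectra across architectures (softmax attention cast as a dynamical system versus SSMs) is the kind of structured comparison the paper performs.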
Related papers
- Latent Matters: Learning Deep State-Space Models
Deep state-space models (DSSMs) enable temporal predictions by learning the underlying dynamics of observed sequence data. We propose a constrained optimisation framework as a general approach for training DSSMs. We introduce the extended Kalman VAE (EKVAE), which combines amortised variational inference with classic Bayesian filtering/smoothing to model dynamics more accurately than RNN-based DSSMs.
arXiv Detail & Related papers (2026-02-26T14:35:45Z) - A Mechanistic Analysis of Transformers for Dynamical Systems [4.590170084532207]
We study the representational capabilities and limitations of single-layer Transformers when applied to dynamical data. For linear systems, we show that the convexity constraint imposed by softmax attention fundamentally restricts the class of dynamics that can be represented. For nonlinear systems under partial observability, attention instead acts as an adaptive delay-embedding mechanism.
arXiv Detail & Related papers (2025-12-24T11:21:07Z) - An Integrated Fusion Framework for Ensemble Learning Leveraging Gradient Boosting and Fuzzy Rule-Based Models [59.13182819190547]
Fuzzy rule-based models excel in interpretability and have seen widespread application across diverse fields. They face challenges such as complex design specifications and scalability issues with large datasets. This paper proposes an Integrated Fusion Framework that merges the strengths of both paradigms to enhance model performance and interpretability.
arXiv Detail & Related papers (2025-11-11T10:28:23Z) - Design Principles for Sequence Models via Coefficient Dynamics [20.14360019974826]
We develop a unified framework that makes this output operation explicit by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures, identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention.
arXiv Detail & Related papers (2025-10-10T13:42:31Z) - The Curious Case of In-Training Compression of State Space Models [49.819321766705514]
State Space Models (SSMs) tackle long sequence modeling tasks efficiently, offering both parallelizable training and fast inference. A key design challenge is striking the right balance between maximizing expressivity and limiting this computational burden. Our approach, CompreSSM, applies to Linear Time-Invariant SSMs such as Linear Recurrent Units, but is also extendable to selective models.
arXiv Detail & Related papers (2025-10-03T09:02:33Z) - Numerical Investigation of Sequence Modeling Theory using Controllable Memory Functions [14.79659491236138]
We propose a synthetic benchmarking framework to evaluate how effectively different sequence models capture distinct temporal structures. The core of this approach is to generate synthetic targets, each characterized by a memory function and a parameter that determines the strength of temporal dependence. Experiments on several sequence modeling architectures confirm existing theoretical insights and reveal new findings.
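The synthetic-target construction described above can be sketched as follows. This is an illustrative guess at the setup, not the benchmark's exact definition: the target is a linear functional of past inputs, y_t = sum_s rho(s) x_{t-s}, where the memory function rho and its decay parameter tau (both assumed names) set the strength of temporal dependence.

```python
import numpy as np

# Illustrative sketch (construction and names are assumptions, not the
# paper's exact benchmark): a synthetic target defined as a weighted sum
# of past inputs, with the memory function controlling temporal reach.

def memory_function(s, tau):
    """Exponentially decaying memory: larger tau -> longer dependence."""
    return np.exp(-s / tau)

def synthetic_target(x, tau):
    """y[t] = sum over s of memory_function(s, tau) * x[t - s]."""
    T = len(x)
    y = np.zeros(T)
    for t in range(T):
        lags = np.arange(t + 1)  # s = 0..t
        y[t] = memory_function(lags, tau) @ x[t - lags]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(100)
y_short = synthetic_target(x, tau=1.0)   # fast-decaying memory
y_long = synthetic_target(x, tau=20.0)   # slowly-decaying memory
```

Sweeping tau yields a family of tasks whose difficulty for a given architecture reveals how well it captures long-range dependence.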
arXiv Detail & Related papers (2025-06-06T02:02:59Z) - Model Hemorrhage and the Robustness Limits of Large Language Models [119.46442117681147]
Large language models (LLMs) demonstrate strong performance across natural language processing tasks, yet undergo significant performance degradation when modified for deployment. We define this phenomenon as model hemorrhage: performance decline caused by parameter alterations and architectural changes.
arXiv Detail & Related papers (2025-03-31T10:16:03Z) - Latent Space Energy-based Neural ODEs [73.01344439786524]
This paper introduces novel deep dynamical models designed to represent continuous-time sequences. We train the model using maximum likelihood estimation with Markov chain Monte Carlo. Experimental results on oscillating systems, videos and real-world state sequences (MuJoCo) demonstrate that our model with the learnable energy-based prior outperforms existing counterparts.
arXiv Detail & Related papers (2024-09-05T18:14:22Z) - Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods [8.654571696634825]
We compare State Space Models (SSMs) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings.
Our findings indicate that our proposed Koopman-based model performs as well as or better than other existing approaches in non-linear cases for long-sequence modelling.
This research contributes insights into the physical modelling of dynamical systems by offering a comparative overview of these and previous methods and introducing innovative strategies for model improvement.
arXiv Detail & Related papers (2024-08-29T15:55:27Z) - Intrinsic Dynamics-Driven Generalizable Scene Representations for Vision-Oriented Decision-Making Applications [0.21051221444478305]
How to improve the ability of scene representation is a key issue in vision-oriented decision-making applications.
We propose an intrinsic dynamics-driven representation learning method with sequence models in visual reinforcement learning.
arXiv Detail & Related papers (2024-05-30T06:31:03Z) - Understanding Self-attention Mechanism via Dynamical System Perspective [58.024376086269015]
Self-attention mechanism (SAM) is widely used in various fields of artificial intelligence.
We show that the intrinsic stiffness phenomenon (SP) in the high-precision solution of ordinary differential equations (ODEs) also widely exists in high-performance neural networks (NNs).
We show that the SAM is also a stiffness-aware step size adaptor that can enhance the model's representational ability to measure intrinsic SP.
arXiv Detail & Related papers (2023-08-19T08:17:41Z) - Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
We show that expressive autoregressive dynamics models generate the different dimensions of the next state and reward sequentially, each conditioned on the previously generated dimensions.
We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences arising from its use.