Related papers: Design Principles for Sequence Models via Coefficient Dynamics

Design Principles for Sequence Models via Coefficient Dynamics

URL: http://arxiv.org/abs/2510.09389v1
Date: Fri, 10 Oct 2025 13:42:31 GMT
Title: Design Principles for Sequence Models via Coefficient Dynamics
Authors: Jerome Sieber, Antonio Orvieto, Melanie N. Zeilinger, Carmen Amo Alonso,
Abstract summary: We develop a unified framework that makes this output operation explicit, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs.<n>This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures.<n>Thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention.
Score: 20.14360019974826
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Deep sequence models, ranging from Transformers and State Space Models (SSMs) to more recent approaches such as gated linear RNNs, fundamentally compute outputs as linear combinations of past value vectors. To draw insights and systematically compare such architectures, we develop a unified framework that makes this output operation explicit, by casting the linear combination coefficients as the outputs of autonomous linear dynamical systems driven by impulse inputs. This viewpoint, in spirit substantially different from approaches focusing on connecting linear RNNs with linear attention, reveals a common mathematical theme across diverse architectures and crucially captures softmax attention, on top of RNNs, SSMs, and related models. In contrast to new model proposals that are commonly evaluated on benchmarks, we derive design principles linking architectural choices to model properties. Thereby identifying tradeoffs between expressivity and efficient implementation, geometric constraints on input selectivity, and stability conditions for numerically stable training and information retention. By connecting several insights and observations from recent literature, the framework both explains empirical successes of recent designs and provides guiding principles for systematically designing new sequence model architectures.

Related papers

PRISM: Parallel Residual Iterative Sequence Model [52.26239951489612]
We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension.<n>PRISM introduces a solver-inspired inductive bias that captures key structural properties of multi-step refinement in a parallelizable form.<n>We prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck.
arXiv Detail & Related papers (2026-02-11T12:39:41Z)
Lag Operator SSMs: A Geometric Framework for Structured State Space Modeling [3.3864018929063477]
We introduce a framework for constructing discrete-time Structured State Space Models (SSMs) that is both flexible and modular.<n>Our approach is based on a novel lag operator, which geometrically derives the discrete-time recurrence by measuring how the system's basis functions "slide"
arXiv Detail & Related papers (2025-12-22T02:25:26Z)
Task-Level Insights from Eigenvalues across Sequence Models [41.79939327722031]
We show that eigenvalues influence essential aspects of memory and long-range dependency modeling.<n>We then investigate how architectural modifications in sequence models impact both eigenvalue spectra and task performance.<n>This correspondence further strengthens the position of eigenvalue analysis as a principled metric for interpreting, understanding, and ultimately improving the capabilities of sequence models.
arXiv Detail & Related papers (2025-10-10T13:35:21Z)
Random Matrix Theory for Deep Learning: Beyond Eigenvalues of Linear Models [51.85815025140659]
Modern Machine Learning (ML) and Deep Neural Networks (DNNs) often operate on high-dimensional data.<n>In particular, the proportional regime where the data dimension, sample size, and number of model parameters are all large gives rise to novel and sometimes counterintuitive behaviors.<n>This paper extends traditional Random Matrix Theory (RMT) beyond eigenvalue-based analysis of linear models to address the challenges posed by nonlinear ML models.
arXiv Detail & Related papers (2025-06-16T06:54:08Z)
Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba.<n>This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
arXiv Detail & Related papers (2025-06-12T17:32:02Z)
Efficient identification of linear, parameter-varying, and nonlinear systems with noise models [1.6385815610837167]
We present a general system identification procedure capable of estimating a broad spectrum of state-space dynamical models.<n>We show that for this general class of model structures, the model dynamics can be separated into a deterministic process and a noise part.<n>We parameterize the involved nonlinear functional relations by means of artificial neural-networks (ANNs)
arXiv Detail & Related papers (2025-04-16T11:23:30Z)
Generalized Factor Neural Network Model for High-dimensional Regression [50.554377879576066]
We tackle the challenges of modeling high-dimensional data sets with latent low-dimensional structures hidden within complex, non-linear, and noisy relationships.<n>Our approach enables a seamless integration of concepts from non-parametric regression, factor models, and neural networks for high-dimensional regression.
arXiv Detail & Related papers (2025-02-16T23:13:55Z)
Neural Port-Hamiltonian Differential Algebraic Equations for Compositional Learning of Electrical Networks [21.117540483724603]
We develop compositional learning algorithms for coupled dynamical systems, with a particular focus on electrical networks.<n>We introduce neural port-Hamiltonian differential algebraic equations (N-PHDAEs), which use neural networks to parameterize unknown terms in both the differential and algebraic components of a port-Hamiltonian DAE.<n>We show that the proposed N-PHDAE model achieves an order of magnitude improvement in prediction accuracy and constraint satisfaction when compared to a baseline N-ODE over long prediction time horizons.
arXiv Detail & Related papers (2024-12-15T15:13:11Z)
Towards Efficient Modelling of String Dynamics: A Comparison of State Space and Koopman based Deep Learning Methods [8.654571696634825]
State Space Models (SSM) and Koopman-based deep learning methods for modelling the dynamics of both linear and non-linear stiff strings. Our findings indicate that our proposed Koopman-based model performs as well as or better than other existing approaches in non-linear cases for long-sequence modelling. This research contributes insights into the physical modelling of dynamical systems by offering a comparative overview of these and previous methods and introducing innovative strategies for model improvement.
arXiv Detail & Related papers (2024-08-29T15:55:27Z)
Learnable & Interpretable Model Combination in Dynamical Systems Modeling [0.0]
This work briefly discusses which types of model are usually combined in dynamical systems modeling.<n>We propose a class of models that is capable of expressing mixed algebraic, discrete, and differential equation-based models.<n>Finally, we propose a new wildcard architecture that is capable of describing arbitrary combinations of models in an easy-to-interpret fashion.
arXiv Detail & Related papers (2024-06-12T11:17:11Z)
Understanding the differences in Foundation Models: Attention, State Space Models, and Recurrent Neural Networks [50.29356570858905]
We introduce the Dynamical Systems Framework (DSF), which allows a principled investigation of all these architectures in a common representation.<n>We provide principled comparisons between softmax attention and other model classes, discussing the theoretical conditions under which softmax attention can be approximated.<n>This shows the DSF's potential to guide the systematic development of future more efficient and scalable foundation models.
arXiv Detail & Related papers (2024-05-24T17:19:57Z)
Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization [60.73540999409032]
We show that expressive autoregressive dynamics models generate different dimensions of the next state and reward sequentially conditioned on previous dimensions. We also show that autoregressive dynamics models are useful for offline policy optimization by serving as a way to enrich the replay buffer.
arXiv Detail & Related papers (2021-04-28T16:48:44Z)

This list is automatically generated from the titles and abstracts of the papers in this site.