Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
- URL: http://arxiv.org/abs/2509.22284v2
- Date: Tue, 07 Oct 2025 14:10:40 GMT
- Title: Structured Sparse Transition Matrices to Enable State Tracking in State-Space Models
- Authors: Aleksandar Terzić, Nicolas Menet, Michael Hersche, Thomas Hofmann, Abbas Rahimi,
- Abstract summary: We propose a structured sparse parametrization of transition matrices in state-space models (SSMs)<n>Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$)<n>The model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks.
- Score: 68.31088463716269
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern state-space models (SSMs) often utilize transition matrices which enable efficient computation but pose restrictions on the model's expressivity, as measured in terms of the ability to emulate finite-state automata (FSA). While unstructured transition matrices are optimal in terms of expressivity, they come at a prohibitively high compute and memory cost even for moderate state sizes. We propose a structured sparse parametrization of transition matrices in SSMs that enables FSA state tracking with optimal state size and depth, while keeping the computational cost of the recurrence comparable to that of diagonal SSMs. Our method, PD-SSM, parametrizes the transition matrix as the product of a column one-hot matrix ($P$) and a complex-valued diagonal matrix ($D$). Consequently, the computational cost of parallel scans scales linearly with the state size. Theoretically, the model is BIBO-stable and can emulate any $N$-state FSA with one layer of dimension $N$ and a linear readout of size $N \times N$, significantly improving on all current structured SSM guarantees. Experimentally, the model significantly outperforms a wide collection of modern SSM variants on various FSA state tracking tasks. On multiclass time-series classification, the performance is comparable to that of neural controlled differential equations, a paradigm explicitly built for time-series analysis. Finally, we integrate PD-SSM into a hybrid Transformer-SSM architecture and demonstrate that the model can effectively track the states of a complex FSA in which transitions are encoded as a set of variable-length English sentences. The code is available at https://github.com/IBM/expressive-sparse-state-space-model
Related papers
- Scaling Up Liquid-Resistance Liquid-Capacitance Networks for Efficient Sequence Modeling [50.994194925685434]
LrcSSM is a $textitnon-linear$ recurrent model that processes long sequences as fast as today's linear state-space layers.<n>By forcing the Jacobian matrix to be diagonal, the full sequence can be solved in parallel.<n>LrcSSM offers a formal gradient-stability guarantee that other input-varying systems such as Liquid-S4 do not provide.
arXiv Detail & Related papers (2025-05-27T20:02:59Z) - Exemplar-Free Continual Learning for State Space Models [32.73275711666184]
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence.<n>Their evolving internal states pose challenges in adapting them under Continual Learning.<n>We propose Inf-SSM, a novel and simple geometry-aware regularization method.
arXiv Detail & Related papers (2025-05-24T08:59:13Z) - FFT-based Dynamic Subspace Selection for Low-Rank Adaptive Optimization of Large Language Models [49.397861654088636]
We propose a two-step procedure to approximate SVD/QR-based gradient projections into lower-dimensional spaces.<n>We show that our strategy achieves faster runtime and reduced memory usage by up to $25%$ across different model sizes.
arXiv Detail & Related papers (2025-05-23T14:37:00Z) - Structured Linear CDEs: Maximally Expressive and Parallel-in-Time Sequence Models [15.650005330621148]
This work introduces Structured Linear Controlled Differential Equations (SLiCEs)<n>It is a unifying framework for sequence models with structured, input-dependent state-transition matrices.<n>We prove that SLiCEs employ block-diagonal, sparse, or Walsh-Hadamard matrices.
arXiv Detail & Related papers (2025-05-23T11:34:21Z) - FedSVD: Adaptive Orthogonalization for Private Federated Learning with LoRA [61.79405341803085]
Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL)<n>Low-Rank Adaptation (LoRA) is widely used for efficient fine-tuning of language models in federated learning (FL)
arXiv Detail & Related papers (2025-05-19T07:32:56Z) - DeltaProduct: Improving State-Tracking in Linear RNNs via Householder Products [60.72655477351486]
Linear Recurrent Neural Networks (linear RNNs) have emerged as competitive alternatives to Transformers for sequence modeling.<n>Existing architectures face a fundamental trade-off between expressivity and efficiency, dictated by the structure of their state-transition matrices.
arXiv Detail & Related papers (2025-02-14T16:59:05Z) - On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages [56.22289522687125]
Selective state-space models (SSMs) are an emerging alternative to the Transformer.<n>We analyze their expressiveness and length generalization performance on regular language tasks.<n>We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization.
arXiv Detail & Related papers (2024-12-26T20:53:04Z) - Distributed Representations Enable Robust Multi-Timescale Symbolic Computation in Neuromorphic Hardware [3.961418890143814]
We describe a single-shot weight learning scheme to embed robust multi-timescale dynamics into attractor-based RSNNs.<n>We embed finite state machines into the RSNN dynamics by superimposing a symmetric autoassociative weight matrix.<n>This work introduces a scalable approach to embed robust symbolic computation through recurrent dynamics into neuromorphic hardware.
arXiv Detail & Related papers (2024-05-02T14:11:50Z) - Efficiently Modeling Long Sequences with Structured State Spaces [15.456254157293836]
We propose a new sequence model based on a new parameterization for the fundamental state space model.
S4 achieves strong empirical results across a diverse range of established benchmarks, including (i) 91% accuracy on sequential CIFAR-10 with no data augmentation or auxiliary losses, on par with a larger 2-D ResNet.
arXiv Detail & Related papers (2021-10-31T03:32:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.