Bridging Expressivity and Scalability with Adaptive Unitary SSMs
- URL: http://arxiv.org/abs/2507.05238v1
- Date: Mon, 07 Jul 2025 17:47:16 GMT
- Title: Bridging Expressivity and Scalability with Adaptive Unitary SSMs
- Authors: Arjun Karuvally, Franz Nowak, Anderson T. Keller, Carmen Amo Alonso, Terrence J. Sejnowski, Hava T. Siegelmann
- Abstract summary: We introduce the Adaptive Unitary State Space Model (AUSSM), a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. Our results demonstrate that adaptive unitary recurrence provides a powerful and efficient inductive bias for both symbolic and continuous sequence modeling.
- Score: 4.61803711540329
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recent work has revealed that state space models (SSMs), while efficient for long-sequence processing, are fundamentally limited in their ability to represent formal languages, particularly due to their time-invariant and real-valued recurrence structures. In this work, we draw inspiration from adaptive and structured dynamics observed in biological neural systems and introduce the Adaptive Unitary State Space Model (AUSSM), a novel class of SSMs that leverages skew-symmetric, input-dependent recurrence to achieve unitary evolution and high expressive power. Using algebraic automata theory, we prove that AUSSM can perform modulo counting and simulate solvable group automata at finite precision, enabling SSMs to model a broad class of regular languages that are out of reach for other SSM architectures. To overcome the practical inefficiencies of adaptive recurrence, we develop a separable convolution formulation and a CUDA implementation that enables scalable parallel training. Empirically, we show that AUSSM, when interleaved with Mamba, outperforms prior SSMs on formal algorithmic tasks such as parity and modular arithmetic, and achieves competitive performance on real-world long time-series classification benchmarks. Our results demonstrate that adaptive unitary recurrence provides a powerful and efficient inductive bias for both symbolic and continuous sequence modeling.
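For intuition, the core mechanism can be sketched in a few lines of NumPy: an input-dependent skew-symmetric generator whose matrix exponential is orthogonal, so the recurrent map preserves the state norm exactly. The shapes and the parameterization `W` below are illustrative assumptions, not the paper's exact formulation (which additionally uses a separable convolution form and a CUDA kernel for parallel training).

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d, n = 4, 8                              # input dim, state dim (illustrative)
W = rng.normal(size=(n, n, d)) * 0.1     # hypothetical map: input -> recurrence generator
B = rng.normal(size=(n, d)) * 0.1        # input projection

def step(h, x):
    M = W @ x                # input-dependent matrix, shape (n, n)
    A = M - M.T              # antisymmetrize: A is skew-symmetric (A.T == -A)
    U = expm(A)              # exponential of a skew-symmetric matrix is orthogonal
    return U @ h + B @ x     # norm-preserving recurrence plus input drive

h, x = rng.normal(size=n), rng.normal(size=d)
M = W @ x
U = expm(M - M.T)
print(np.allclose(np.linalg.norm(U @ h), np.linalg.norm(h)))  # True: unitary evolution
```

Because `A.T == -A` implies `U.T @ U == I`, the recurrent part alone can neither explode nor vanish, which is the stability-plus-expressivity trade the abstract points to.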
Related papers
- Sequential-Parallel Duality in Prefix Scannable Models [68.39855814099997]
Recent developments have given rise to various models, such as Gated Linear Attention (GLA) and Mamba. This raises a natural question: can we characterize the full class of neural sequence models that support near-constant-time parallel evaluation and linear-time, constant-space sequential inference?
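For a concrete instance of this sequential-parallel duality, consider the gated scalar recurrence h_t = a_t * h_{t-1} + b_t (with h_0 = 0 assumed here): it can be evaluated by a sequential loop or by a prefix scan over an associative operator. The sketch below is a generic illustration, not the paper's construction.

```python
import numpy as np

def combine(p, q):
    # Associative: composing h -> a1*h + b1 followed by h -> a2*h + b2.
    a1, b1 = p
    a2, b2 = q
    return (a2 * a1, a2 * b1 + b2)

def sequential(a, b):
    h, out = 0.0, []
    for at, bt in zip(a, b):
        h = at * h + bt
        out.append(h)
    return np.array(out)

def inclusive_scan(pairs):
    # In-place Hillis-Steele scan: O(log T) passes of parallelizable work.
    pairs, step = list(pairs), 1
    while step < len(pairs):
        for i in range(len(pairs) - 1, step - 1, -1):
            pairs[i] = combine(pairs[i - step], pairs[i])
        step *= 2
    return pairs

rng = np.random.default_rng(1)
a, b = rng.uniform(0.5, 1.0, size=16), rng.normal(size=16)
hs = np.array([h for _, h in inclusive_scan(zip(a, b))])
print(np.allclose(hs, sequential(a, b)))  # True: same h_t, two evaluation orders
```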
arXiv Detail & Related papers (2025-06-12T17:32:02Z)
- Message-Passing State-Space Models: Improving Graph Learning with Modern Sequence Modeling [19.10832920407789]
We introduce a new perspective by embedding the key principles of modern SSMs directly into the Message-Passing Neural Network framework. Our approach, MP-SSM, enables efficient, permutation-equivariant, and long-range information propagation while preserving the architectural simplicity of message passing.
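A hedged sketch of the general idea (the names and update rule below are assumptions, not MP-SSM's actual equations): treat message-passing depth as SSM "time" and drive a linear state update with a permutation-equivariant sum aggregation.

```python
import numpy as np

rng = np.random.default_rng(2)
N, d = 5, 8                                  # nodes, feature dim
adj = rng.integers(0, 2, size=(N, N))        # toy directed adjacency
A = np.diag(rng.uniform(0.5, 0.9, d))        # stable, diagonal state matrix
B = rng.normal(size=(d, d)) * 0.1            # message projection

h = rng.normal(size=(N, d))                  # per-node hidden state
x = h.copy()                                 # node features
for layer in range(4):                       # propagation steps play the role of time
    msgs = adj @ x                           # sum aggregation: permutation-equivariant
    h = h @ A.T + msgs @ B.T                 # linear, SSM-style state update
    x = np.tanh(h)                           # readout fed to the next step
```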
arXiv Detail & Related papers (2025-05-24T14:53:07Z)
- Exemplar-Free Continual Learning for State Space Models [32.73275711666184]
State-Space Models (SSMs) excel at capturing long-range dependencies with structured recurrence. Their evolving internal states pose challenges in adapting them under Continual Learning. We propose Inf-SSM, a novel and simple geometry-aware regularization method.
arXiv Detail & Related papers (2025-05-24T08:59:13Z)
- Efficient Transformed Gaussian Process State-Space Models for Non-Stationary High-Dimensional Dynamical Systems [49.819436680336786]
We propose an efficient transformed Gaussian process state-space model (ETGPSSM) for scalable and flexible modeling of high-dimensional, non-stationary dynamical systems. Specifically, our ETGPSSM integrates a single shared GP with input-dependent normalizing flows, yielding an expressive implicit process prior that captures complex, non-stationary transition dynamics. Our ETGPSSM outperforms existing GPSSMs and neural network-based SSMs in terms of computational efficiency and accuracy.
arXiv Detail & Related papers (2025-03-24T03:19:45Z)
- From S4 to Mamba: A Comprehensive Survey on Structured State Space Models [0.0]
Recent advancements in sequence modeling have led to the emergence of Structured State Space Models (SSMs). SSMs leverage structured recurrence and state-space representations to achieve superior long-sequence processing with linear or near-linear complexity. This survey serves as a structured guide for researchers and practitioners, detailing the advancements, trade-offs, and future directions of SSM-based architectures in AI and deep learning.
arXiv Detail & Related papers (2025-03-22T01:55:32Z)
- DSMoE: Matrix-Partitioned Experts with Dynamic Routing for Computation-Efficient Dense LLMs [70.91804882618243]
This paper proposes DSMoE, a novel approach that achieves sparsification by partitioning pre-trained FFN layers into computational blocks. We implement adaptive expert routing using sigmoid activation and straight-through estimators, enabling tokens to flexibly access different aspects of model knowledge. Experiments on LLaMA models demonstrate that under equivalent computational constraints, DSMoE achieves superior performance compared to existing pruning and MoE approaches.
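The routing mechanism described above can be illustrated with a small PyTorch sketch: a sigmoid gate scores each expert block, and a straight-through estimator lets hard 0/1 routing decisions pass gradients. Class and variable names are illustrative, not DSMoE's actual API.

```python
import torch

class STRouter(torch.nn.Module):
    def __init__(self, d_model, n_experts):
        super().__init__()
        self.gate = torch.nn.Linear(d_model, n_experts)

    def forward(self, x):
        probs = torch.sigmoid(self.gate(x))   # soft per-expert scores
        hard = (probs > 0.5).float()          # hard 0/1 routing decision
        # Straight-through: forward uses `hard`, backward uses grads of `probs`.
        return hard + probs - probs.detach()

d, E = 16, 4
experts = torch.nn.ModuleList(torch.nn.Linear(d, d) for _ in range(E))
router = STRouter(d, E)

x = torch.randn(2, d)
mask = router(x)                                       # (2, E) differentiable mask
out = sum(mask[:, e:e + 1] * experts[e](x) for e in range(E))
out.sum().backward()                                   # gradients reach the gate
```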
arXiv Detail & Related papers (2025-02-18T02:37:26Z)
- On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages [56.22289522687125]
Selective state-space models (SSMs) are an emerging alternative to the Transformer. We analyze their expressiveness and length generalization performance on regular language tasks. We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization.
arXiv Detail & Related papers (2024-12-26T20:53:04Z)
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
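One way to read "update resembles the closed-form solution of the online associative recall problem" is as an implicit (proximal) step on a key-value regression loss, which yields a delta-rule-like update. The sketch below is that generic interpretation, not Longhorn's exact parameterization.

```python
import numpy as np

def recall_update(S, k, v, eps=1.0):
    # Closed form of: min_S ||S k - v||^2 + (1/eps) * ||S - S_prev||_F^2,
    # derived via the Sherman-Morrison identity.
    gain = eps / (1.0 + eps * (k @ k))
    return S + gain * np.outer(v - S @ k, k)

rng = np.random.default_rng(3)
dk, dv = 6, 4
S = np.zeros((dv, dk))
pairs = [(rng.normal(size=dk), rng.normal(size=dv)) for _ in range(3)]
for _ in range(100):                 # repeated presentation of key-value pairs
    for k, v in pairs:
        S = recall_update(S, k, v)
k0, v0 = pairs[0]
print(np.linalg.norm(S @ k0 - v0))   # near zero: the association is stored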
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
- Theoretical Foundations of Deep Selective State-Space Models [13.971499161967083]
Deep SSMs demonstrate outstanding performance across a diverse set of domains. Recent developments show that the linear recurrence powering SSMs can allow for multiplicative interactions between inputs and hidden states. We show that when random linear recurrences are equipped with simple input-controlled transitions, the hidden state is provably a low-dimensional projection of a powerful mathematical object.
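A minimal sketch of such an input-controlled transition (illustrative, not the paper's construction): the input sets per-channel gates that multiply the hidden state, instead of a fixed time-invariant transition.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 3, 5
Wa = rng.normal(size=(n, d))             # hypothetical map: input -> gate logits
B = rng.normal(size=(n, d)) * 0.1

def step(h, x):
    a = 1.0 / (1.0 + np.exp(-(Wa @ x)))  # input-dependent gates in (0, 1)
    return a * h + B @ x                 # input multiplies the hidden state

h = np.zeros(n)
for x in rng.normal(size=(20, d)):
    h = step(h, x)
```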
arXiv Detail & Related papers (2024-02-29T11:20:16Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on-the-fly.
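For context, the sequential core that VSMC builds on is a particle filter: propagate, weight, estimate, resample, all on-the-fly. The sketch below shows only that bootstrap-filter loop on a toy model; the variational adaptation of proposal parameters is omitted.

```python
import numpy as np

rng = np.random.default_rng(5)
P, T = 500, 50
x_true, ys = 0.0, []
for _ in range(T):                            # 1-D random walk + noisy observations
    x_true += rng.normal(scale=0.5)
    ys.append(x_true + rng.normal(scale=1.0))

particles = rng.normal(size=P)
for y in ys:
    particles = particles + rng.normal(scale=0.5, size=P)  # propagate dynamics
    logw = -0.5 * (y - particles) ** 2                      # Gaussian log-likelihood (up to a constant)
    w = np.exp(logw - logw.max()); w /= w.sum()
    estimate = w @ particles                                # on-the-fly filtered mean
    particles = rng.choice(particles, size=P, p=w)          # multinomial resampling
```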
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
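A hedged sketch of the idea (SMA's actual mechanism may differ): a learned per-token gate decides which sequence elements pass through a sub-module, with a straight-through trick keeping the selection differentiable.

```python
import torch

d, T = 16, 10
gate = torch.nn.Linear(d, 1)
attn = torch.nn.MultiheadAttention(d, num_heads=2, batch_first=True)

x = torch.randn(1, T, d)
p = torch.sigmoid(gate(x))                 # (1, T, 1) per-token activation probability
m = (p > 0.5).float() + p - p.detach()     # straight-through 0/1 mask, differentiable
y, _ = attn(x, x, x)                       # the sub-module (a toy attention block)
out = m * y + (1 - m) * x                  # activated tokens use it; others skip it
out.sum().backward()                       # gradients reach the gate via the sigmoid
```

A real implementation would gather only the selected tokens into the sub-module to save compute; here the skip connection is expressed as a mask for brevity.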
arXiv Detail & Related papers (2023-06-19T23:10:02Z)