Theoretical Foundations of Deep Selective State-Space Models
- URL: http://arxiv.org/abs/2402.19047v4
- Date: Mon, 06 Jan 2025 14:43:25 GMT
- Title: Theoretical Foundations of Deep Selective State-Space Models
- Authors: Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi, Terry Lyons,
- Abstract summary: Deep SSMs demonstrate outstanding performance across a diverse set of domains.
Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states.
We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
- Score: 13.971499161967083
- License:
- Abstract: Structured state-space models (SSMs) such as S4, stemming from the seminal work of Gu et al., are gaining popularity as effective approaches for modeling sequential data. Deep SSMs demonstrate outstanding performance across a diverse set of domains, at a reduced training and inference cost compared to attention-based transformers. Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states (e.g. GateLoop, Mamba, GLA), then the resulting architecture can surpass in both in accuracy and efficiency attention-powered foundation models trained on text, at scales of billion parameters. In this paper, we give theoretical grounding to this recent finding using tools from Rough Path Theory: we show that when random linear recurrences are equipped with simple input-controlled transitions (selectivity mechanism), then the hidden state is provably a low-dimensional projection of a powerful mathematical object called the signature of the input -- capturing non-linear interactions between tokens at distinct timescales. Our theory not only motivates the success of modern selective state-space models such as Mamba but also provides a solid framework to understand the expressive power of future SSM variants.
Related papers
- Scalable Language Models with Posterior Inference of Latent Thought Vectors [52.63299874322121]
Latent-Thought Language Models (LTMs) incorporate explicit latent thought vectors that follow an explicit prior model in latent space.
LTMs possess additional scaling dimensions beyond traditional LLMs, yielding a structured design space.
LTMs significantly outperform conventional autoregressive models and discrete diffusion models in validation perplexity and zero-shot language modeling.
arXiv Detail & Related papers (2025-02-03T17:50:34Z) - Recursive Learning of Asymptotic Variational Objectives [49.69399307452126]
General state-space models (SSMs) are widely used in statistical machine learning and are among the most classical generative models for sequential time-series data.
Online sequential IWAE (OSIWAE) allows for online learning of both model parameters and a Markovian recognition model for inferring latent states.
This approach is more theoretically well-founded than recently proposed online variational SMC methods.
arXiv Detail & Related papers (2024-11-04T16:12:37Z) - START: A Generalized State Space Model with Saliency-Driven Token-Aware Transformation [27.301312891532277]
Domain Generalization (DG) aims to enable models to generalize to unseen target domains by learning from multiple source domains.
We propose START, which achieves state-of-the-art (SOTA) performances and offers a competitive alternative to CNNs and ViTs.
Our START can selectively perturb and suppress domain-specific features in salient tokens within the input-dependent matrices of SSMs, thus effectively reducing the discrepancy between different domains.
arXiv Detail & Related papers (2024-10-21T13:50:32Z) - State-space models can learn in-context by gradient descent [1.3087858009942543]
We show that state-space models can perform gradient-based learning and use it for in-context learning in much the same way as transformers.
Specifically, we prove that a single structured state-space model layer, augmented with multiplicative input and output gating, can reproduce the outputs of an implicit linear model.
We also provide novel insights into the relationship between state-space models and linear self-attention, and their ability to learn in-context.
arXiv Detail & Related papers (2024-10-15T15:22:38Z) - Mathematical Formalism for Memory Compression in Selective State Space Models [0.0]
State space models (SSMs) have emerged as a powerful framework for modelling long-range dependencies in sequence data.
We develop a rigorous mathematical framework for understanding memory compression in selective state space models.
We show that selective SSMs offer significant improvements in memory efficiency and processing speed compared to traditional RNN-based models.
arXiv Detail & Related papers (2024-10-04T05:45:48Z) - Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
We propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.
To address these challenges, we propose Decision Mamba, a novel multi-grained state space model (SSM) with a self-evolving policy learning strategy.
To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
arXiv Detail & Related papers (2024-06-08T10:12:00Z) - The Expressive Capacity of State Space Models: A Formal Language Perspective [0.8948475969696075]
recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competititve with transformers.
We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs.
arXiv Detail & Related papers (2024-05-27T17:46:57Z) - Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
arXiv Detail & Related papers (2024-05-06T06:23:06Z) - Synthetic location trajectory generation using categorical diffusion
models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - GEM: Group Enhanced Model for Learning Dynamical Control Systems [78.56159072162103]
We build effective dynamical models that are amenable to sample-based learning.
We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model.
This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
arXiv Detail & Related papers (2021-04-07T01:08:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.