Theoretical Foundations of Deep Selective State-Space Models
- URL: http://arxiv.org/abs/2402.19047v2
- Date: Mon, 4 Mar 2024 11:09:49 GMT
- Title: Theoretical Foundations of Deep Selective State-Space Models
- Authors: Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi
and Terry Lyons
- Abstract summary: Deep SSMs demonstrate outstanding performance across a diverse set of domains.
Recent developments show that if the linear recurrence powering SSMs allows for multiplicative interactions between inputs and hidden states, the resulting architecture can surpass attention-based models in both accuracy and efficiency.
We show that when random linear recurrences are equipped with simple input-controlled transitions, then the hidden state is provably a low-dimensional projection of a powerful mathematical object.
- Score: 14.989266348816749
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structured state-space models (SSMs) such as S4, stemming from the seminal
work of Gu et al., are gaining popularity as effective approaches for modeling
sequential data. Deep SSMs demonstrate outstanding performance across a diverse
set of domains, at a reduced training and inference cost compared to
attention-based transformers. Recent developments show that if the linear
recurrence powering SSMs allows for multiplicative interactions between inputs
and hidden states (e.g. GateLoop, Mamba, GLA), then the resulting architecture
can surpass, in both accuracy and efficiency, attention-powered foundation
models trained on text at the billion-parameter scale. In this paper, we give
theoretical grounding to this recent finding using tools from Rough Path
Theory: we show that when random linear recurrences are equipped with simple
input-controlled transitions (selectivity mechanism), then the hidden state is
provably a low-dimensional projection of a powerful mathematical object called
the signature of the input -- capturing non-linear interactions between tokens
at distinct timescales. Our theory not only motivates the success of modern
selective state-space models such as Mamba but also provides a solid framework
to understand the expressive power of future SSM variants.
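The selectivity mechanism the abstract refers to can be illustrated with a toy recurrence whose transition matrix depends on the current input, in contrast to time-invariant SSMs such as S4. The sketch below is a minimal illustration in this spirit (GateLoop, Mamba, and GLA each use their own parameterizations); the function name, the sigmoid gating, and all shapes are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def selective_scan(x, A_proj, B):
    """Toy input-controlled linear recurrence:
        h_t = sigmoid(A_proj @ x_t) * h_{t-1} + B @ x_t
    The diagonal transition gate depends on the current input x_t,
    which is what makes the recurrence "selective"."""
    d_hidden, _ = B.shape
    h = np.zeros(d_hidden)
    states = []
    for x_t in x:
        gate = 1.0 / (1.0 + np.exp(-(A_proj @ x_t)))  # input-dependent transition
        h = gate * h + B @ x_t                        # linear in h, gated by x_t
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
T, d_in, d_hidden = 6, 3, 4
x = rng.normal(size=(T, d_in))
A_proj = rng.normal(size=(d_hidden, d_in))
B = rng.normal(size=(d_hidden, d_in))
H = selective_scan(x, A_proj, B)
print(H.shape)  # (6, 4)
```

Because each step multiplies the state by an input-dependent factor before adding the new input, unrolling the recurrence produces products of functions of past tokens; it is these iterated products that the paper relates to the signature of the input path.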
Related papers
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
We introduce a novel deep SSM architecture based on the implicit update for optimizing an online regression objective.
Our experimental results show that our models outperform state-of-the-art SSMs on standard sequence modeling benchmarks and language modeling tasks.
arXiv Detail & Related papers (2024-07-19T11:12:08Z) - Generalizable Implicit Neural Representation As a Universal Spatiotemporal Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
arXiv Detail & Related papers (2024-06-13T02:03:22Z) - Decision Mamba: A Multi-Grained State Space Model with Self-Evolution Regularization for Offline RL [57.202733701029594]
Decision Mamba is a novel multi-grained state space model with a self-evolving policy learning strategy.
To mitigate the overfitting issue on noisy trajectories, a self-evolving policy is proposed by using progressive regularization.
The policy evolves by using its own past knowledge to refine the suboptimal actions, thus enhancing its robustness on noisy demonstrations.
arXiv Detail & Related papers (2024-06-08T10:12:00Z) - The Expressive Capacity of State Space Models: A Formal Language Perspective [0.8948475969696075]
Recurrent models based on linear state space models (SSMs) have shown promising performance in language modeling (LM), competitive with transformers.
We present a comprehensive theoretical study of the capacity of such SSMs as it compares to that of transformers and traditional RNNs.
arXiv Detail & Related papers (2024-05-27T17:46:57Z) - Spatiotemporal Implicit Neural Representation as a Generalized Traffic Data Learner [46.866240648471894]
Spatiotemporal Traffic Data (STTD) measures the complex dynamical behaviors of the multiscale transportation system.
We present a novel paradigm to address the STTD learning problem by parameterizing STTD as an implicit neural representation.
We validate its effectiveness through extensive experiments in real-world scenarios, showcasing applications from corridor to network scales.
arXiv Detail & Related papers (2024-05-06T06:23:06Z) - State Space Models as Foundation Models: A Control Theoretic Overview [3.3222241150972356]
In recent years, there has been a growing interest in integrating linear state-space models (SSM) in deep neural network architectures.
This paper is intended as a gentle introduction to SSM-based architectures for control theorists.
It provides a systematic review of the most successful SSM proposals and highlights their main features from a control theoretic perspective.
arXiv Detail & Related papers (2024-03-25T16:10:47Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs) which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - Sparse Modular Activation for Efficient Sequence Modeling [94.11125833685583]
Recent models combining Linear State Space Models with self-attention mechanisms have demonstrated impressive results across a range of sequence modeling tasks.
Current approaches apply attention modules statically and uniformly to all elements in the input sequences, leading to sub-optimal quality-efficiency trade-offs.
We introduce Sparse Modular Activation (SMA), a general mechanism enabling neural networks to sparsely activate sub-modules for sequence elements in a differentiable manner.
arXiv Detail & Related papers (2023-06-19T23:10:02Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [66.96931254510544]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z) - GEM: Group Enhanced Model for Learning Dynamical Control Systems [78.56159072162103]
We build effective dynamical models that are amenable to sample-based learning.
We show that learning the dynamics on a Lie algebra vector space is more effective than learning a direct state transition model.
This work sheds light on a connection between learning of dynamics and Lie group properties, which opens doors for new research directions.
arXiv Detail & Related papers (2021-04-07T01:08:18Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.