SeRpEnt: Selective Resampling for Expressive State Space Models
- URL: http://arxiv.org/abs/2501.11729v1
- Date: Mon, 20 Jan 2025 20:27:50 GMT
- Title: SeRpEnt: Selective Resampling for Expressive State Space Models
- Authors: Stefano Rando, Luca Romani, Matteo Migliarini, Luca Franco, Denis Gudovskiy, Fabio Galasso
- Abstract summary: State Space Models (SSMs) have recently enjoyed a rise to prominence in the field of deep learning for sequence modeling.
We show how selective time intervals in Mamba act as linear approximators of information.
We propose our SeRpEnt architecture, an SSM that further exploits selectivity to compress sequences in an information-aware fashion.
- Score: 5.7918134313332414
- Abstract: State Space Models (SSMs) have recently enjoyed a rise to prominence in the field of deep learning for sequence modeling, especially as an alternative to Transformers. Their success stems from avoiding two well-known drawbacks of attention-based models: quadratic complexity with respect to the sequence length and inability to model long-range dependencies. The SSM variant Mamba has demonstrated performance comparable to Transformers without any form of attention, thanks to the use of a selective mechanism for the state parameters. Selectivity, however, has only been evaluated empirically, and the reasons for its effectiveness remain unclear. In this work, we show how selectivity is related to sequence processing. Our analysis shows that selective time intervals in Mamba act as linear approximators of information. We then propose our SeRpEnt architecture, an SSM that further exploits selectivity to compress sequences in an information-aware fashion. It employs a resampling mechanism that aggregates elements based on their information content. Our empirical results on the Long Range Arena benchmark and other language modeling tasks show the benefits of SeRpEnt's resampling mechanism.
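The "selective time intervals" mentioned in the abstract can be illustrated with a minimal sketch of a Mamba-style selective recurrence. It is not SeRpEnt's resampling mechanism and not the authors' code: it assumes a single scalar input channel, a diagonal state matrix, and the simplified discretization commonly described for Mamba (exponential decay for the state, step size times input matrix for the input). The function names and the weights `w_delta`, `b_delta`, `w_b`, and `w_c` are hypothetical.

```python
import math


def softplus(z):
    # Numerically stable softplus; keeps the step size (time interval) positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, a, w_delta, b_delta, w_b, w_c):
    """Run a single-channel selective SSM over the scalar sequence xs.

    a                : negative floats, diagonal of the continuous-time state matrix
    w_delta, b_delta : hypothetical scalar weights producing the step size from x_t
    w_b, w_c         : hypothetical per-state weights producing B_t and C_t from x_t
    """
    h = [0.0] * len(a)  # hidden state, one entry per diagonal element
    ys = []
    for x in xs:
        # Input-dependent time interval: this is the "selective" part.
        delta = softplus(w_delta * x + b_delta)
        y = 0.0
        for i, a_i in enumerate(a):
            a_bar = math.exp(delta * a_i)   # how much of the old state survives
            b_bar = delta * (w_b[i] * x)    # simplified discretized input matrix
            h[i] = a_bar * h[i] + b_bar * x
            y += (w_c[i] * x) * h[i]        # input-dependent readout C_t
        ys.append(y)
    return ys


ys = selective_scan([1.0, 0.5, -2.0], [-1.0, -0.5], 1.0, 0.0, [1.0, 1.0], [1.0, 1.0])
```

In this sketch a large step size makes `a_bar` small, so the old state is mostly forgotten and the current input dominates; a small step size leaves the state almost unchanged, so the element is effectively skipped.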
Related papers
- On the Expressiveness and Length Generalization of Selective State-Space Models on Regular Languages [56.22289522687125]
Selective state-space models (SSMs) are an emerging alternative to the Transformer.
We analyze their expressiveness and length generalization performance on regular language tasks.
We introduce the Selective Dense State-Space Model (SD-SSM), the first selective SSM that exhibits perfect length generalization.
arXiv Detail & Related papers (2024-12-26T20:53:04Z)
- SIGMA: Selective Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
arXiv Detail & Related papers (2024-07-19T11:12:08Z)
- Slot State Space Models [26.21351703553609]
We introduce SlotSSMs, a novel framework for incorporating independent mechanisms into SSMs to preserve or encourage separation of information.
In experiments, we evaluate our model in object-centric learning, 3D visual reasoning, and long-context video understanding tasks.
arXiv Detail & Related papers (2024-06-18T04:59:14Z)
- MambaLRP: Explaining Selective State Space Sequence Models [18.133138020777295]
Recent sequence modeling approaches using selective state space sequence models, referred to as Mamba models, have seen a surge of interest.
These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling.
To foster their reliable use in real-world scenarios, it is crucial to augment their transparency.
arXiv Detail & Related papers (2024-06-11T12:15:47Z)
- LongVQ: Long Sequence Modeling with Vector Quantization on Structured Memory [63.41820940103348]
The computational cost of the self-attention mechanism limits its practicality for long sequences.
We propose a new method called LongVQ to compress the global abstraction as a length-fixed codebook.
LongVQ effectively maintains dynamic global and local patterns, which helps to compensate for the lack of long-range dependency modeling.
arXiv Detail & Related papers (2024-04-17T08:26:34Z)
- The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
arXiv Detail & Related papers (2024-03-03T18:58:21Z)
- Modeling Latent Selection with Structural Causal Models [1.4747234049753455]
We introduce a conditioning operation on Structural Causal Models (SCMs) to model latent selection from a causal perspective.
We show that the conditioning operation transforms an SCM with an explicit latent selection mechanism into an SCM without such a selection mechanism.
arXiv Detail & Related papers (2024-01-12T23:14:34Z)
- Mamba: Linear-Time Sequence Modeling with Selective State Spaces [31.985243136674146]
Foundation models are almost universally based on the Transformer architecture and its core attention module.
We identify that a key weakness of subquadratic-time alternatives to attention, such as structured SSMs, is their inability to perform content-based reasoning.
We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba).
As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.
arXiv Detail & Related papers (2023-12-01T18:01:34Z)
- SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection [51.376723069962]
We present a method for accelerating and structuring self-attentions: Sparse Adaptive Connection.
In SAC, we regard the input sequence as a graph and attention operations are performed between linked nodes.
We show that SAC is competitive with state-of-the-art models while significantly reducing memory cost.
arXiv Detail & Related papers (2020-03-22T07:58:44Z)