Block-Biased Mamba for Long-Range Sequence Processing
- URL: http://arxiv.org/abs/2505.09022v1
- Date: Tue, 13 May 2025 23:34:09 GMT
- Title: Block-Biased Mamba for Long-Range Sequence Processing
- Authors: Annan Yu, N. Benjamin Erichson
- Abstract summary: Mamba extends earlier state space models (SSMs) by introducing input-dependent dynamics. Despite being built on architectures designed for long-range dependencies, Mamba performs poorly on long-range sequential tasks. We propose a simple extension of Mamba's S6 unit that combines block-wise selective dynamics with a channel-specific bias.
- Score: 8.988769052522807
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Mamba extends earlier state space models (SSMs) by introducing input-dependent dynamics, and has demonstrated strong empirical performance across a range of domains, including language modeling, computer vision, and foundation models. However, a surprising weakness remains: despite being built on architectures designed for long-range dependencies, Mamba performs poorly on long-range sequential tasks. Understanding and addressing this gap is important for improving Mamba's universality and versatility. In this work, we analyze Mamba's limitations through three perspectives: expressiveness, inductive bias, and training stability. Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose $\text{B}_2\text{S}_6$, a simple extension of Mamba's S6 unit that combines block-wise selective dynamics with a channel-specific bias. We prove that these changes equip the model with a better-suited inductive bias and improve its expressiveness and stability. Empirically, $\text{B}_2\text{S}_6$ outperforms S4 and S4D on Long-Range Arena (LRA) tasks while maintaining Mamba's performance on language modeling benchmarks.
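As a rough illustration of what "block-wise selective dynamics with a channel-specific bias" could look like in code, here is a minimal PyTorch sketch of a block-biased selective SSM unit. The module name, parameterization, and shapes are assumptions made for illustration only; this is not the authors' $\text{B}_2\text{S}_6$ implementation, and the actual unit may partition states or channels differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class BlockBiasedSSM(nn.Module):
    """Illustrative block-biased selective SSM unit (hypothetical names and shapes)."""

    def __init__(self, d_model: int, d_state: int = 16, n_blocks: int = 4):
        super().__init__()
        assert d_state % n_blocks == 0, "state dimension must split evenly into blocks"
        self.d_state, self.n_blocks = d_state, n_blocks
        # Selective (input-dependent) parameters; the step size is shared within each block.
        self.delta_proj = nn.Linear(d_model, n_blocks)            # per-block step size
        self.B_proj = nn.Linear(d_model, d_state)                 # input-to-state coupling
        self.C_proj = nn.Linear(d_model, d_state)                 # state-to-output readout
        self.A_log = nn.Parameter(torch.zeros(d_model, d_state))  # diagonal state matrix (log scale)
        self.bias = nn.Parameter(torch.zeros(d_model))            # channel-specific output bias

    def forward(self, x):
        # x: (batch, length, d_model)
        b, L, d = x.shape
        states_per_block = self.d_state // self.n_blocks
        A = -torch.exp(self.A_log)                                 # keep the diagonal dynamics stable
        delta = F.softplus(self.delta_proj(x))                     # (b, L, n_blocks), positive step sizes
        delta = delta.repeat_interleave(states_per_block, dim=-1)  # broadcast each block's step size
        Bt, Ct = self.B_proj(x), self.C_proj(x)                    # (b, L, d_state) each
        h = torch.zeros(b, d, self.d_state, device=x.device, dtype=x.dtype)
        outputs = []
        for t in range(L):  # sequential scan for clarity; Mamba uses a parallel hardware-aware scan
            dA = torch.exp(delta[:, t].unsqueeze(1) * A)           # (b, d_model, d_state)
            dBx = delta[:, t].unsqueeze(1) * Bt[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
            h = dA * h + dBx                                       # selective state update
            outputs.append((h * Ct[:, t].unsqueeze(1)).sum(-1))    # readout per channel
        return torch.stack(outputs, dim=1) + self.bias             # add channel-specific bias


# Usage sketch: process a batch of sequences.
layer = BlockBiasedSSM(d_model=64, d_state=16, n_blocks=4)
y = layer(torch.randn(2, 128, 64))  # -> (2, 128, 64)
```

Here the per-block step size stands in for the abstract's "block-wise selective dynamics" and the learned output bias for the "channel-specific bias"; both are guesses at the described mechanism rather than a reproduction of the paper's unit.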
Related papers
- Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling. Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations. We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z) - Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity [5.116777508056307]
State-Space Models (SSMs) have recently emerged as a promising alternative to Transformers. Mamba introduces input selectivity to its SSM layer (S6) and incorporates convolution and gating into its block definition. We demystify the role of input selectivity in Mamba, investigating its impact on function approximation power, long-term memorization, and associative recall capabilities.
arXiv Detail & Related papers (2025-06-13T15:38:41Z) - Dynamic Vision Mamba [41.84910346271891]
Mamba-based vision models have gained extensive attention as a result of being computationally more efficient than attention-based models. For token redundancy, we analytically find that early token pruning methods will result in inconsistency between training and inference. For block redundancy, we allow each image to select SSM blocks dynamically based on an empirical observation that the inference speed of Mamba-based vision models is largely affected by the number of SSM blocks.
arXiv Detail & Related papers (2025-04-07T07:31:28Z) - Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z) - Mamba-CL: Optimizing Selective State Space Model in Null Space for Continual Learning [54.19222454702032]
Continual Learning aims to equip AI models with the ability to learn a sequence of tasks over time, without forgetting previously learned knowledge.
State Space Models (SSMs) have achieved notable success in computer vision.
We introduce Mamba-CL, a framework that continuously fine-tunes the core SSMs of the large-scale Mamba foundation model.
arXiv Detail & Related papers (2024-11-23T06:36:16Z) - StableMamba: Distillation-free Scaling of Large SSMs for Images and Videos [27.604572990625144]
State-space models (SSMs) have introduced a novel context modeling method by integrating state-space techniques into deep learning. Mamba-based architectures are difficult to scale with respect to the number of parameters, which is a major limitation for vision applications. We propose a Mamba-Attention interleaved architecture that enhances scalability, robustness, and performance.
arXiv Detail & Related papers (2024-09-18T10:48:10Z) - ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z) - DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba. Experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths significantly longer than the ones seen during training.
arXiv Detail & Related papers (2024-06-20T17:40:18Z) - MambaOut: Do We Really Need Mamba for Vision? [70.60495392198686]
Mamba, an architecture with an RNN-like state space model (SSM) token mixer, was recently introduced to address the quadratic complexity of the attention mechanism.
This paper conceptually concludes that Mamba is ideally suited for tasks with long-sequence and autoregressive characteristics.
We construct a series of models named MambaOut through stacking Mamba blocks while removing their core token mixer, SSM.
arXiv Detail & Related papers (2024-05-13T17:59:56Z) - SiMBA: Simplified Mamba-Based Architecture for Vision and Multivariate Time series [2.4379295576598436]
We propose SiMBA, a new architecture that introduces Einstein FFT (EinFFT) for channel modeling by specific eigenvalue computations and uses the Mamba block for sequence modeling.
We show that SiMBA outperforms existing SSMs, bridging the performance gap with state-of-the-art transformers.
arXiv Detail & Related papers (2024-03-22T17:22:56Z) - BlackMamba: Mixture of Experts for State-Space Models [10.209192169793772]
State-space models (SSMs) have recently demonstrated competitive performance to transformers at large-scale language modeling benchmarks.
MoE models have shown remarkable performance while significantly reducing the compute and latency costs of inference.
We present BlackMamba, a novel architecture that combines the Mamba SSM with MoE to obtain the benefits of both.
arXiv Detail & Related papers (2024-02-01T07:15:58Z)