Related papers: ReMamba: Equip Mamba with Effective Long-Sequence Modeling

Related papers

Trained Mamba Emulates Online Gradient Descent in In-Context Linear Regression [90.93281146423378]
Mamba is an efficient Transformer alternative with linear complexity for long-sequence modeling.<n>Recent empirical works demonstrate Mamba's in-context learning (ICL) competitive with Transformers.<n>This paper studies the training dynamics of Mamba on the linear regression ICL task.
arXiv Detail & Related papers (2025-09-28T09:48:49Z)
Differential Mamba [16.613266337054267]
Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations.<n>Recent work has shown that differential design can mitigate this issue in Transformers, improving their effectiveness across various applications.<n>We show that a naive adaptation of differential design to Mamba is insufficient and requires careful architectural modifications.
arXiv Detail & Related papers (2025-07-08T17:30:14Z)
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling.<n>Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations.<n>We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z)
LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models [1.249658136570244]
State space models (SSMs) have emerged as an efficient alternative to transformers for long-context sequence modeling. SSMs lack the interpretability tools that have been crucial for understanding and improving attention-based architectures. We introduce LaTIM, a novel token-level decomposition method for both Mamba-1 and Mamba-2 that enables fine-grained interpretability.
arXiv Detail & Related papers (2025-02-21T17:33:59Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
State Space Models are Strong Text Rerankers [33.41687512973575]
State space models (SSMs) like Mamba offer promising advantages. Despite their potential, SSMs' effectiveness at text reranking remains underexplored. Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size.
arXiv Detail & Related papers (2024-12-18T21:42:15Z)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves up to 83.6% on Top-1, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
Bi-Mamba: Towards Accurate 1-Bit State Space Models [28.478762133816726]
Bi-Mamba is a scalable and powerful 1-bit Mamba architecture designed for more efficient large language models. Bi-Mamba achieves performance comparable to its full-precision counterparts (e.g., FP16 or BF16) and much better accuracy than post-training-binarization (PTB) Mamba baselines.
arXiv Detail & Related papers (2024-11-18T18:59:15Z)
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba [0.5530212768657544]
Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers. We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba. We propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba.
arXiv Detail & Related papers (2024-11-06T11:57:55Z)
MAP: Unleashing Hybrid Mamba-Transformer Vision Backbone's Potential with Masked Autoregressive Pretraining [23.37555991996508]
We propose Masked Autoregressive Pretraining (MAP) to pretrain a hybrid Mamba-Transformer vision backbone network. We show that both the pure Mamba architecture and the hybrid Mamba-Transformer vision backbone network pretrained with MAP significantly outperform other pretraining strategies.
arXiv Detail & Related papers (2024-10-01T17:05:08Z)
Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications. We show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies. For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba [77.21394300708172]
Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. The recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of
arXiv Detail & Related papers (2024-06-24T15:27:21Z)
DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba. We show that DeciMamba can extrapolate context lengths 25x longer than the ones seen during training, and does so without utilizing additional computational resources.
arXiv Detail & Related papers (2024-06-20T17:40:18Z)
MambaLRP: Explaining Selective State Space Sequence Models [18.133138020777295]
Recent sequence modeling approaches using selective state space sequence models, referred to as Mamba models, have seen a surge of interest. These models allow efficient processing of long sequences in linear time and are rapidly being adopted in a wide range of applications such as language modeling. To foster their reliable use in real-world scenarios, it is crucial to augment their transparency.
arXiv Detail & Related papers (2024-06-11T12:15:47Z)
Mamba State-Space Models Are Lyapunov-Stable Learners [1.6385815610837167]
Mamba state-space models (SSMs) were recently shown to outperform Transformer large language models (LLMs) across various tasks. We show that Mamba's recurrent dynamics are robust to small input changes. We also show that instruction tuning allows Mamba models to narrow this gap to 81% and Mamba-2 models to skyrocket over this gap to 132%.
arXiv Detail & Related papers (2024-05-31T21:46:23Z)
Is Mamba Capable of In-Context Learning? [63.682741783013306]
State of the art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL) This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)

This list is automatically generated from the titles and abstracts of the papers in this site.