Related papers: Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba

Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba

URL: http://arxiv.org/abs/2506.18184v1
Date: Sun, 22 Jun 2025 21:52:45 GMT
Title: Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba
Authors: Donghyun Lee, Yuhang Li, Ruokai Yin, Shiting Xiao, Priyadarshini Panda,
Abstract summary: State Space Models (SSMs) have emerged as powerful alternatives to attention-based Transformers.<n>Memba is a membrane-driven.<n>PEFT approach specifically designed for Mamba.
Score: 21.474315621757594
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: State Space Models (SSMs) have emerged as powerful alternatives to attention-based Transformers, with Mamba demonstrating impressive efficiency and scalability. As these models grow increasingly larger, the need for Parameter-Efficient Fine-Tuning (PEFT) methods becomes critical to adapt pre-trained Mamba to downstream tasks without prohibitive computational costs. However, previous approaches simply apply traditional Transformer-tailored PEFT methods without addressing the unique temporal processing dynamics of SSMs. To address this limitation, we propose Memba, a membrane-driven PEFT approach specifically designed for Mamba. Memba introduces Leaky Integrate Membrane (LIM) neurons as bio-inspired gating mechanisms that naturally accumulate membrane potentials over time, enhancing selective information retention. By strategically combining LIM neurons with Low-Rank Adaptations (LoRA) and cross-layer membrane transfer, our approach significantly improves Mamba's temporal modeling capabilities. Extensive experiments across language and vision tasks demonstrate that Memba achieves substantial improvements over existing PEFT methods. The code is available at https://github.com/Intelligent-Computing-Lab-Yale/Memba.

Related papers

Differential Mamba [16.613266337054267]
Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations.<n>Recent work has shown that differential design can mitigate this issue in Transformers, improving their effectiveness across various applications.<n>We show that a naive adaptation of differential design to Mamba is insufficient and requires careful architectural modifications.
arXiv Detail & Related papers (2025-07-08T17:30:14Z)
Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection [88.47928738482719]
Linear State Space Models (SSMs) offer remarkable performance gains in sequence modeling.<n>Recent advances, such as Mamba, further enhance SSMs with input-dependent gating and hardware-aware implementations.<n>We introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
arXiv Detail & Related papers (2025-06-22T19:26:55Z)
Reparameterized LLM Training via Orthogonal Equivalence Transformation [54.80172809738605]
We present POET, a novel training algorithm that uses Orthogonal Equivalence Transformation to optimize neurons.<n>POET can stably optimize the objective function with improved generalization.<n>We develop efficient approximations that make POET flexible and scalable for training large-scale neural networks.
arXiv Detail & Related papers (2025-06-09T17:59:34Z)
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs. We propose the MobileMamba framework, which balances efficiency and performance. MobileMamba achieves up to 83.6% on Top-1, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
MambaPEFT: Exploring Parameter-Efficient Fine-Tuning for Mamba [0.5530212768657544]
Mamba, a State Space Model (SSM)-based model, has attracted attention as a potential alternative to Transformers.<n>We investigate the effectiveness of existing PEFT methods for Transformers when applied to Mamba.<n>We propose new Mamba-specific PEFT methods that leverage the distinctive structure of Mamba.
arXiv Detail & Related papers (2024-11-06T11:57:55Z)
FMamba: Mamba based on Fast-attention for Multivariate Time-series Forecasting [6.152779144421304]
We introduce a novel framework named FMamba for multivariate time-series forecasting (MTSF) Technically, we first extract the temporal features of the input variables through an embedding layer, then compute the dependencies among input variables via the fast-attention module. We use Mamba to selectively deal with the input features and further extract the temporal dependencies of the variables through the multi-layer perceptron block (MLP-block) Finally, FMamba obtains the predictive results through the projector, a linear layer.
arXiv Detail & Related papers (2024-07-20T09:14:05Z)
SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning [63.93193829913252]
We propose an innovative METL strategy called SHERL for resource-limited scenarios. In the early route, intermediate outputs are consolidated via an anti-redundancy operation. In the late route, utilizing minimal late pre-trained layers could alleviate the peak demand on memory overhead.
arXiv Detail & Related papers (2024-07-10T10:22:35Z)
DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.<n>Experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths significantly longer than the ones seen during training.
arXiv Detail & Related papers (2024-06-20T17:40:18Z)
Mamba State-Space Models Are Lyapunov-Stable Learners [1.6385815610837167]
Mamba state-space models (SSMs) were recently shown to outperform Transformer large language models (LLMs) across various tasks. We show that Mamba's recurrent dynamics are robust to small input changes. We also show that instruction tuning allows Mamba models to narrow this gap to 81% and Mamba-2 models to skyrocket over this gap to 132%.
arXiv Detail & Related papers (2024-05-31T21:46:23Z)
Mamba: Linear-Time Sequence Modeling with Selective State Spaces [31.985243136674146]
Foundation models are almost universally based on the Transformer architecture and its core attention module. We identify that a key weakness of such models is their inability to perform content-based reasoning. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even blocks (Mamba) As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics.
arXiv Detail & Related papers (2023-12-01T18:01:34Z)
MoEfication: Conditional Computation of Transformer Models for Efficient Inference [66.56994436947441]
Transformer-based pre-trained language models can achieve superior performance on most NLP tasks due to large parameter capacity, but also lead to huge computation cost. We explore to accelerate large-model inference by conditional computation based on the sparse activation phenomenon. We propose to transform a large model into its mixture-of-experts (MoE) version with equal model size, namely MoEfication.
arXiv Detail & Related papers (2021-10-05T02:14:38Z)

This list is automatically generated from the titles and abstracts of the papers in this site.