ReMamba: Equip Mamba with Effective Long-Sequence Modeling
- URL: http://arxiv.org/abs/2408.15496v4
- Date: Wed, 01 Jan 2025 15:22:34 GMT
- Title: ReMamba: Equip Mamba with Effective Long-Sequence Modeling
- Authors: Danlong Yuan, Jiahao Liu, Bei Li, Huishuai Zhang, Jingang Wang, Xunliang Cai, Dongyan Zhao
- Abstract summary: We propose ReMamba, which enhances Mamba's ability to comprehend long contexts.
ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
- Score: 50.530839868893786
- License:
- Abstract: While the Mamba architecture demonstrates superior inference efficiency and competitive performance on short-context natural language processing (NLP) tasks, empirical evidence suggests its capacity to comprehend long contexts is limited compared to transformer-based models. In this study, we investigate the long-context efficiency issues of the Mamba models and propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process, incurring minimal additional inference overhead. Experimental results on the LongBench and L-Eval benchmarks demonstrate ReMamba's efficacy, improving over the baselines by 3.2 and 1.6 points, respectively, and attaining performance almost on par with same-size transformer models.
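The abstract describes the method only at a high level (selective compression plus a two-stage re-forward). The sketch below illustrates what such a two-pass control flow could look like; the module and parameter names (SelectiveCompressor, score_proj, keep_ratio, backbone) are illustrative assumptions, not the authors' actual implementation.

```python
# Hypothetical sketch of a two-stage re-forward with selective compression.
# All names here are assumptions for illustration, not ReMamba's real code.
import torch
import torch.nn as nn


class SelectiveCompressor(nn.Module):
    def __init__(self, d_model: int, keep_ratio: float = 0.1):
        super().__init__()
        self.score_proj = nn.Linear(d_model, 1)  # scores each position's importance
        self.keep_ratio = keep_ratio

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, d_model) from the first forward pass
        scores = self.score_proj(hidden).squeeze(-1)                  # (batch, seq_len)
        k = max(1, int(hidden.size(1) * self.keep_ratio))
        top_idx = scores.topk(k, dim=1).indices.sort(dim=1).values    # keep original order
        return torch.gather(
            hidden, 1, top_idx.unsqueeze(-1).expand(-1, -1, hidden.size(-1))
        )                                                              # (batch, k, d_model)


def two_stage_reforward(backbone: nn.Module,
                        compressor: SelectiveCompressor,
                        embeddings: torch.Tensor) -> torch.Tensor:
    # Stage 1: ordinary forward pass over the full long context.
    hidden = backbone(embeddings)
    # Select and compress the most salient hidden states.
    compressed = compressor(hidden)
    # Stage 2: re-forward, conditioning on the compressed representation
    # (here simply prepended to the original inputs for illustration).
    return backbone(torch.cat([compressed, embeddings], dim=1))
```

In an actual Mamba stack the compressed states would presumably be consumed by the selective state-space update rather than naively concatenated; the sketch only shows the two-pass structure implied by the abstract.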
Related papers
- From Markov to Laplace: How Mamba In-Context Learns Markov Chains [36.22373318908893]
We study in-context learning on Markov chains and uncover a surprising phenomenon.
Unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator.
These theoretical insights align strongly with empirical results and represent the first formal connection between Mamba and optimal statistical estimators.
arXiv Detail & Related papers (2025-02-14T14:13:55Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for speech enhancement (SE) tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- State Space Models are Strong Text Rerankers [33.41687512973575]
State space models (SSMs) like Mamba offer promising advantages.
Despite their potential, SSMs' effectiveness at text reranking remains underexplored.
Mamba architectures achieve competitive text ranking performance, comparable to transformer-based models of similar size.
arXiv Detail & Related papers (2024-12-18T21:42:15Z)
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
- Bi-Mamba: Towards Accurate 1-Bit State Space Models [28.478762133816726]
Bi-Mamba is a scalable and powerful 1-bit Mamba architecture designed for more efficient large language models.
Bi-Mamba achieves performance comparable to its full-precision counterparts (e.g., FP16 or BF16) and much better accuracy than post-training-binarization (PTB) Mamba baselines.
arXiv Detail & Related papers (2024-11-18T18:59:15Z)
- Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba [77.21394300708172]
Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond.
The recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential.
This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of
arXiv Detail & Related papers (2024-06-24T15:27:21Z)
- DeciMamba: Exploring the Length Extrapolation Potential of Mamba [89.07242846058023]
We introduce DeciMamba, a context-extension method specifically designed for Mamba.
Experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths significantly longer than the ones seen during training.
arXiv Detail & Related papers (2024-06-20T17:40:18Z)
- Mamba State-Space Models Are Lyapunov-Stable Learners [1.6385815610837167]
Mamba state-space models (SSMs) were recently shown to outperform Transformer large language models (LLMs) across various tasks.
We show that Mamba's recurrent dynamics are robust to small input changes.
We also show that instruction tuning allows Mamba models to narrow this gap to 81% and Mamba-2 models to surpass it, reaching 132%.
arXiv Detail & Related papers (2024-05-31T21:46:23Z)
- Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)