Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
- URL: http://arxiv.org/abs/2406.16722v1
- Date: Mon, 24 Jun 2024 15:27:21 GMT
- Title: Venturing into Uncharted Waters: The Navigation Compass from Transformer to Mamba
- Authors: Yuchen Zou, Yineng Chen, Zuchao Li, Lefei Zhang, Hai Zhao
- Abstract summary: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond.
The recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential.
This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings.
- Score: 77.21394300708172
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Transformer, a deep neural network architecture, has long dominated the field of natural language processing and beyond. Nevertheless, the recent introduction of Mamba challenges its supremacy, sparks considerable interest among researchers, and gives rise to a series of Mamba-based models that have exhibited notable potential. This survey paper orchestrates a comprehensive discussion, diving into essential research dimensions, covering: (i) the functioning of the Mamba mechanism and its foundation on the principles of structured state space models; (ii) the proposed improvements and the integration of Mamba with various networks, exploring its potential as a substitute for Transformers; (iii) the combination of Transformers and Mamba to compensate for each other's shortcomings. We have also made efforts to interpret Mamba and Transformer in the framework of kernel functions, allowing for a comparison of their mathematical nature within a unified context. Our paper encompasses the vast majority of improvements related to Mamba to date.
Related papers
- From Markov to Laplace: How Mamba In-Context Learns Markov Chains [36.22373318908893]
We study in-context learning on Markov chains and uncover a surprising phenomenon.
Unlike transformers, even a single-layer Mamba efficiently learns the in-context Laplacian smoothing estimator.
These theoretical insights align strongly with empirical results and represent the first formal connection between Mamba and optimal statistical estimators.
arXiv Detail & Related papers (2025-02-14T14:13:55Z)
- MatIR: A Hybrid Mamba-Transformer Image Restoration Model [95.17418386046054]
We propose a Mamba-Transformer hybrid image restoration model called MatIR.
MatIR cross-cycles the blocks of the Transformer layer and the Mamba layer to extract features.
In the Mamba module, we introduce the Image Inpainting State Space (IRSS) module, which traverses along four scan paths.
arXiv Detail & Related papers (2025-01-30T14:55:40Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision.
In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for speech enhancement (SE) tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts.
ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z)
- A Survey of Mamba [27.939712558507516]
Recently, a novel architecture named Mamba has emerged as a promising alternative for building foundation models.
This study investigates the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel.
arXiv Detail & Related papers (2024-08-02T09:18:41Z)
- Mamba meets crack segmentation [0.18416014644193066]
Cracks pose safety risks to infrastructure and cannot be overlooked.
CNNs lack global modeling capability, which hinders the representation of entire crack features.
This study explores Mamba's capability to represent crack features.
arXiv Detail & Related papers (2024-07-22T15:21:35Z)
- An Empirical Study of Mamba-based Pedestrian Attribute Recognition [15.752464463535178]
This paper designs and adapts Mamba into two typical PAR frameworks: a text-image fusion approach and a pure vision Mamba multi-label recognition framework.
Interacting with attribute tags as additional input does not always lead to an improvement: Vim can be enhanced, but VMamba cannot.
These experimental results indicate that simply enhancing Mamba with a Transformer does not always lead to performance improvements but yields better results under certain settings.
arXiv Detail & Related papers (2024-07-15T00:48:06Z)
- Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity.
We show that Mamba shares surprising similarities with the linear attention Transformer.
We propose a Mamba-Inspired Linear Attention (MILA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z)
- Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z)
- Is Mamba Capable of In-Context Learning? [63.682741783013306]
State-of-the-art foundation models such as GPT-4 perform surprisingly well at in-context learning (ICL).
This work provides empirical evidence that Mamba, a newly proposed state space model, has similar ICL capabilities.
arXiv Detail & Related papers (2024-02-05T16:39:12Z)