The Hidden Attention of Mamba Models
- URL: http://arxiv.org/abs/2403.01590v2
- Date: Sun, 31 Mar 2024 14:31:14 GMT
- Title: The Hidden Attention of Mamba Models
- Authors: Ameen Ali, Itamar Zimerman, Lior Wolf,
- Abstract summary: The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers.
- Score: 54.50526986788175
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan, and deploys in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to that of the self-attention layers in transformers and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.
Related papers
- Scalable Autoregressive Image Generation with Mamba [23.027439743155192]
We introduce AiM, an autoregressive (AR) image generative model based on Mamba architecture.
Mamba is a novel state-space model characterized by its exceptional performance for long-sequence modeling with linear time.
We provide AiM models in various scales, with parameter counts ranging from 148M to 1.3B.
arXiv Detail & Related papers (2024-08-22T09:27:49Z) - Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z) - ML-Mamba: Efficient Multi-Modal Large Language Model Utilizing Mamba-2 [4.30176340351235]
We introduce ML-Mamba, a multimodal language model, which utilizes the latest and efficient Mamba-2 model for inference.
We replace the Transformer-based backbone with a pre-trained Mamba-2 model and explore methods for integrating 2D visual selective scanning mechanisms into multimodal learning.
arXiv Detail & Related papers (2024-07-29T09:38:15Z) - Longhorn: State Space Models are Amortized Online Learners [51.10124201221601]
State-space models (SSMs) offer linear decoding efficiency while maintaining parallelism during training.
In this work, we explore SSM design through the lens of online learning, conceptualizing SSMs as meta-modules for specific online learning problems.
We introduce a novel deep SSM architecture, Longhorn, whose update resembles the closed-form solution for solving the online associative recall problem.
arXiv Detail & Related papers (2024-07-19T11:12:08Z) - GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model [66.35608254724566]
State-space models (SSMs) have showcased effective performance in modeling long-range dependencies with subquadratic complexity.
However, pure SSM-based models still face challenges related to stability and achieving optimal performance on computer vision tasks.
Our paper addresses the challenges of scaling SSM-based models for computer vision, particularly the instability and inefficiency of large model sizes.
arXiv Detail & Related papers (2024-07-18T17:59:58Z) - MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z) - MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection [5.37935922811333]
MambaMixer is a new architecture with data-dependent weights that uses a dual selection mechanism across tokens and channels.
As a proof of concept, we design Vision MambaMixer (ViM2) and Time Series MambaMixer (TSM2) architectures based on the MambaMixer block.
arXiv Detail & Related papers (2024-03-29T00:05:13Z) - VL-Mamba: Exploring State Space Models for Multimodal Learning [22.701028299912398]
In this work, we propose VL-Mamba, a multimodal large language model based on state space models.
Specifically, we first replace the transformer-based backbone language model such as LLama or Vicuna with the pre-trained Mamba language model.
arXiv Detail & Related papers (2024-03-20T13:48:50Z) - STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition [50.064502884594376]
We study the problem of human action recognition using motion capture (MoCap) sequences.
We propose a novel Spatial-Temporal Mesh Transformer (STMT) to directly model the mesh sequences.
The proposed method achieves state-of-the-art performance compared to skeleton-based and point-cloud-based models.
arXiv Detail & Related papers (2023-03-31T16:19:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.