RNN as Linear Transformer: A Closer Investigation into Representational Potentials of Visual Mamba Models
- URL: http://arxiv.org/abs/2511.18380v1
- Date: Sun, 23 Nov 2025 09:57:27 GMT
- Title: RNN as Linear Transformer: A Closer Investigation into Representational Potentials of Visual Mamba Models
- Authors: Timing Yang, Guoyizhe Wei, Alan Yuille, Feng Wang
- Abstract summary: Mamba has recently garnered attention as an effective backbone for vision tasks. We make three primary contributions to investigate Mamba's representational properties. Our model achieves a 78.5 percent linear probing accuracy on ImageNet, underscoring its strong performance.
- Score: 8.049668552887505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mamba has recently garnered attention as an effective backbone for vision tasks. However, its underlying mechanism in visual domains remains poorly understood. In this work, we systematically investigate Mamba's representational properties and make three primary contributions. First, we theoretically analyze Mamba's relationship to Softmax and Linear Attention, confirming that it can be viewed as a low-rank approximation of Softmax Attention and thereby bridging the representational gap between Softmax and Linear forms. Second, we introduce a novel binary segmentation metric for activation map evaluation, extending qualitative assessments to a quantitative measure that demonstrates Mamba's capacity to model long-range dependencies. Third, by leveraging DINO for self-supervised pretraining, we obtain clearer activation maps than those produced by standard supervised approaches, highlighting Mamba's potential for interpretability. Notably, our model also achieves a 78.5 percent linear probing accuracy on ImageNet, underscoring its strong performance. We hope this work can provide valuable insights for future investigations of Mamba-based vision architectures.
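The "RNN as Linear Transformer" framing in the title rests on a well-known identity: causal linear attention can be computed either as a masked matrix product or as a recurrence over a running state. The sketch below illustrates that identity only. It is not the paper's Mamba formulation or its low-rank analysis, and the function names and the `elu(x) + 1` feature map are assumptions borrowed from the general linear attention literature.

```python
import numpy as np


def feature_map(x):
    # Positive feature map elu(x) + 1 (an assumption; any positive map works here).
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def causal_linear_attention_parallel(q, k, v):
    # "Transformer" view: build the full (T, T) score matrix, mask the future, normalize rows.
    phi_q, phi_k = feature_map(q), feature_map(k)
    scores = np.tril(phi_q @ phi_k.T)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)


def causal_linear_attention_recurrent(q, k, v):
    # "RNN" view: carry a (d_k, d_v) state S and a (d_k,) normalizer z across time steps.
    seq_len, d_v = v.shape
    d_k = k.shape[1]
    state = np.zeros((d_k, d_v))
    normalizer = np.zeros(d_k)
    out = np.empty((seq_len, d_v))
    for t in range(seq_len):
        phi_k_t = feature_map(k[t])
        phi_q_t = feature_map(q[t])
        state += np.outer(phi_k_t, v[t])   # accumulate key-value outer products
        normalizer += phi_k_t              # accumulate keys for the denominator
        out[t] = (phi_q_t @ state) / (phi_q_t @ normalizer)
    return out


# Hypothetical toy inputs: 6 tokens, key dimension 4, value dimension 3.
rng = np.random.default_rng(0)
q = rng.standard_normal((6, 4))
k = rng.standard_normal((6, 4))
v = rng.standard_normal((6, 3))
gap = np.abs(
    causal_linear_attention_parallel(q, k, v)
    - causal_linear_attention_recurrent(q, k, v)
).max()
print("max abs difference between the two views:", gap)
```

The recurrent form keeps a fixed-size state regardless of sequence length, which is the property that makes the comparison with state space models such as Mamba natural.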
Related papers
- Can Mamba Learn In Context with Outliers? A Theoretical Generalization Analysis [88.05636819649804]
The Mamba model has gained significant attention for its computational advantages over Transformer-based models. This paper presents the first theoretical analysis of the training dynamics of a one-layer Mamba model. We show that although Mamba may require more training to converge, it maintains accurate predictions even when the proportion of outliers exceeds the threshold that a linear Transformer can tolerate.
arXiv Detail & Related papers (2025-10-01T01:25:01Z)
- Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
- ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts. ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z)
- Neural Architecture Search based Global-local Vision Mamba for Palm-Vein Recognition [42.4241558556591]
We propose a hybrid network structure named Global-local Vision Mamba (GLVM) to learn the local correlations in images explicitly and global dependencies among tokens for vein feature representation.
Thirdly, to learn the complementary features, we propose a ConvMamba block consisting of three branches, named Multi-head Mamba branch (MHMamba), Feature Iteration Unit branch (FIU), and Convolutional Neural Network (CNN) branch.
Finally, a Global-local Alternate Neural Architecture Search (GLNAS) method is proposed to search the optimal architecture of GLVM alternately with the evolutionary algorithm.
arXiv Detail & Related papers (2024-08-11T10:42:22Z)
- A Survey of Mamba [27.939712558507516]
Recently, a novel architecture named Mamba has emerged as a promising alternative for building foundation models. This study investigates the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel.
arXiv Detail & Related papers (2024-08-02T09:18:41Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, MambaVision, specifically tailored for vision applications. We show that equipping the Mamba architecture with self-attention blocks in the final layers greatly improves its capacity to capture long-range spatial dependencies. For classification on the ImageNet-1K dataset, MambaVision variants achieve state-of-the-art (SOTA) performance in terms of both Top-1 accuracy and throughput.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Vision Mamba for Classification of Breast Ultrasound Images [9.90112908284836]
Mamba-based models, VMamba and Vim, are a recent family of vision encoders that offer promising performance improvements in many computer vision tasks.
This paper compares Mamba-based models with traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) using the breast ultrasound BUSI dataset and Breast Ultrasound B dataset.
arXiv Detail & Related papers (2024-07-04T00:21:47Z)
- Demystify Mamba in Vision: A Linear Attention Perspective [72.93213667713493]
Mamba is an effective state space model with linear computation complexity. We show that Mamba shares surprising similarities with linear attention Transformer. We propose a Mamba-Inspired Linear Attention (MILA) model by incorporating the merits of these two key designs into linear attention.
arXiv Detail & Related papers (2024-05-26T15:31:09Z)
- Vision Mamba: A Comprehensive Survey and Taxonomy [11.025533218561284]
State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems.
Based on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference.
Mamba is expected to become a new AI architecture that may outperform Transformer.
arXiv Detail & Related papers (2024-05-07T15:30:14Z)
- Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba outperforms CNNs, ViTs, and the latest Mamba-based models by a large margin.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
This list is automatically generated from the titles and abstracts of the papers in this site.