Vision Mamba: A Comprehensive Survey and Taxonomy
- URL: http://arxiv.org/abs/2405.04404v1
- Date: Tue, 7 May 2024 15:30:14 GMT
- Title: Vision Mamba: A Comprehensive Survey and Taxonomy
- Authors: Xiao Liu, Chenxu Zhang, Lei Zhang
- Abstract summary: The State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems.
Based on the latest state-space models, Mamba merges time-varying parameters into SSMs and formulates a hardware-aware algorithm for efficient training and inference.
Mamba is expected to become a new AI architecture that may outperform the Transformer.
- Score: 11.025533218561284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The State Space Model (SSM) is a mathematical model used to describe and analyze the behavior of dynamic systems. It has seen numerous applications in several fields, including control theory, signal processing, economics, and machine learning. In deep learning, state space models are used to process sequence data, such as time series analysis, natural language processing (NLP), and video understanding. By mapping sequence data to a state space, long-term dependencies in the data can be better captured. In particular, modern SSMs have shown strong representational capabilities in NLP, especially in long sequence modeling, while maintaining linear time complexity. Notably, building on the latest state-space models, Mamba merges time-varying parameters into the SSM and formulates a hardware-aware algorithm for efficient training and inference. Given its impressive efficiency and strong long-range dependency modeling capability, Mamba is expected to become a new AI architecture that may outperform the Transformer. Recently, a number of works have studied the potential of Mamba in various fields, such as general vision, multi-modal learning, medical image analysis, and remote sensing image analysis, by extending Mamba from the natural language domain to the visual domain. To provide a full understanding of Mamba in the visual domain, we conduct a comprehensive survey and present a taxonomy study. This survey focuses on Mamba's application to a variety of visual tasks and data types, and discusses its predecessors, recent advances, and far-reaching impact on a wide range of domains. Since Mamba is on a rapid upward trend, please notify us of any new findings; new progress on Mamba will be incorporated into this survey in a timely manner and updated on the Mamba project page at https://github.com/lx6c78/Vision-Mamba-A-Comprehensive-Survey-and-Taxonomy.
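The core computation the abstract refers to, an SSM whose parameters vary with the input and which is evaluated by a linear-time recurrence, can be made concrete with a small sketch. The NumPy code below is a minimal, illustrative selective-SSM scan in the spirit of Mamba; the parameter shapes, the per-channel diagonal state matrix, the softplus step size, and the discretization choices are simplifying assumptions made for this sketch and do not reproduce the authors' hardware-aware reference implementation.

```python
import numpy as np

def selective_ssm_scan(x, A, W_B, W_C, W_dt):
    """Minimal selective SSM (Mamba-style) scan, illustrative only.

    x        : (L, D)  input sequence of length L with D channels
    A        : (D, N)  diagonal state matrix (one N-dim state per channel)
    W_B, W_C : (D, N)  weights producing input-dependent B_t and C_t
    W_dt     : (D,)    weights producing the input-dependent step size
    Returns y : (L, D) output sequence, computed in O(L) time.
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                      # hidden state per channel
    y = np.zeros((L, D))
    for t in range(L):
        u = x[t]                              # current input, (D,)
        dt = np.log1p(np.exp(u * W_dt))       # softplus -> positive step size, (D,)
        B_t = u[:, None] * W_B                # input-dependent input matrix, (D, N)
        C_t = u[:, None] * W_C                # input-dependent output matrix, (D, N)
        A_bar = np.exp(dt[:, None] * A)       # zero-order-hold discretization of A
        B_bar = dt[:, None] * B_t             # simple Euler-style discretization of B
        h = A_bar * h + B_bar * u[:, None]    # linear recurrence: h_t = A_bar h_{t-1} + B_bar x_t
        y[t] = np.sum(C_t * h, axis=-1)       # readout: y_t = C_t h_t
    return y

# Example with hypothetical sizes: 16 steps, 4 channels, 8-dim state per channel
rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.standard_normal((L, D))
A = -np.exp(rng.standard_normal((D, N)))      # negative entries keep the recurrence stable
y = selective_ssm_scan(x, A,
                       rng.standard_normal((D, N)) * 0.1,
                       rng.standard_normal((D, N)) * 0.1,
                       rng.standard_normal(D) * 0.1)
print(y.shape)  # (16, 4)
```

Because the recurrence is a chain of associative updates, it can also be evaluated with a parallel scan rather than the Python loop shown here; exploiting that structure on GPU hardware is what the abstract's "hardware-aware algorithm" refers to, keeping training and inference linear in sequence length.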
Related papers
- Mamba in Vision: A Comprehensive Survey of Techniques and Applications [3.4580301733198446]
Mamba is emerging as a novel approach to overcome the challenges faced by Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) in computer vision.
Mamba addresses these limitations by leveraging Selective Structured State Space Models to effectively capture long-range dependencies with linear computational complexity.
arXiv Detail & Related papers (2024-10-04T02:58:49Z)
- A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond [2.838321145442743]
Mamba is an alternative to template-based deep learning approaches in medical image analysis.
It has linear time complexity, which is a significant improvement over transformers.
Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory.
arXiv Detail & Related papers (2024-10-03T10:23:03Z)
- State-space models are accurate and efficient neural operators for dynamical systems [23.59679792068364]
Physics-informed machine learning (PIML) has emerged as a promising alternative to classical methods for predicting dynamical systems.
Existing models, including recurrent neural networks (RNNs), transformers, and neural operators, face challenges such as long-time integration, long-range dependencies, chaotic dynamics, and extrapolation.
This paper introduces state-space models implemented in Mamba for accurate and efficient dynamical system operator learning.
arXiv Detail & Related papers (2024-09-05T03:57:28Z)
- Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction.
We introduce a new framework named Selective Gated Mamba (SIGMA) for Sequential Recommendation.
Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
- MambaVT: Spatio-Temporal Contextual Modeling for robust RGB-T Tracking [51.28485682954006]
We propose a pure Mamba-based framework (MambaVT) to fully exploit intrinsic spatio-temporal contextual modeling for robust visible-thermal tracking.
Specifically, we devise the long-range cross-frame integration component to globally adapt to target appearance variations.
Experiments show the significant potential of vision Mamba for RGB-T tracking, with MambaVT achieving state-of-the-art performance on four mainstream benchmarks.
arXiv Detail & Related papers (2024-08-15T02:29:00Z)
- A Survey of Mamba [27.939712558507516]
Recently, a novel architecture named Mamba has emerged as a promising alternative for building foundation models.
This study investigates the advancements of Mamba-based models, the techniques of adapting Mamba to diverse data, and the applications where Mamba can excel.
arXiv Detail & Related papers (2024-08-02T09:18:41Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Visual Mamba: A Survey and New Outlooks [33.90213491829634]
Mamba, a recent selective structured state space model, excels in long sequence modeling.
Since January 2024, Mamba has been actively applied to diverse computer vision tasks.
This paper reviews visual Mamba approaches, analyzing over 200 papers.
arXiv Detail & Related papers (2024-04-29T16:51:30Z)
- The Hidden Attention of Mamba Models [54.50526986788175]
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains.
We show that such models can be viewed as attention-driven models.
This new perspective enables the underlying mechanisms to be compared, empirically and theoretically, to those of the self-attention layers in Transformers; a worked unrolling of this view is sketched after this list.
arXiv Detail & Related papers (2024-03-03T18:58:21Z)
- PointMamba: A Simple State Space Model for Point Cloud Analysis [65.59944745840866]
We propose PointMamba, transferring the success of Mamba, a recent representative state space model (SSM), from NLP to point cloud analysis tasks.
Unlike traditional Transformers, PointMamba employs a linear complexity algorithm, presenting global modeling capacity while significantly reducing computational costs.
arXiv Detail & Related papers (2024-02-16T14:56:13Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance by a large margin compared to CNNs, ViTs, and the latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
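As a worked illustration of the attention-based reading highlighted in The Hidden Attention of Mamba Models above, the selective SSM recurrence can be unrolled so that its implicit attention matrix becomes explicit. The notation below is a generic selective-SSM formulation chosen for this sketch and is not claimed to match that paper's exact derivation.

```latex
% Selective SSM recurrence and its unrolled, attention-like form (h_0 = 0)
\begin{aligned}
h_t &= \bar{A}_t\, h_{t-1} + \bar{B}_t\, x_t, \qquad y_t = C_t\, h_t, \\
y_t &= \sum_{s=1}^{t} C_t \Big( \prod_{k=s+1}^{t} \bar{A}_k \Big) \bar{B}_s\, x_s
     \;=\; \sum_{s=1}^{t} \alpha_{t,s}\, x_s,
\qquad
\alpha_{t,s} \;=\; C_t \Big( \prod_{k=s+1}^{t} \bar{A}_k \Big) \bar{B}_s .
\end{aligned}
```

The lower-triangular array with entries $\alpha_{t,s}$ plays the role of a causal attention map, even though it is never materialized during the linear-time scan; comparing it with softmax self-attention is what makes the empirical and theoretical analysis mentioned above possible.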