Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining
- URL: http://arxiv.org/abs/2402.03302v2
- Date: Wed, 6 Mar 2024 13:29:29 GMT
- Title: Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining
- Authors: Jiarun Liu, Hao Yang, Hong-Yu Zhou, Yan Xi, Lequan Yu, Yizhou Yu, Yong
Liang, Guangming Shi, Shaoting Zhang, Hairong Zheng, Shanshan Wang
- Abstract summary: This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba outperforms CNNs, ViTs, and the latest Mamba-based models by a large margin.
- Score: 85.08169822181685
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate medical image segmentation demands the integration of multi-scale
information, spanning from local features to global dependencies. However, it
is challenging for existing methods to model long-range global information,
where convolutional neural networks (CNNs) are constrained by their local
receptive fields, and vision transformers (ViTs) suffer from high quadratic
complexity of their attention mechanism. Recently, Mamba-based models have
gained great attention for their impressive ability in long sequence modeling.
Several studies have demonstrated that these models can outperform popular
vision models in various tasks, offering higher accuracy, lower memory
consumption, and less computational burden. However, existing Mamba-based
models are mostly trained from scratch and do not explore the power of
pretraining, which has been proven to be quite effective for data-efficient
medical image analysis. This paper introduces a novel Mamba-based model,
Swin-UMamba, designed specifically for medical image segmentation tasks,
leveraging the advantages of ImageNet-based pretraining. Our experimental
results reveal the vital role of ImageNet-based pretraining in enhancing the
performance of Mamba-based models. Swin-UMamba outperforms CNNs, ViTs, and the
latest Mamba-based models by a large margin. Notably, on the AbdomenMRI, Endoscopy,
and Microscopy datasets, Swin-UMamba
outperforms its closest counterpart U-Mamba_Enc by an average score of 2.72%.
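To make the recipe in the abstract concrete, the sketch below shows the general pattern in PyTorch: a hierarchical encoder whose weights would be initialized from ImageNet pretraining, feeding a UNet-style decoder through multi-scale skip connections. Everything here is an illustrative assumption rather than the actual Swin-UMamba code; the `TokenMixer` is a simple gated depthwise-convolution stand-in for the Mamba/VSS blocks the paper uses, and the checkpoint path is hypothetical.

```python
# Minimal sketch: ImageNet-pretrained hierarchical encoder + UNet-style decoder.
# TokenMixer is a stand-in for the Mamba/VSS blocks of the actual model; only the
# overall wiring (multi-scale encoder, skip connections, pretrained init) is shown.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TokenMixer(nn.Module):
    """Placeholder for a Mamba/VSS block (the real model uses a selective state-space scan)."""
    def __init__(self, dim):
        super().__init__()
        self.dw = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)
        self.gate = nn.Conv2d(dim, dim, 1)
        self.proj = nn.Conv2d(dim, dim, 1)

    def forward(self, x):
        return x + self.proj(self.dw(x) * torch.sigmoid(self.gate(x)))

class Encoder(nn.Module):
    """Hierarchical encoder; its weights would come from ImageNet pretraining."""
    def __init__(self, in_ch=3, dims=(48, 96, 192, 384)):
        super().__init__()
        self.stages = nn.ModuleList()
        prev = in_ch
        for d in dims:
            self.stages.append(nn.Sequential(
                nn.Conv2d(prev, d, 2, stride=2),   # patch-merging style downsample
                TokenMixer(d),
            ))
            prev = d

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)  # keep multi-scale features for skip connections
        return feats

class DecoderBlock(nn.Module):
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch + skip_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear", align_corners=False)
        return self.conv(torch.cat([x, skip], dim=1))

class MambaStyleUNet(nn.Module):
    def __init__(self, num_classes=2, dims=(48, 96, 192, 384)):
        super().__init__()
        self.encoder = Encoder(dims=dims)
        self.decoders = nn.ModuleList([
            DecoderBlock(dims[3], dims[2], dims[2]),
            DecoderBlock(dims[2], dims[1], dims[1]),
            DecoderBlock(dims[1], dims[0], dims[0]),
        ])
        self.head = nn.Conv2d(dims[0], num_classes, 1)

    def forward(self, x):
        f1, f2, f3, f4 = self.encoder(x)
        d = f4
        for dec, skip in zip(self.decoders, (f3, f2, f1)):
            d = dec(d, skip)
        logits = self.head(d)
        return F.interpolate(logits, scale_factor=2, mode="bilinear", align_corners=False)

# The step the paper emphasizes: initialize the encoder from ImageNet-pretrained
# weights instead of training from scratch (checkpoint path is hypothetical).
model = MambaStyleUNet(num_classes=4)
# state = torch.load("imagenet_pretrained_encoder.pth", map_location="cpu")
# model.encoder.load_state_dict(state, strict=False)  # decoder stays randomly initialized
out = model(torch.randn(1, 3, 256, 256))
print(out.shape)  # torch.Size([1, 4, 256, 256])
```

The paper's central claim concerns the commented-out lines near the end: loading ImageNet-pretrained encoder weights, rather than training the whole network from scratch, is what drives the reported gains.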
Related papers
- A Comprehensive Survey of Mamba Architectures for Medical Image Analysis: Classification, Segmentation, Restoration and Beyond [2.838321145442743]
Mamba is an alternative to template-based deep learning approaches in medical image analysis.
It has linear time complexity, which is a significant improvement over transformers.
Mamba processes longer sequences without attention mechanisms, enabling faster inference and requiring less memory (a back-of-envelope cost comparison appears after this list).
arXiv Detail & Related papers (2024-10-03T10:23:03Z)
- ReMamba: Equip Mamba with Effective Long-Sequence Modeling [50.530839868893786]
We propose ReMamba, which enhances Mamba's ability to comprehend long contexts.
ReMamba incorporates selective compression and adaptation techniques within a two-stage re-forward process.
arXiv Detail & Related papers (2024-08-28T02:47:27Z)
- LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba [54.85262314960038]
Local Attentional Mamba blocks capture both global contexts and local details with linear complexity.
Our model exhibits exceptional scalability and surpasses the performance of DiT across various model scales on ImageNet at 256x256 resolution.
Compared to state-of-the-art diffusion models on ImageNet 256x256 and 512x512, our largest model presents notable advantages, such as a reduction of up to 62% in GFLOPs.
arXiv Detail & Related papers (2024-08-05T16:39:39Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Vision Mamba for Classification of Breast Ultrasound Images [9.90112908284836]
Mamba-based models, VMamba and Vim, are a recent family of vision encoders that offer promising performance improvements in many computer vision tasks.
This paper compares Mamba-based models with traditional Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) using the breast ultrasound BUSI dataset and Breast Ultrasound B dataset.
arXiv Detail & Related papers (2024-07-04T00:21:47Z)
- On Evaluating Adversarial Robustness of Volumetric Medical Segmentation Models [59.45628259925441]
Volumetric medical segmentation models have achieved significant success on organ and tumor-based segmentation tasks.
Their vulnerability to adversarial attacks remains largely unexplored.
This underscores the importance of investigating the robustness of existing models.
arXiv Detail & Related papers (2024-06-12T17:59:42Z)
- DEEM: Diffusion Models Serve as the Eyes of Large Language Models for Image Perception [66.88792390480343]
We propose DEEM, a simple but effective approach that utilizes the generative feedback of diffusion models to align the semantic distributions of the image encoder.
DEEM exhibits enhanced robustness and a superior capacity to alleviate model hallucinations while utilizing fewer trainable parameters, less pre-training data, and a smaller base model size.
arXiv Detail & Related papers (2024-05-24T05:46:04Z)
- CLIP-Mamba: CLIP Pretrained Mamba Models with OOD and Hessian Evaluation [18.383760896304604]
This report introduces the first attempt to train a Mamba model utilizing contrastive language-image pretraining (CLIP).
A Mamba model with 67 million parameters is on par with a 307 million-parameter Vision Transformer (ViT) model on zero-shot classification tasks.
arXiv Detail & Related papers (2024-04-30T09:40:07Z)
- LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation [9.862277278217045]
In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation.
A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling in locally spatial modeling compared to small kernel-based CNNs and Transformers.
Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields.
arXiv Detail & Related papers (2024-03-12T05:34:51Z)
- MedMamba: Vision Mamba for Medical Image Classification [0.0]
Vision transformers (ViTs) and convolutional neural networks (CNNs) have been extensively studied and widely used in medical image classification tasks.
Recent studies have shown that state space models (SSMs) represented by Mamba can effectively model long-range dependencies.
We propose MedMamba, the first Vision Mamba for generalized medical image classification.
arXiv Detail & Related papers (2024-03-06T16:49:33Z)
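A recurring argument in the abstract above and in several of the related papers (the Mamba survey, ReMamba, LaMamba-Diff) is that self-attention cost grows quadratically with sequence length while a state-space scan grows linearly. The snippet below is a rough, back-of-envelope sketch of that scaling; the operation counts and constants are purely illustrative assumptions, not measurements from any of the listed models.

```python
# Back-of-envelope comparison of how pairwise attention and a linear-time
# state-space scan scale with sequence length L (constants are illustrative only).
def attention_ops(L, d=256):
    # QK^T and the attention-weighted sum over V: two L x L x d matrix products
    return 2 * L * L * d

def ssm_scan_ops(L, d=256, state=16):
    # One recurrent update per token over a small hidden state
    return L * d * state

for L in (1_024, 4_096, 16_384, 65_536):
    ratio = attention_ops(L) / ssm_scan_ops(L)
    print(f"L={L:>6}: attention/scan ops ratio ≈ {ratio:,.0f}x")
# The ratio grows linearly with L (here L/8), which is why Mamba-style models are
# attractive for long sequences such as high-resolution image token grids.
```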
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.