Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for
Scribble-based Medical Image Segmentation
- URL: http://arxiv.org/abs/2402.10887v1
- Date: Fri, 16 Feb 2024 18:43:39 GMT
- Title: Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for
Scribble-based Medical Image Segmentation
- Authors: Ziyang Wang, Chao Ma
- Abstract summary: This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation.
The WSL strategy incorporates three networks with distinct architectures but the same symmetrical encoder-decoder structure: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling.
The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed scribble annotations, where it surpasses the performance of a similar WSL framework using only UNet or SwinUNet.
- Score: 13.748446415530937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation is increasingly reliant on deep learning
techniques, yet the promising performance often comes with high annotation
costs. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised
learning (WSL) framework that leverages the capabilities of Convolutional
Neural Network (CNN), Vision Transformer (ViT), and the cutting-edge Visual
Mamba (VMamba) architecture for medical image segmentation, especially when
dealing with scribble-based annotations. The proposed WSL strategy incorporates
three networks with distinct architectures but the same symmetrical encoder-decoder structure: a
CNN-based UNet for detailed local feature extraction, a Swin Transformer-based
SwinUNet for comprehensive global context understanding, and a VMamba-based
Mamba-UNet for efficient long-range dependency modeling. The key concept of
this framework is a collaborative and cross-supervisory mechanism that employs
pseudo labels to facilitate iterative learning and refinement across the
networks. The effectiveness of Weak-Mamba-UNet is validated on a publicly
available MRI cardiac segmentation dataset with processed scribble annotations,
where it surpasses the performance of a similar WSL framework utilizing only
UNet or SwinUNet. This highlights its potential in scenarios with sparse or
imprecise annotations. The source code is made publicly accessible.
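To make the cross-supervisory mechanism concrete, the sketch below shows one way the pseudo-label exchange could be wired up. The tiny placeholder network stands in for the three real backbones (UNet, SwinUNet, Mamba-UNet), and the loss weighting is an assumption rather than the paper's exact recipe.

```python
# Minimal sketch of scribble-based cross-supervision among three networks.
# TinyNet is a placeholder for the real UNet / SwinUNet / Mamba-UNet backbones;
# the 0.5 pseudo-label weight is an illustrative assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyNet(nn.Module):
    """Stand-in segmentation network (placeholder for UNet/SwinUNet/Mamba-UNet)."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),
        )
    def forward(self, x):
        return self.body(x)  # logits [B, C, H, W]

def partial_ce(logits, scribble, ignore_index=255):
    """Cross-entropy evaluated only on scribble-annotated pixels."""
    return F.cross_entropy(logits, scribble, ignore_index=ignore_index)

nets = [TinyNet(), TinyNet(), TinyNet()]
opt = torch.optim.Adam([p for n in nets for p in n.parameters()], lr=1e-3)

image = torch.randn(2, 1, 64, 64)
scribble = torch.full((2, 64, 64), 255, dtype=torch.long)  # 255 = unlabeled
scribble[:, 30:34, 30:34] = 1                              # a few scribbled pixels

logits = [n(image) for n in nets]
pseudo = [l.argmax(dim=1).detach() for l in logits]        # hard pseudo labels

loss = sum(partial_ce(l, scribble) for l in logits)        # supervised on scribbles
for i, l in enumerate(logits):                             # each net also learns from
    for j, p in enumerate(pseudo):                         # the others' pseudo labels
        if i != j:
            loss = loss + 0.5 * F.cross_entropy(l, p)
opt.zero_grad(); loss.backward(); opt.step()
```

Because the pseudo labels are recomputed every iteration, each network alternately acts as teacher and student for the other two, which is the iterative refinement the abstract describes.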
Related papers
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- Convolution and Attention-Free Mamba-based Cardiac Image Segmentation [0.508267104652645]
Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become standard for medical image segmentation.
We present a Convolution and self-Attention Free Mamba-based semantic Network named CAF-MambaSegNet.
Our goal is not to outperform state-of-the-art results but to show how this innovative, convolution and self-attention-free method can inspire further research.
arXiv Detail & Related papers (2024-06-09T13:53:05Z)
- Towards Semantic Equivalence of Tokenization in Multimodal LLM [149.11720372278273]
Vision tokenization is essential for semantic alignment between vision and language.
This paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok).
SeTok groups visual features into semantic units via a dynamic clustering algorithm.
The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.
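As a rough illustration of the grouping idea, the snippet below pools patch features into a fixed number of cluster tokens with plain k-means. SeTok itself determines the number of semantic units dynamically, so treat this as a simplified stand-in.

```python
# Simplified sketch of clustering-based visual tokenization in the spirit of
# SeTok: group patch features into clusters and mean-pool each cluster into
# one "semantic" token. Plain k-means stands in for SeTok's dynamic clustering.
import torch

def cluster_tokens(feats, k=8, iters=10):
    """feats: [N, D] patch features -> [k, D] pooled cluster tokens."""
    centers = feats[torch.randperm(feats.size(0))[:k]].clone()
    for _ in range(iters):
        dists = torch.cdist(feats, centers)        # [N, k] distances
        assign = dists.argmin(dim=1)               # nearest-center assignment
        for c in range(k):
            mask = assign == c
            if mask.any():                         # keep old center if cluster empty
                centers[c] = feats[mask].mean(dim=0)
    return centers, assign                         # tokens + per-patch cluster id

feats = torch.randn(196, 256)                      # e.g. a 14x14 grid of features
tokens, assign = cluster_tokens(feats, k=8)
print(tokens.shape, assign.shape)                  # [8, 256], [196]
```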
arXiv Detail & Related papers (2024-06-07T17:55:43Z)
- Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Lesion [0.0]
We introduce Mamba HUNet, a novel architecture tailored for robust and efficient segmentation tasks.
We first converted HUNet into a lighter version, maintaining performance parity, and then integrated this lighter HUNet into Mamba HUNet, further enhancing its efficiency.
Experimental results on publicly available Magnetic Resonance Imaging scans, notably in Multiple Sclerosis lesion segmentation, demonstrate Mamba HUNet's effectiveness across diverse segmentation tasks.
arXiv Detail & Related papers (2024-03-26T06:57:50Z)
- LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation [9.862277278217045]
In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation.
A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling in locally spatial modeling compared to small kernel-based CNNs and Transformers.
Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields.
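The abstract suggests sequence modeling over large spatial neighborhoods. A hedged reading is sketched below: the feature map is partitioned into large non-overlapping windows and each window is scanned with a sequence model (a GRU stands in here for the actual Mamba block).

```python
# Rough sketch of "large kernel" sequence modeling over a feature map: split
# the map into large windows and run a sequence model within each window.
# nn.GRU is a stand-in for the actual Mamba (SSM) block.
import torch
import torch.nn as nn

def large_window_scan(x, win=16, seq_model=None):
    """x: [B, C, H, W]; H and W assumed divisible by `win`."""
    B, C, H, W = x.shape
    x = x.view(B, C, H // win, win, W // win, win)             # window partition
    x = x.permute(0, 2, 4, 3, 5, 1).reshape(-1, win * win, C)  # [B*nw, L, C]
    y, _ = seq_model(x)                                        # scan inside window
    y = y.view(B, H // win, W // win, win, win, C)
    return y.permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)     # un-partition

seq = nn.GRU(input_size=32, hidden_size=32, batch_first=True)
out = large_window_scan(torch.randn(2, 32, 64, 64), win=16, seq_model=seq)
print(out.shape)  # torch.Size([2, 32, 64, 64])
```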
arXiv Detail & Related papers (2024-03-12T05:34:51Z)
- Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation [11.637738540262797]
This paper introduces the Semi-Mamba-UNet, which integrates a visual mamba-based UNet architecture with a conventional UNet into a semi-supervised learning (SSL) framework.
Our comprehensive evaluation on a publicly available MRI cardiac segmentation dataset highlights the superior performance of Semi-Mamba-UNet.
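One plausible form of the pixel-level contrastive term is sketched below: pixel embeddings from the two networks are compared with a cosine-similarity objective in which matching locations are positives and all other locations are negatives. This is an assumption about the loss shape, not the paper's exact formulation.

```python
# A plausible pixel-level contrastive term between two networks' pixel
# embeddings: same pixel = positive pair, other pixels = negatives.
import torch
import torch.nn.functional as F

def pixel_contrastive(emb_a, emb_b, temperature=0.1):
    """emb_a, emb_b: [B, D, H, W] pixel embeddings from the two networks."""
    B, D, H, W = emb_a.shape
    a = F.normalize(emb_a.flatten(2).transpose(1, 2), dim=-1)  # [B, HW, D]
    b = F.normalize(emb_b.flatten(2).transpose(1, 2), dim=-1)  # [B, HW, D]
    logits = a @ b.transpose(1, 2) / temperature               # [B, HW, HW]
    target = torch.arange(H * W).expand(B, -1)                 # positive: same pixel
    return F.cross_entropy(logits.reshape(B * H * W, -1),
                           target.reshape(-1))

loss = pixel_contrastive(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16))
print(loss.item())
```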
arXiv Detail & Related papers (2024-02-11T17:09:21Z)
- Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation [21.1787366866505]
We propose Mamba-UNet, a novel architecture that synergizes the U-Net in medical image segmentation with Mamba's capability.
Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network.
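The long-range modeling that Mamba brings rests on a linear state-space recurrence; the toy scan below illustrates the idea on a flattened patch sequence. This is a simplified diagonal SSM, not VMamba's full selective scan.

```python
# Minimal diagonal state-space recurrence of the kind Mamba builds on:
# h_t = A*h_{t-1} + B*x_t,  y_t = C*h_t, scanned over a flattened sequence.
import torch

def ssm_scan(x, A, B, C):
    """x: [L, D] sequence; A, B, C: [D] diagonal parameters."""
    h = torch.zeros_like(x[0])
    ys = []
    for t in range(x.size(0)):
        h = A * h + B * x[t]        # state update carries long-range context
        ys.append(C * h)
    return torch.stack(ys)

L, D = 196, 64                       # e.g. a 14x14 grid of patches, flattened
x = torch.randn(L, D)
A = torch.sigmoid(torch.randn(D))    # |A| < 1 keeps the recurrence stable
y = ssm_scan(x, A, B=torch.randn(D), C=torch.randn(D))
print(y.shape)                       # torch.Size([196, 64])
```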
arXiv Detail & Related papers (2024-02-07T18:33:04Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance with a large margin compared to CNNs, ViTs, and latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can mitigate Vision Transformer networks' need for very large fully-annotated datasets.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results on low-shot settings and strong experimental results in various evaluation protocols.
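Reading the title literally, a schematic of the mechanism might look like the following: a teacher assigns patch features to their nearest codebook entry online, and the student predicts those assignments at masked positions. This is a hedged sketch of the idea, not MOCA's actual training pipeline.

```python
# Hedged sketch of "predicting masked online codebook assignments": teacher
# features are assigned to their nearest codebook entry, and the student is
# trained to predict those assignments at masked positions.
import torch
import torch.nn.functional as F

codebook = torch.randn(512, 64)                  # K entries of dim D
teacher_feats = torch.randn(196, 64)             # per-patch teacher features
student_logits = torch.randn(196, 512, requires_grad=True)  # student predictions

assign = torch.cdist(teacher_feats, codebook).argmin(dim=1)  # online targets
mask = torch.rand(196) < 0.6                     # predict only masked patches
loss = F.cross_entropy(student_logits[mask], assign[mask])
loss.backward()
print(loss.item())
```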
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
- TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
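A minimal sketch of the hybrid idea follows: a convolutional stem downsamples the image, a Transformer encodes the flattened feature tokens, and a decoder upsamples with a skip connection back to the input. Sizes and depths here are illustrative, not TransUNet's.

```python
# Toy hybrid segmenter: conv stem -> Transformer over flattened tokens ->
# upsampling decoder with an input skip connection. Illustrative sizes only.
import torch
import torch.nn as nn

class TinyTransUNet(nn.Module):
    def __init__(self, n_classes=4, dim=64):
        super().__init__()
        self.stem = nn.Sequential(nn.Conv2d(1, dim, 3, stride=2, padding=1), nn.ReLU())
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.up = nn.ConvTranspose2d(dim, dim, 2, stride=2)
        self.head = nn.Conv2d(dim + 1, n_classes, 1)      # skip: concat the input

    def forward(self, x):
        f = self.stem(x)                                   # [B, dim, H/2, W/2]
        B, C, H, W = f.shape
        t = self.encoder(f.flatten(2).transpose(1, 2))     # tokens: [B, HW, dim]
        f = t.transpose(1, 2).reshape(B, C, H, W)
        f = self.up(f)                                     # back to full resolution
        return self.head(torch.cat([f, x], dim=1))         # fuse skip, predict

logits = TinyTransUNet()(torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 4, 64, 64])
```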
arXiv Detail & Related papers (2021-02-08T16:10:50Z)
- Towards Efficient Scene Understanding via Squeeze Reasoning [71.1139549949694]
We propose a novel framework called Squeeze Reasoning.
Instead of propagating information on the spatial map, we first learn to squeeze the input feature into a channel-wise global vector.
We show that our approach can be modularized as an end-to-end trained block and can be easily plugged into existing networks.
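The squeeze operation is concrete enough to sketch: pool the spatial map into a channel-wise global vector, run a small MLP over it, and use the result to re-weight the feature map. This is a minimal reading of the abstract, not the exact block.

```python
# Sketch of the squeeze idea: spatial map -> channel-wise global vector ->
# small MLP ("reasoning" in channel space) -> re-weight the feature map.
import torch
import torch.nn as nn

class SqueezeReason(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.reason = nn.Sequential(
            nn.Linear(channels, channels), nn.ReLU(),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )
    def forward(self, x):                           # x: [B, C, H, W]
        v = x.mean(dim=(2, 3))                      # squeeze: global vector [B, C]
        w = self.reason(v)                          # channel-wise interactions
        return x * w[:, :, None, None]              # broadcast back onto the map

y = SqueezeReason()(torch.randn(2, 64, 32, 32))
print(y.shape)  # torch.Size([2, 64, 32, 32])
```

Because the reasoning happens on a single vector rather than the full spatial map, the block is cheap and, as the abstract notes, can be plugged into existing networks as an end-to-end trained module.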
arXiv Detail & Related papers (2020-11-06T12:17:01Z)