Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for
Scribble-based Medical Image Segmentation
- URL: http://arxiv.org/abs/2402.10887v1
- Date: Fri, 16 Feb 2024 18:43:39 GMT
- Title: Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for
Scribble-based Medical Image Segmentation
- Authors: Ziyang Wang, Chao Ma
- Abstract summary: This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation.
The WSL strategy incorporates three networks with distinct architectures but the same symmetrical encoder-decoder design: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling.
The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed scribble annotations, where it surpasses the performance of a similar WSL framework utilizing only UNet or SwinUNet.
- Score: 13.748446415530937
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation is increasingly reliant on deep learning
techniques, yet this promising performance often comes with high annotation
costs. This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised
learning (WSL) framework that leverages the capabilities of Convolutional
Neural Network (CNN), Vision Transformer (ViT), and the cutting-edge Visual
Mamba (VMamba) architecture for medical image segmentation, especially when
dealing with scribble-based annotations. The proposed WSL strategy incorporates
three networks with distinct architectures but the same symmetrical
encoder-decoder design: a CNN-based UNet for detailed local feature extraction,
a Swin Transformer-based
SwinUNet for comprehensive global context understanding, and a VMamba-based
Mamba-UNet for efficient long-range dependency modeling. The key concept of
this framework is a collaborative and cross-supervisory mechanism that employs
pseudo labels to facilitate iterative learning and refinement across the
networks. The effectiveness of Weak-Mamba-UNet is validated on a publicly
available MRI cardiac segmentation dataset with processed scribble annotations,
where it surpasses the performance of a similar WSL framework utilizing only
UNet or SwinUNet. This highlights its potential in scenarios with sparse or
imprecise annotations. The source code is made publicly accessible.
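To make the collaborative mechanism concrete, the following is a minimal PyTorch sketch of one plausible training step, assuming three generic networks standing in for UNet, SwinUNet, and Mamba-UNet: each network is trained with a partial cross-entropy loss on the sparse scribbles, while a dense pseudo label fused from all three predictions provides cross-supervision. The fusion rule (averaged softmax, then argmax), the loss weight alpha, and all names here are illustrative assumptions, not the authors' exact implementation.

import torch
import torch.nn.functional as F

IGNORE = 255  # pixels without a scribble carry this label and are skipped

def partial_cross_entropy(logits, scribble):
    # Supervise only the scribbled pixels; unlabeled pixels contribute no loss.
    return F.cross_entropy(logits, scribble, ignore_index=IGNORE)

def training_step(nets, optimizers, image, scribble, alpha=0.5):
    # image: (B, 1, H, W) MRI slice; scribble: (B, H, W) sparse integer labels.
    logits = [net(image) for net in nets]  # one prediction per network

    # Fuse the three predictions into one dense pseudo label; computed under
    # no_grad so it acts as a fixed target for every network.
    with torch.no_grad():
        mean_prob = torch.stack(
            [torch.softmax(l, dim=1) for l in logits]).mean(dim=0)
        pseudo = mean_prob.argmax(dim=1)  # (B, H, W)

    # Each network learns from the scribbles plus the collaboratively fused
    # pseudo label, so dense supervision flows across all three architectures.
    for net_logits, opt in zip(logits, optimizers):
        loss = (partial_cross_entropy(net_logits, scribble)
                + alpha * F.cross_entropy(net_logits, pseudo))
        opt.zero_grad()
        loss.backward()
        opt.step()

Iterating this step lets the dense pseudo labels compensate for the sparsity of the scribbles, which is one reading of the iterative learning and refinement described in the abstract.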
Related papers
- Towards Open-Vocabulary Semantic Segmentation Without Semantic Labels [53.8817160001038]
We propose a novel method, PixelCLIP, to adapt the CLIP image encoder for pixel-level understanding.
To address the challenges of leveraging masks without semantic labels, we devise an online clustering algorithm.
PixelCLIP shows significant performance improvements over CLIP and competitive results compared to caption-supervised methods.
arXiv Detail & Related papers (2024-09-30T01:13:03Z)
- Neural Architecture Search based Global-local Vision Mamba for Palm-Vein Recognition [42.4241558556591]
We propose a hybrid network structure named Global-local Vision Mamba (GLVM) to learn the local correlations in images explicitly and global dependencies among tokens for vein feature representation.
To learn complementary features, we propose a ConvMamba block consisting of three branches: a Multi-head Mamba branch (MHMamba), a Feature Iteration Unit branch (FIU), and a Convolutional Neural Network (CNN) branch.
Finally, a Global-local Alternate Neural Architecture Search (GLNAS) method is proposed to search for the optimal architecture of GLVM alternately with an evolutionary algorithm.
arXiv Detail & Related papers (2024-08-11T10:42:22Z)
- MambaVision: A Hybrid Mamba-Transformer Vision Backbone [54.965143338206644]
We propose a novel hybrid Mamba-Transformer backbone, denoted as MambaVision, which is specifically tailored for vision applications.
Our core contribution includes redesigning the Mamba formulation to enhance its capability for efficient modeling of visual features.
We conduct a comprehensive ablation study on the feasibility of integrating Vision Transformers (ViT) with Mamba.
arXiv Detail & Related papers (2024-07-10T23:02:45Z)
- CAMS: Convolution and Attention-Free Mamba-based Cardiac Image Segmentation [0.508267104652645]
Convolutional Neural Networks (CNNs) and Transformer-based self-attention models have become the standard for medical image segmentation.
We present a convolution- and self-attention-free Mamba-based semantic segmentation network named CAMS-Net.
Our model outperforms the existing state-of-the-art CNN, self-attention, and Mamba-based methods on the CMR and M&Ms-2 cardiac segmentation datasets.
arXiv Detail & Related papers (2024-06-09T13:53:05Z)
- Towards Semantic Equivalence of Tokenization in Multimodal LLM [149.11720372278273]
Vision tokenization is essential for semantic alignment between vision and language.
This paper proposes a novel dynamic Semantic-Equivalent Vision Tokenizer (SeTok).
SeTok groups visual features into semantic units via a dynamic clustering algorithm.
The resulting vision tokens effectively preserve semantic integrity and capture both low-frequency and high-frequency visual features.
arXiv Detail & Related papers (2024-06-07T17:55:43Z)
- MedMamba: Vision Mamba for Medical Image Classification [0.0]
Vision transformers (ViTs) and convolutional neural networks (CNNs) have been extensively studied and widely used in medical image classification tasks.
Recent studies have shown that state space models (SSMs) represented by Mamba can effectively model long-range dependencies.
We propose MedMamba, the first Vision Mamba for generalized medical image classification.
arXiv Detail & Related papers (2024-03-06T16:49:33Z)
- Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation [11.637738540262797]
This study introduces Semi-Mamba-UNet, which integrates a purely visual Mamba-based encoder-decoder architecture with a conventional CNN-based UNet into a semi-supervised learning framework.
This innovative SSL approach leverages both networks to generate pseudo-labels and cross-supervise one another at the pixel level simultaneously.
We introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to further enhance feature learning; a minimal sketch of this idea appears after this list.
arXiv Detail & Related papers (2024-02-11T17:09:21Z)
- Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation [21.1787366866505]
We propose Mamba-UNet, a novel architecture that combines the U-Net design for medical image segmentation with Mamba's long-range modeling capability.
Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network.
arXiv Detail & Related papers (2024-02-07T18:33:04Z)
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba outperforms CNNs, ViTs, and the latest Mamba-based models by a large margin.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
- Vivim: a Video Vision Mamba for Medical Video Segmentation [52.11785024350253]
This paper presents a Video Vision Mamba-based framework, dubbed Vivim, for medical video segmentation tasks.
Our Vivim can effectively compress the long-term representation into sequences at varying scales.
Experiments on thyroid segmentation, breast lesion segmentation in ultrasound videos, and polyp segmentation in colonoscopy videos demonstrate the effectiveness and efficiency of our Vivim.
arXiv Detail & Related papers (2024-01-25T13:27:03Z)
- MOCA: Self-supervised Representation Learning by Predicting Masked Online Codebook Assignments [72.6405488990753]
Self-supervised learning can be used to mitigate the data-hungry nature of Vision Transformer networks.
We propose a single-stage and standalone method, MOCA, which unifies both desired properties.
We achieve new state-of-the-art results in low-shot settings and strong experimental results under various evaluation protocols.
arXiv Detail & Related papers (2023-07-18T15:46:20Z)
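As referenced in the Semi-Mamba-UNet entry above, the following is a minimal sketch of a pixel-level contrastive strategy using a pair of projectors. The 1x1-convolution projector design, the random pixel sampling, and the InfoNCE-style loss are illustrative assumptions rather than that paper's verified implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelProjector(nn.Module):
    # Maps per-pixel features into a shared embedding space; 1x1 convolutions
    # preserve the spatial layout so every pixel receives an embedding.
    def __init__(self, in_ch, emb_ch=64):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Conv2d(in_ch, emb_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(emb_ch, emb_ch, 1))

    def forward(self, x):
        return F.normalize(self.proj(x), dim=1)  # unit-norm along channels

def pixel_contrastive_loss(z1, z2, tau=0.1, n_pixels=256):
    # z1, z2: (B, C, H, W) normalized embeddings of the same batch produced by
    # the two networks. The same location across networks is the positive
    # pair; all other sampled pixels serve as negatives.
    B, C, H, W = z1.shape
    f1 = z1.flatten(2).transpose(1, 2).reshape(-1, C)  # (B*H*W, C)
    f2 = z2.flatten(2).transpose(1, 2).reshape(-1, C)
    idx = torch.randperm(f1.size(0), device=z1.device)[:n_pixels]
    a, b = f1[idx], f2[idx]
    logits = a @ b.t() / tau                             # pairwise similarities
    targets = torch.arange(a.size(0), device=z1.device)  # diagonal = positives
    return F.cross_entropy(logits, targets)

Pulling the two networks' embeddings of the same pixel together while pushing different pixels apart encourages consistent features even where no label exists.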