Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation
- URL: http://arxiv.org/abs/2402.07245v3
- Date: Sun, 28 Jul 2024 17:11:28 GMT
- Title: Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation
- Authors: Chao Ma, Ziyang Wang
- Abstract summary: This study introduces Semi-Mamba-UNet, which integrates a purely visual Mamba-based encoder-decoder architecture with a conventional CNN-based UNet into a semi-supervised learning framework.
This innovative SSL approach leverages both networks to generate pseudo-labels and cross-supervise one another at the pixel level simultaneously.
We introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to enhance the feature learning capabilities further.
- Score: 11.637738540262797
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, with deep learning offering promising advancements. Notably, the convolutional neural network (CNN) excels in capturing local image features, whereas the Vision Transformer (ViT) adeptly models long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both the CNN and ViT face challenges in efficiently processing long-range dependencies in medical images, often requiring substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses significant obstacles to achieving precise segmentation. To address these challenges, this study introduces Semi-Mamba-UNet, which integrates a purely visual Mamba-based U-shaped encoder-decoder architecture with a conventional CNN-based UNet into a semi-supervised learning (SSL) framework. This innovative SSL approach leverages both networks to generate pseudo-labels and cross-supervise one another at the pixel level simultaneously, drawing inspiration from consistency regularisation techniques. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy that employs a pair of projectors to enhance the feature learning capabilities further, especially on unlabelled data. Semi-Mamba-UNet was comprehensively evaluated on two publicly available segmentation datasets and compared with seven other SSL frameworks with either a CNN- or ViT-based UNet as the backbone network, highlighting the superior performance of the proposed method. The source code of Semi-Mamba-UNet, all baseline SSL frameworks, the CNN- and ViT-based networks, and the two corresponding datasets are made publicly accessible.
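The two training signals named in the abstract, pixel-level cross-supervision with pseudo-labels and pixel-level contrastive learning on projected features, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the choice of cross-entropy for the cross-supervision term, the InfoNCE-style form of the contrastive term, the temperature value, and all function names are assumptions made for illustration only.

```python
import numpy as np


def softmax(logits, axis=1):
    """Numerically stable softmax over the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def cross_entropy(logits, labels):
    """Mean pixel-wise cross-entropy; logits (N, C, H, W), labels (N, H, W)."""
    probs = softmax(logits, axis=1)
    # Select the probability assigned to the target class at every pixel.
    picked = np.take_along_axis(probs, labels[:, None, :, :], axis=1)
    return float(-np.log(picked + 1e-12).mean())


def cross_supervision_loss(logits_a, logits_b):
    """Each network is supervised by the other's hard pseudo-labels.

    Assumption: hard argmax pseudo-labels with cross-entropy. In an autodiff
    framework the pseudo-labels would be treated as constants (no gradient).
    """
    pseudo_a = logits_a.argmax(axis=1)  # pseudo-labels from network A
    pseudo_b = logits_b.argmax(axis=1)  # pseudo-labels from network B
    return cross_entropy(logits_a, pseudo_b) + cross_entropy(logits_b, pseudo_a)


def pixel_contrastive_loss(feat_a, feat_b, temperature=0.1):
    """InfoNCE-style loss over pixels (an assumed form, not the paper's).

    feat_a, feat_b: (P, D) projector outputs; row i of each describes the
    same pixel, which is the positive pair. All other rows are negatives.
    """
    za = feat_a / np.linalg.norm(feat_a, axis=1, keepdims=True)
    zb = feat_b / np.linalg.norm(feat_b, axis=1, keepdims=True)
    sim = za @ zb.T / temperature  # (P, P) scaled cosine similarities
    log_prob = sim - sim.max(axis=1, keepdims=True)
    log_prob = log_prob - np.log(np.exp(log_prob).sum(axis=1, keepdims=True))
    return float(-np.diag(log_prob).mean())


# Hypothetical unlabelled batch: 2 images, 4 classes, 8x8 pixels.
rng = np.random.default_rng(0)
logits_mamba = rng.normal(size=(2, 4, 8, 8))
logits_cnn = rng.normal(size=(2, 4, 8, 8))
proj_mamba = rng.normal(size=(16, 32))  # 16 sampled pixels, 32-dim features
proj_cnn = rng.normal(size=(16, 32))

unsupervised_loss = (
    cross_supervision_loss(logits_mamba, logits_cnn)
    + pixel_contrastive_loss(proj_mamba, proj_cnn)
)
```

In a full semi-supervised setup, a term like `unsupervised_loss` would typically be added to an ordinary supervised loss on the labelled images; how the terms are weighted is not stated in the abstract.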
Related papers
- MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation [3.64388407705261]
We propose a Multi-Scale Vision Mamba UNet model for medical image segmentation, termed MSVM-UNet.
Specifically, by introducing multi-scale convolutions in the VSS blocks, we can more effectively capture and aggregate multi-scale feature representations from the hierarchical features of the VMamba encoder.
arXiv Detail & Related papers (2024-08-25T06:20:28Z)
- Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution [7.97504951029884]
We propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution.
Inspired by Mamba, our approach aims to learn the self-prior multi-scale contextual features under Mamba-UNet networks.
arXiv Detail & Related papers (2024-07-08T14:41:53Z)
- Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation [13.748446415530937]
This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation.
The WSL strategy incorporates three networks with distinct architectures but the same symmetrical encoder-decoder design: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling.
The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed annotations, where it surpasses the performance of a similar WSL
arXiv Detail & Related papers (2024-02-16T18:43:39Z)
- Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain [48.440691680864745]
We introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method.
LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy.
We propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets.
arXiv Detail & Related papers (2024-02-09T05:06:58Z)
- A Dual-branch Self-supervised Representation Learning Framework for Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overheads in whole slide images.
These SSL approaches are not designed for handling multi-resolution WSIs, which limits their performance in learning discriminative image features.
We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
- PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics.
We also address the preservation of scale information, a powerful tool in aiding image understanding.
The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
- When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation [13.911947592067678]
In this paper, an advanced consistency-aware pseudo-label-based self-ensembling approach is presented.
Our framework consists of a feature-learning module which is enhanced by ViT and CNN mutually, and a guidance module which is robust for consistency-aware purposes.
Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set.
arXiv Detail & Related papers (2022-08-12T18:21:22Z)
- Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN).
Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
- Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes.
Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
- Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network (SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS).
SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels.
Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
- MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate.
We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network.
Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)