Related papers: Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation

Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation

URL: http://arxiv.org/abs/2402.07245v2
Date: Fri, 29 Mar 2024 19:47:50 GMT
Title: Semi-Mamba-UNet: Pixel-Level Contrastive and Pixel-Level Cross-Supervised Visual Mamba-based UNet for Semi-Supervised Medical Image Segmentation
Authors: Chao Ma, Ziyang Wang,
Abstract summary: This paper introduces the Semi-Mamba-UNet, which integrates a visual mamba-based UNet architecture with a conventional UNet into a semi-supervised learning (SSL) framework. Our comprehensive evaluation on a publicly available MRI cardiac segmentation dataset highlights the superior performance of Semi-Mamba-UNet.
Score: 11.637738540262797
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Medical image segmentation is essential in diagnostics, treatment planning, and healthcare, with deep learning offering promising advancements. Notably, Convolutional Neural Network (CNN) excel in capturing local image features, whereas Vision Transformer (ViT) adeptly model long-range dependencies through multi-head self-attention mechanisms. Despite their strengths, both CNN and ViT face challenges in efficiently processing long-range dependencies within medical images, often requiring substantial computational resources. This issue, combined with the high cost and limited availability of expert annotations, poses significant obstacles to achieving precise segmentation. To address these challenges, this paper introduces the Semi-Mamba-UNet, which integrates a visual mamba-based UNet architecture with a conventional UNet into a semi-supervised learning (SSL) framework. This innovative SSL approach leverages dual networks to jointly generate pseudo labels and cross supervise each other, drawing inspiration from consistency regularization techniques. Furthermore, we introduce a self-supervised pixel-level contrastive learning strategy, employing a projector pair to further enhance feature learning capabilities. Our comprehensive evaluation on a publicly available MRI cardiac segmentation dataset, comparing against various SSL frameworks with different UNet-based segmentation networks, highlights the superior performance of Semi-Mamba-UNet. The source code has been made publicly accessible.

Related papers

MSLAU-Net: A Hybird CNN-Transformer Network for Medical Image Segmentation [7.826754189244901]
Both CNN-based and Transformer-based methods have achieved remarkable success in medical image segmentation tasks.<n>We propose a novel hybrid CNN-Transformer architecture, named MSLAU-Net, which integrates the strengths of both paradigms.<n>The proposed MSLAU-Net incorporates two key ideas. First, it introduces Multi-Scale Linear Attention, designed to efficiently extract multi-scale features from medical images.<n>Second, it adopts a top-down feature aggregation mechanism, which performs multi-level feature aggregation and restores spatial resolution.
arXiv Detail & Related papers (2025-05-24T18:48:29Z)
DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision. We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z)
Semi-supervised Semantic Segmentation for Remote Sensing Images via Multi-scale Uncertainty Consistency and Cross-Teacher-Student Attention [59.19580789952102]
This paper proposes a novel semi-supervised Multi-Scale Uncertainty and Cross-Teacher-Student Attention (MUCA) model for RS image semantic segmentation tasks. MUCA constrains the consistency among feature maps at different layers of the network by introducing a multi-scale uncertainty consistency regularization. MUCA utilizes a Cross-Teacher-Student attention mechanism to guide the student network, guiding the student network to construct more discriminative feature representations.
arXiv Detail & Related papers (2025-01-18T11:57:20Z)
MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation [3.64388407705261]
We propose a Multi-Scale Vision Mamba UNet model for medical image segmentation, termed MSVM-UNet. Specifically, by introducing multi-scale convolutions in the VSS blocks, we can more effectively capture and aggregate multi-scale feature representations from the hierarchical features of the VMamba encoder.
arXiv Detail & Related papers (2024-08-25T06:20:28Z)
Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution [7.97504951029884]
We propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Inspired by Mamba, our approach aims to learn the self-prior multi-scale contextual features under Mamba-UNet networks.
arXiv Detail & Related papers (2024-07-08T14:41:53Z)
Weak-Mamba-UNet: Visual Mamba Makes CNN and ViT Work Better for Scribble-based Medical Image Segmentation [13.748446415530937]
This paper introduces Weak-Mamba-UNet, an innovative weakly-supervised learning (WSL) framework for medical image segmentation. WSL strategy incorporates three distinct architecture but same symmetrical encoder-decoder networks: a CNN-based UNet for detailed local feature extraction, a Swin Transformer-based SwinUNet for comprehensive global context understanding, and a VMamba-based Mamba-UNet for efficient long-range dependency modeling. The effectiveness of Weak-Mamba-UNet is validated on a publicly available MRI cardiac segmentation dataset with processed annotations, where it surpasses the performance of a similar WSL
arXiv Detail & Related papers (2024-02-16T18:43:39Z)
Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain [48.440691680864745]
We introduce a new neural network architecture, termed LoGoNet, with a tailored self-supervised learning (SSL) method. LoGoNet integrates a novel feature extractor within a U-shaped architecture, leveraging Large Kernel Attention (LKA) and a dual encoding strategy. We propose a novel SSL method tailored for 3D images to compensate for the lack of large labeled datasets.
arXiv Detail & Related papers (2024-02-09T05:06:58Z)
A Dual-branch Self-supervised Representation Learning Framework for Tumour Segmentation in Whole Slide Images [12.961686610789416]
Self-supervised learning (SSL) has emerged as an alternative solution to reduce the annotation overheads in whole slide images. These SSL approaches are not designed for handling multi-resolution WSIs, which limits their performance in learning discriminative image features. We propose a Dual-branch SSL Framework for WSI tumour segmentation (DSF-WSI) that can effectively learn image features from multi-resolution WSIs.
arXiv Detail & Related papers (2023-03-20T10:57:28Z)
PCRLv2: A Unified Visual Information Preservation Framework for Self-supervised Pre-training in Medical Image Analysis [56.63327669853693]
We propose to incorporate the task of pixel restoration for explicitly encoding more pixel-level information into high-level semantics. We also address the preservation of scale information, a powerful tool in aiding image understanding. The proposed unified SSL framework surpasses its self-supervised counterparts on various tasks.
arXiv Detail & Related papers (2023-01-02T17:47:27Z)
When CNN Meet with ViT: Towards Semi-Supervised Learning for Multi-Class Medical Image Semantic Segmentation [13.911947592067678]
In this paper, an advanced consistency-aware pseudo-label-based self-ensembling approach is presented. Our framework consists of a feature-learning module which is enhanced by ViT and CNN mutually, and a guidance module which is robust for consistency-aware purposes. Experimental results show that the proposed method achieves state-of-the-art performance on a public benchmark data set.
arXiv Detail & Related papers (2022-08-12T18:21:22Z)
Deep Image Clustering with Contrastive Learning and Multi-scale Graph Convolutional Networks [58.868899595936476]
This paper presents a new deep clustering approach termed image clustering with contrastive learning and multi-scale graph convolutional networks (IcicleGCN) Experiments on multiple image datasets demonstrate the superior clustering performance of IcicleGCN over the state-of-the-art.
arXiv Detail & Related papers (2022-07-14T19:16:56Z)
Two-Stream Graph Convolutional Network for Intra-oral Scanner Image Segmentation [133.02190910009384]
We propose a two-stream graph convolutional network (i.e., TSGCN) to handle inter-view confusion between different raw attributes. Our TSGCN significantly outperforms state-of-the-art methods in 3D tooth (surface) segmentation.
arXiv Detail & Related papers (2022-04-19T10:41:09Z)
Learning Self-Supervised Low-Rank Network for Single-Stage Weakly and Semi-Supervised Semantic Segmentation [119.009033745244]
This paper presents a Self-supervised Low-Rank Network ( SLRNet) for single-stage weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS) SLRNet uses cross-view self-supervision, that is, it simultaneously predicts several attentive LR representations from different views of an image to learn precise pseudo-labels. Experiments on the Pascal VOC 2012, COCO, and L2ID datasets demonstrate that our SLRNet outperforms both state-of-the-art WSSS and SSSS methods with a variety of different settings.
arXiv Detail & Related papers (2022-03-19T09:19:55Z)
MetricUNet: Synergistic Image- and Voxel-Level Learning for Precise CT Prostate Segmentation via Online Sampling [66.01558025094333]
We propose a two-stage framework, with the first stage to quickly localize the prostate region and the second stage to precisely segment the prostate. We introduce a novel online metric learning module through voxel-wise sampling in the multi-task network. Our method can effectively learn more representative voxel-level features compared with the conventional learning methods with cross-entropy or Dice loss.
arXiv Detail & Related papers (2020-05-15T10:37:02Z)

This list is automatically generated from the titles and abstracts of the papers in this site.