Prompt-Guided Dual-Path UNet with Mamba for Medical Image Segmentation
- URL: http://arxiv.org/abs/2503.19589v1
- Date: Tue, 25 Mar 2025 12:12:07 GMT
- Title: Prompt-Guided Dual-Path UNet with Mamba for Medical Image Segmentation
- Authors: Shaolei Zhang, Jinyan Liu, Tianyi Qian, Xuesong Li
- Abstract summary: We propose a prompt-guided CNN-Mamba dual-path UNet, termed PGM-UNet, for medical image segmentation. We introduce a prompt-guided residual Mamba module that adaptively extracts dynamic visual prompts from the original input data. We also design a local-global information fusion network, comprising a local information extraction module, a prompt-guided residual Mamba module, and a multi-focus attention fusion module.
- Score: 18.060052357308763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks (CNNs) and transformers are widely employed in constructing UNet architectures for medical image segmentation tasks. However, CNNs struggle to model long-range dependencies, while transformers suffer from quadratic computational complexity. Recently, Mamba, a type of State Space Model, has gained attention for its exceptional ability to model long-range interactions while maintaining linear computational complexity. Despite the emergence of several Mamba-based methods, they still present the following limitations: first, their network designs generally lack perceptual capabilities for the original input data; second, they primarily focus on capturing global information, while often neglecting local details. To address these challenges, we propose a prompt-guided CNN-Mamba dual-path UNet, termed PGM-UNet, for medical image segmentation. Specifically, we introduce a prompt-guided residual Mamba module that adaptively extracts dynamic visual prompts from the original input data, effectively guiding Mamba in capturing global information. Additionally, we design a local-global information fusion network, comprising a local information extraction module, a prompt-guided residual Mamba module, and a multi-focus attention fusion module, which effectively integrates local and global information. Furthermore, inspired by Kolmogorov-Arnold Networks (KANs), we develop a multi-scale information extraction module to capture richer contextual information without altering the resolution. We conduct extensive experiments on the ISIC-2017, ISIC-2018, DIAS, and DRIVE datasets. The results demonstrate that the proposed method significantly outperforms state-of-the-art approaches in multiple medical image segmentation tasks.
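Reading the abstract structurally: the model keeps a CNN path for local detail and a Mamba path for global context, with a visual prompt distilled from the raw input steering the global path and an attention step fusing the two branches. Below is a minimal PyTorch sketch of that dual-path idea, not the authors' code: all module and parameter names are invented here, and a gated token mixer stands in for a real Mamba layer.

```python
import torch
import torch.nn as nn

class TokenMixer(nn.Module):
    """Placeholder for a Mamba/SSM layer: a gated linear mixer over the
    flattened spatial sequence (kept linear in sequence length, as Mamba is)."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.proj_in = nn.Linear(dim, 2 * dim)
        self.proj_out = nn.Linear(dim, dim)

    def forward(self, x):                                  # x: (B, L, C)
        h, gate = self.proj_in(self.norm(x)).chunk(2, dim=-1)
        return x + self.proj_out(h * torch.sigmoid(gate))  # residual update

class PromptGuidedGlobalPath(nn.Module):
    """Global path: a dynamic visual prompt distilled from the *original*
    input modulates the token sequence before the (stand-in) Mamba mixer."""
    def __init__(self, dim):
        super().__init__()
        self.prompt_gen = nn.Sequential(                   # prompt from the raw image
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(3, dim, 1), nn.Sigmoid())
        self.mixer = TokenMixer(dim)

    def forward(self, feat, raw):                          # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        prompt = self.prompt_gen(raw).flatten(2).transpose(1, 2)  # (B, 1, C)
        seq = feat.flatten(2).transpose(1, 2)                     # (B, HW, C)
        seq = self.mixer(seq * prompt)                     # prompt-guided mixing
        return seq.transpose(1, 2).reshape(b, c, h, w)

class DualPathBlock(nn.Module):
    """Local CNN path + prompt-guided global path, fused by learned
    per-pixel weights over the two branches (simplified 'multi-focus' fusion)."""
    def __init__(self, dim):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU(),
            nn.Conv2d(dim, dim, 3, padding=1), nn.BatchNorm2d(dim), nn.ReLU())
        self.global_path = PromptGuidedGlobalPath(dim)
        self.fuse = nn.Sequential(nn.Conv2d(2 * dim, dim, 1), nn.Sigmoid())

    def forward(self, feat, raw):
        l, g = self.local(feat), self.global_path(feat, raw)
        a = self.fuse(torch.cat([l, g], dim=1))            # fusion weights in (0, 1)
        return a * l + (1 - a) * g

x, raw = torch.randn(2, 32, 64, 64), torch.randn(2, 3, 256, 256)
print(DualPathBlock(32)(x, raw).shape)                     # torch.Size([2, 32, 64, 64])
```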
Related papers
- Global and Local Mamba Network for Multi-Modality Medical Image Super-Resolution [6.646855815619148]
We propose a global and local Mamba network (GLMamba) for multi-modality medical image super-resolution.
The global Mamba branch captures long-range relationships in low-resolution inputs, and the local Mamba branch focuses more on short-range details in high-resolution reference images.
A modulator is designed to further enhance deformable features in both global and local Mamba blocks.
arXiv Detail & Related papers (2025-04-14T11:14:24Z)
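The GLMamba summary names three pieces: a global branch over the low-resolution input, a local branch over a high-resolution reference, and a modulator that enhances both. A hedged toy sketch of that layout (names are ours; plain convolutions stand in for the two Mamba branches):

```python
import torch
import torch.nn as nn

class GlobalLocalBlock(nn.Module):
    """Toy global/local dual-branch block for reference-based SR. A dilated
    conv and a small-kernel conv stand in for the global/local Mamba branches;
    a shared modulator reweights each branch's features channel-wise."""
    def __init__(self, dim):
        super().__init__()
        self.global_branch = nn.Conv2d(dim, dim, 3, padding=4, dilation=4)
        self.local_branch = nn.Conv2d(dim, dim, 3, padding=1)
        self.modulator = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(dim, dim, 1), nn.Sigmoid())
        self.out = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, lr_feat, ref_feat):      # both (B, C, H, W)
        g = self.global_branch(lr_feat)        # long-range context from LR input
        l = self.local_branch(ref_feat)        # short-range detail from HR reference
        g, l = g * self.modulator(g), l * self.modulator(l)
        return self.out(torch.cat([g, l], dim=1))

blk = GlobalLocalBlock(16)
print(blk(torch.randn(1, 16, 32, 32), torch.randn(1, 16, 32, 32)).shape)
```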
- TransMamba: Fast Universal Architecture Adaption from Transformers to Mamba [88.31117598044725]
We explore cross-architecture training to transfer the knowledge already present in existing Transformer models to the alternative Mamba architecture, an approach termed TransMamba. It employs a two-stage strategy to expedite the training of new Mamba models, ensuring effectiveness across both uni-modal and cross-modal tasks. For cross-modal learning, we propose a cross-Mamba module that integrates language awareness into Mamba's visual features, enhancing the cross-modal interaction capabilities of the Mamba architecture.
arXiv Detail & Related papers (2025-02-21T01:22:01Z)
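The cross-Mamba module is described only as integrating language awareness into Mamba's visual features. A minimal stand-in using ordinary cross-attention in place of the actual cross-Mamba mechanism (module and parameter names are ours):

```python
import torch
import torch.nn as nn

class CrossModalInjection(nn.Module):
    """Sketch of injecting language features into visual token features.
    Standard cross-attention stands in for the paper's cross-Mamba module."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, vis, txt):               # vis: (B, Lv, C), txt: (B, Lt, C)
        # visual tokens query the language tokens; residual update
        upd, _ = self.attn(query=vis, key=txt, value=txt)
        return self.norm(vis + upd)

m = CrossModalInjection(64)
print(m(torch.randn(2, 196, 64), torch.randn(2, 12, 64)).shape)  # (2, 196, 64)
```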
- MobileMamba: Lightweight Multi-Receptive Visual Mamba Network [51.33486891724516]
Previous research on lightweight models has primarily focused on CNNs and Transformer-based designs.
We propose the MobileMamba framework, which balances efficiency and performance.
MobileMamba achieves up to 83.6% Top-1 accuracy, surpassing existing state-of-the-art methods.
arXiv Detail & Related papers (2024-11-24T18:01:05Z)
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model [71.50973774576431]
We propose a novel MLLM, INF-LLaVA, designed for effective high-resolution image perception.
First, we introduce a Dual-perspective Cropping Module (DCM), which ensures that each sub-image contains continuous details from a local perspective.
Second, we introduce a Dual-perspective Enhancement Module (DEM) to enable the mutual enhancement of global and local features.
arXiv Detail & Related papers (2024-07-23T06:02:30Z)
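From the module names, INF-LLaVA's DCM splits a high-resolution image into spatially continuous sub-images alongside a global view, which the DEM then cross-enhances. A sketch of the cropping half only, under those assumptions:

```python
import torch

def dual_perspective_crops(img, grid=2):
    """Toy stand-in for a dual-perspective cropping module: a downsampled
    global view plus a grid of full-resolution local sub-images, so each
    crop keeps spatially continuous detail.  img: (B, C, H, W)."""
    b, c, h, w = img.shape
    gh, gw = h // grid, w // grid
    global_view = torch.nn.functional.interpolate(
        img, size=(gh, gw), mode="bilinear", align_corners=False)
    local_views = [img[:, :, i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
                   for i in range(grid) for j in range(grid)]
    return global_view, local_views

g, locs = dual_perspective_crops(torch.randn(1, 3, 448, 448))
print(g.shape, len(locs), locs[0].shape)  # (1, 3, 224, 224) 4 (1, 3, 224, 224)
```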
- Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution [7.97504951029884]
We propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution.
Inspired by Mamba, our approach learns self-prior multi-scale contextual features within the Mamba-UNet framework.
arXiv Detail & Related papers (2024-07-08T14:41:53Z)
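"Self-prior multi-scale contextual features" suggests deriving guidance from the input itself at several scales. One plausible construction (ours, not necessarily the paper's): a Laplacian-style pyramid of residuals concatenated as extra input channels.

```python
import torch
import torch.nn.functional as F

def self_prior_pyramid(lr_img, scales=(2, 4, 8)):
    """Build multi-scale 'self-prior' maps from the input itself: each level
    downsamples then upsamples the input, so the residual against the
    original exposes scale-specific detail for the network to use."""
    priors = []
    for s in scales:
        h, w = lr_img.shape[-2] // s, lr_img.shape[-1] // s
        blur = F.interpolate(
            F.interpolate(lr_img, size=(h, w), mode="bilinear",
                          align_corners=False),
            size=lr_img.shape[-2:], mode="bilinear", align_corners=False)
        priors.append(lr_img - blur)           # detail lost below scale s
    return torch.cat(priors, dim=1)            # concatenated as extra channels

x = torch.randn(1, 1, 64, 64)
print(self_prior_pyramid(x).shape)             # torch.Size([1, 3, 64, 64])
```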
- LKM-UNet: Large Kernel Vision Mamba UNet for Medical Image Segmentation [9.862277278217045]
In this paper, we introduce a Large Kernel Vision Mamba U-shape Network, or LKM-UNet, for medical image segmentation.
A distinguishing feature of our LKM-UNet is its utilization of large Mamba kernels, excelling at local spatial modeling compared with small-kernel CNNs and Transformers.
Comprehensive experiments demonstrate the feasibility and the effectiveness of using large-size Mamba kernels to achieve large receptive fields.
arXiv Detail & Related papers (2024-03-12T05:34:51Z)
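"Large Mamba kernels" reads as running the sequence model inside large spatial windows rather than small patches, enlarging the per-layer receptive field. A rough sketch of window-wise sequence mixing, with a GRU as a placeholder for Mamba's selective scan:

```python
import torch
import torch.nn as nn

class LargeWindowMixer(nn.Module):
    """Partition the feature map into large windows and mix tokens within
    each window with a sequence model (a tiny GRU stands in for Mamba's
    selective scan).  Larger windows -> larger receptive field per layer."""
    def __init__(self, dim, window=16):
        super().__init__()
        self.window = window
        self.rnn = nn.GRU(dim, dim, batch_first=True)  # placeholder scanner

    def forward(self, x):                      # x: (B, C, H, W), H, W % window == 0
        b, c, h, w = x.shape
        k = self.window
        # (B, C, H, W) -> (B * num_windows, k*k, C): one sequence per window
        x = x.reshape(b, c, h // k, k, w // k, k).permute(0, 2, 4, 3, 5, 1)
        seq = x.reshape(-1, k * k, c)
        seq, _ = self.rnn(seq)                 # within-window long-range mixing
        x = seq.reshape(b, h // k, w // k, k, k, c).permute(0, 5, 1, 3, 2, 4)
        return x.reshape(b, c, h, w)

m = LargeWindowMixer(8, window=16)
print(m(torch.randn(1, 8, 64, 64)).shape)      # torch.Size([1, 8, 64, 64])
```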
- Mamba-UNet: UNet-Like Pure Visual Mamba for Medical Image Segmentation [21.1787366866505]
We propose Mamba-UNet, a novel architecture that combines U-Net's strengths in medical image segmentation with Mamba's long-range modeling capability.
Mamba-UNet adopts a pure Visual Mamba (VMamba)-based encoder-decoder structure, infused with skip connections to preserve spatial information across different scales of the network.
arXiv Detail & Related papers (2024-02-07T18:33:04Z)
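The structural claim in Mamba-UNet is a pure token-mixer encoder-decoder with U-Net skip connections. A compact skeleton under that reading (a depthwise-conv mixer stands in for VMamba blocks; all names are ours):

```python
import torch
import torch.nn as nn

def stage(dim):
    """Stand-in for a VMamba stage: depthwise 'mixing' + pointwise projection."""
    return nn.Sequential(nn.Conv2d(dim, dim, 7, padding=3, groups=dim),
                         nn.Conv2d(dim, dim, 1), nn.GELU())

class TinyMambaUNet(nn.Module):
    """U-shaped encoder-decoder with skip connections; every stage is a
    token mixer rather than a plain conv block (per the Mamba-UNet idea)."""
    def __init__(self, in_ch=1, base=16, classes=2):
        super().__init__()
        self.stem = nn.Conv2d(in_ch, base, 3, padding=1)
        self.enc1, self.enc2 = stage(base), stage(base * 2)
        self.down = nn.Conv2d(base, base * 2, 2, stride=2)
        self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
        self.dec1 = stage(base)
        self.fuse = nn.Conv2d(base * 2, base, 1)   # merge skip + upsampled
        self.head = nn.Conv2d(base, classes, 1)

    def forward(self, x):
        s1 = self.enc1(self.stem(x))               # full-resolution features
        s2 = self.enc2(self.down(s1))              # 1/2-resolution features
        d1 = self.fuse(torch.cat([self.up(s2), s1], dim=1))  # skip connection
        return self.head(self.dec1(d1))

net = TinyMambaUNet()
print(net(torch.randn(1, 1, 64, 64)).shape)        # torch.Size([1, 2, 64, 64])
```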
- Swin-UMamba: Mamba-based UNet with ImageNet-based pretraining [85.08169822181685]
This paper introduces a novel Mamba-based model, Swin-UMamba, designed specifically for medical image segmentation tasks.
Swin-UMamba demonstrates superior performance by a large margin compared to CNNs, ViTs, and the latest Mamba-based models.
arXiv Detail & Related papers (2024-02-05T18:58:11Z)
- U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation [10.083902382768406]
We introduce U-Mamba, a general-purpose network for biomedical image segmentation.
Inspired by State Space Sequence Models (SSMs), a new family of deep sequence models, we design a hybrid CNN-SSM block.
We conduct experiments on four diverse tasks: 3D abdominal organ segmentation in CT and MR images, instrument segmentation in endoscopy images, and cell segmentation in microscopy images.
arXiv Detail & Related papers (2024-01-09T18:53:20Z)
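U-Mamba's hybrid CNN-SSM block is the concrete design detail here: convolutions capture local texture, then a sequence model runs over the flattened feature map for long-range dependency. A hedged sketch, with an LSTM standing in for the actual SSM/Mamba layer:

```python
import torch
import torch.nn as nn

class HybridConvSSMBlock(nn.Module):
    """Sketch of a hybrid CNN-SSM block: two convs capture local texture,
    then the flattened feature sequence goes through a sequence model for
    long-range dependency (an LSTM is a stand-in for the real SSM)."""
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(dim, dim, 3, padding=1), nn.InstanceNorm2d(dim),
            nn.LeakyReLU(), nn.Conv2d(dim, dim, 3, padding=1),
            nn.InstanceNorm2d(dim), nn.LeakyReLU())
        self.norm = nn.LayerNorm(dim)
        self.ssm = nn.LSTM(dim, dim, batch_first=True)  # placeholder SSM

    def forward(self, x):                       # x: (B, C, H, W)
        b, c, h, w = x.shape
        x = self.conv(x)                        # local feature extraction
        seq = self.norm(x.flatten(2).transpose(1, 2))        # (B, HW, C)
        out, _ = self.ssm(seq)                  # long-range mixing over tokens
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual fusion

blk = HybridConvSSMBlock(16)
print(blk(torch.randn(1, 16, 32, 32)).shape)    # torch.Size([1, 16, 32, 32])
```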
- Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement Learning [53.00683059396803]
Masked image modeling (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z)
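Searching the masking ratio with RL can be illustrated with a toy bandit loop, a deliberate simplification of the paper's multi-agent setup: keep a running value per candidate ratio, pick epsilon-greedily, and reward ratios that lower reconstruction loss.

```python
import random

# Candidate masking ratios; values[i] estimates how good ratio i has been.
ratios = [0.4, 0.5, 0.6, 0.75]
values, counts = [0.0] * len(ratios), [0] * len(ratios)

def pretrain_step(ratio):
    """Placeholder for one MIM step: mask `ratio` of patches, reconstruct,
    return the reconstruction loss (simulated here with noise)."""
    return random.random() + abs(ratio - 0.6)   # pretend 0.6 is optimal

for step in range(1000):
    # epsilon-greedy "agent" choosing the masking strategy
    if random.random() < 0.1:
        i = random.randrange(len(ratios))
    else:
        i = max(range(len(ratios)), key=lambda j: values[j])
    reward = -pretrain_step(ratios[i])          # lower loss => higher reward
    counts[i] += 1                              # incremental mean update
    values[i] += (reward - values[i]) / counts[i]

best = max(range(len(ratios)), key=lambda j: values[j])
print("selected masking ratio:", ratios[best])
```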
- Boosting Few-shot Semantic Segmentation with Transformers [81.43459055197435]
We propose a TRansformer-based Few-shot Semantic segmentation method (TRFS).
Our model consists of two modules: a Global Enhancement Module (GEM) and a Local Enhancement Module (LEM).
arXiv Detail & Related papers (2021-08-04T20:09:21Z)
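GEM and LEM are only named, not specified; one common reading in few-shot segmentation is that query features are conditioned on a global support prototype (GEM) and refined locally (LEM). A toy sketch under that assumption:

```python
import torch
import torch.nn as nn

class GlobalLocalEnhancer(nn.Module):
    """Toy GEM/LEM pairing for few-shot segmentation.  GEM: modulate query
    features with a global prototype pooled from support features.  LEM:
    refine local structure with a small conv.  (Names and details assumed.)"""
    def __init__(self, dim):
        super().__init__()
        self.gem = nn.Conv2d(2 * dim, dim, 1)   # fuse query + broadcast prototype
        self.lem = nn.Sequential(nn.Conv2d(dim, dim, 3, padding=1), nn.ReLU())

    def forward(self, query, support):          # both (B, C, H, W)
        proto = support.mean(dim=(2, 3), keepdim=True)       # global prototype
        g = self.gem(torch.cat([query, proto.expand_as(query)], dim=1))
        return g + self.lem(query)              # global + local enhancement

enh = GlobalLocalEnhancer(32)
q, s = torch.randn(1, 32, 60, 60), torch.randn(1, 32, 60, 60)
print(enh(q, s).shape)                          # torch.Size([1, 32, 60, 60])
```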