MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation
- URL: http://arxiv.org/abs/2510.03786v2
- Date: Tue, 28 Oct 2025 23:43:17 GMT
- Title: MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation
- Authors: T-Mai Bui, Fares Bougourzi, Fadi Dornaika, Vinh Truong Hoang,
- Abstract summary: We propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion mechanism. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity.
- Score: 11.967890140626716
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In recent years, deep learning has shown near-expert performance in segmenting complex medical tissues and tumors. However, existing models are often task-specific, with performance varying across modalities and anatomical regions. Balancing model complexity and performance remains challenging, particularly in clinical settings where both accuracy and efficiency are critical. To address these issues, we propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion (MAF) mechanism to capture local, global, and long-range dependencies. A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency. Additionally, a co-attention gate enhances feature selection by emphasizing relevant spatial and semantic information across scales during both encoding and decoding, improving feature interaction and cross-scale communication. Extensive experiments on multiple benchmark datasets show that our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity. By effectively balancing efficiency and effectiveness, our architecture offers a practical and scalable solution for diverse medical imaging tasks. Source code and trained models will be publicly released upon acceptance to support reproducibility and further research.
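The abstract credits the Mamba-based Attention Fusion (MAF) branch with capturing long-range dependencies but gives no equations for it. As background only, here is a minimal sketch of the selective state-space recurrence that Mamba-style blocks build on, assuming a single channel, a scalar state, and hand-set parameters. The names `selective_scan`, `w_delta`, and `b_delta` are hypothetical, only the step size is made input-dependent (in Mamba, B and C are input-dependent as well), and this is not the MAF module itself.

```python
import math
from typing import List


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(
    x: List[float],
    a: float,        # continuous-time state decay; must be negative for stability
    b: float,        # input projection
    c: float,        # output projection
    w_delta: float,  # hypothetical weight making the step size depend on the input
    b_delta: float,  # hypothetical bias of the step size
) -> List[float]:
    """Single-channel, scalar-state selective SSM scan (illustrative sketch)."""
    if a >= 0:
        raise ValueError("a must be negative so that exp(delta * a) lies in (0, 1)")
    h = 0.0
    y: List[float] = []
    for x_t in x:
        delta = softplus(w_delta * x_t + b_delta)  # input-dependent step size
        a_bar = math.exp(delta * a)                # zero-order-hold discretization of a
        b_bar = delta * b                          # first-order approximation for b
        h = a_bar * h + b_bar * x_t                # state update: constant work per step
        y.append(c * h)                            # read out the current state
    return y


# An impulse at t = 0 decays geometrically through the hidden state.
print(selective_scan([1.0, 0.0, 0.0], a=-1.0, b=1.0, c=1.0, w_delta=0.0, b_delta=0.0))
```

The loop visits each position once, so cost grows linearly with sequence length, whereas full self-attention compares every pair of positions; practical Mamba implementations replace this Python loop with a hardware-aware parallel scan.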
Related papers
- HyM-UNet: Synergizing Local Texture and Global Context via Hybrid CNN-Mamba Architecture for Medical Image Segmentation [3.976000861085382]
HyM-UNet is designed to synergize the local feature extraction capabilities of CNNs with the efficient global modeling capabilities of Mamba. To bridge the semantic gap between the encoder and the decoder, we propose a Mamba-Guided Fusion Skip Connection. The results demonstrate that HyM-UNet significantly outperforms existing state-of-the-art methods in terms of Dice coefficient and IoU.
(arXiv, 2025-11-22T09:02:06Z)
- SpectMamba: Integrating Frequency and State Space Models for Enhanced Medical Image Detection [11.43227481199105]
We present SpectMamba, the first Mamba-based architecture designed for medical image detection. A key component of SpectMamba is the Hybrid Spatial-Frequency Attention (HSFA) block, which separately learns high- and low-frequency features. We show that SpectMamba achieves state-of-the-art performance while being both effective and efficient across various medical image detection tasks.
(arXiv, 2025-09-01T02:56:45Z)
- MS-UMamba: An Improved Vision Mamba Unet for Fetal Abdominal Medical Image Segmentation [1.2721397985664153]
We propose MS-UMamba, a novel hybrid convolutional-Mamba model for fetal ultrasound image segmentation. Specifically, we design a visual state space block integrated with a CNN branch, which leverages Mamba's global modeling strengths. We also propose an efficient multi-scale feature fusion module, which integrates feature information from different layers.
(arXiv, 2025-06-14T10:34:10Z)
- InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation [15.666926528144202]
We propose an efficient segmentation framework, named InceptionMamba, which encodes rich multi-stage features. We exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features. Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets.
(arXiv, 2025-06-13T20:25:12Z)
- An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose image restoration (IR) model based on mixed-architecture fusion.
(arXiv, 2025-04-15T08:19:12Z)
- Rethinking the Nested U-Net Approach: Enhancing Biomarker Segmentation with Attention Mechanisms and Multiscale Feature Fusion [2.0799865428691393]
We introduce a nested UNet architecture that captures both local and global context through Multiscale Feature Fusion and Attention Mechanisms. This design improves feature integration from encoders, highlights key channels and regions, and restores spatial details to enhance segmentation performance.
(arXiv, 2025-04-08T15:53:46Z)
- ContextFormer: Redefining Efficiency in Semantic Segmentation [48.81126061219231]
Convolutional methods, although capturing local dependencies well, struggle with long-range relationships. Vision Transformers (ViTs) excel in global context capture but are hindered by high computational demands. We propose ContextFormer, a hybrid framework leveraging the strengths of CNNs and ViTs in the bottleneck to balance efficiency, accuracy, and robustness for real-time semantic segmentation.
(arXiv, 2025-01-31T16:11:04Z)
- MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation [6.673169053236727]
We propose MambaClinix, a novel U-shaped architecture for medical image segmentation.
MambaClinix integrates a hierarchical gated convolutional network with Mamba in an adaptive stage-wise framework.
Our results show that MambaClinix achieves high segmentation accuracy while maintaining low model complexity.
(arXiv, 2024-09-19T07:51:14Z)
- MoME: Mixture of Multimodal Experts for Cancer Survival Prediction [46.520971457396726]
Survival analysis is a challenging task that requires integrating Whole Slide Images (WSIs) and genomic data for comprehensive decision-making.
Previous approaches utilize co-attention methods, which fuse features from both modalities only once after separate encoding.
We propose a Biased Progressive Encoding (BPE) paradigm, which performs encoding and fusion simultaneously.
(arXiv, 2024-06-14T03:44:33Z)
- Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [51.89707241449435]
In this paper, we address the challenge of efficiently integrating multi-head self-attention into high-resolution representation CNNs. We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features. We present a series of models via the Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searches for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
(arXiv, 2024-03-15T15:47:54Z)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
(arXiv, 2023-12-26T12:56:31Z)
- Few-shot Medical Image Segmentation using a Global Correlation Network with Discriminative Embedding [60.89561661441736]
We propose a novel method for few-shot medical image segmentation.
We construct our few-shot image segmentor using a deep convolutional network trained episodically.
We enhance the discriminability of the deep embedding to encourage clustering of feature domains belonging to the same class.
(arXiv, 2020-12-10T04:01:07Z)
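Several of the related papers (for example HyM-UNet) report segmentation quality as the Dice coefficient and IoU. For reference, a minimal sketch of both overlap metrics on flattened binary masks: Dice $= 2|P \cap G| / (|P| + |G|)$ and IoU $= |P \cap G| / |P \cup G|$. The function name `dice_and_iou` and the convention of scoring two empty masks as 1.0 are assumptions for illustration, not taken from any of the listed papers.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], target: Sequence[int]) -> Tuple[float, float]:
    """Dice and IoU for two flattened binary masks (pixel values 0 or 1)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)  # |P ∩ G|
    size_p = sum(1 for p in pred if p)                       # |P|
    size_t = sum(1 for t in target if t)                     # |G|
    union = size_p + size_t - inter                          # |P ∪ G|
    if union == 0:
        # Both masks are empty: treat as perfect agreement (an assumed convention).
        return 1.0, 1.0
    dice = 2 * inter / (size_p + size_t)
    iou = inter / union
    return dice, iou


# One overlapping foreground pixel out of two predicted and two true pixels.
dice, iou = dice_and_iou([1, 1, 0, 0], [1, 0, 1, 0])
print(dice, iou)
```

Because Dice $= 2\,\mathrm{IoU} / (1 + \mathrm{IoU})$ for any pair of masks, the two metrics rank predictions identically on a single image; averages over a dataset can still order methods differently.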