SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation
- URL: http://arxiv.org/abs/2505.16304v1
- Date: Thu, 22 May 2025 06:57:03 GMT
- Title: SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation
- Authors: Guohao Huo, Ruiting Dai, Hao Tang
- Abstract summary: This study proposes an innovative dual-encoder architecture named SAMba-UNet. The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet. Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm.
- Score: 6.451534509235736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the challenge of complex pathological feature extraction in automated cardiac MRI segmentation, this study proposes an innovative dual-encoder architecture named SAMba-UNet. The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet. To mitigate domain discrepancies between medical and natural images, a Dynamic Feature Fusion Refiner is designed, which enhances small lesion feature extraction through multi-scale pooling and a dual-path calibration mechanism across channel and spatial dimensions. Furthermore, a Heterogeneous Omni-Attention Convergence Module (HOACM) is introduced, combining global contextual attention with branch-selective emphasis mechanisms to effectively fuse SAM2's local positional semantics and Mamba's long-range dependency modeling capabilities. Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm, significantly outperforming existing methods, particularly in boundary localization for complex pathological structures such as right ventricular anomalies. This work provides an efficient and reliable solution for automated cardiac disease diagnosis, and the code will be open-sourced.
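The abstract reports a Dice coefficient and an HD95 boundary error. As a hedged sketch (not the authors' evaluation code, which has not yet been released), these standard metrics can be computed from binary segmentation masks roughly as follows; the brute-force HD95 here works in pixel units, whereas the paper reports millimetres, which would additionally require the scan's voxel spacing:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    """Dice = 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def hd95(pred, target):
    """95th-percentile symmetric Hausdorff distance between mask
    boundaries (brute force; fine for small masks, in pixels)."""
    def boundary(mask):
        m = mask.astype(bool)
        pad = np.pad(m, 1)
        # a foreground pixel is interior if all four 4-neighbours
        # are also foreground; the rest of the foreground is boundary
        nb = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
        return np.argwhere(m & ~nb)
    a, b = boundary(pred), boundary(target)
    # pairwise Euclidean distances between the two boundary point sets
    d = np.sqrt(((a[:, None, :] - b[None, :, :]) ** 2).sum(-1))
    directed = np.concatenate([d.min(axis=1), d.min(axis=0)])
    return float(np.percentile(directed, 95))
```

For identical masks this yields a Dice of 1.0 and an HD95 of 0; shifting one mask lowers the Dice in proportion to the lost overlap, which is why the paper pairs Dice (region overlap) with HD95 (worst-case boundary error, robust to outliers via the 95th percentile).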
Related papers
- FaRMamba: Frequency-based learning and Reconstruction aided Mamba for Medical Segmentation [3.5790602918760586]
Vision Mamba employs one-dimensional causal state-space recurrence to efficiently model global dependencies. Its patch tokenization and 1D serialization disrupt local pixel adjacency and impose a low-pass filtering effect. We propose FaRMamba, a novel extension that explicitly addresses LHICD and 2D-SSD through two complementary modules.
arXiv Detail & Related papers (2025-07-26T20:41:53Z)
- ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation [20.242887183708653]
ABS-Mamba is a novel architecture for organ-aware semantic representation. CNNs preserve modality-specific edge and texture details, while Mamba's selective state-space modeling captures long- and short-range feature dependencies efficiently.
arXiv Detail & Related papers (2025-05-12T15:51:15Z)
- MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation [8.090155401012169]
Mamba, an emerging model, is one of the most cutting-edge approaches that is widely applied to diverse vision and language tasks. This paper introduces a U-shaped deep learning model incorporating a large-window multiscale Mamba module and a hierarchical feature fusion approach for echocardiographic segmentation.
arXiv Detail & Related papers (2025-01-13T08:22:10Z)
- HCMA-UNet: A Hybrid CNN-Mamba UNet with Axial Self-Attention for Efficient Breast Cancer Segmentation [7.807738181550226]
This study proposes a novel hybrid segmentation network, HCMA-UNet, for lesion segmentation of breast cancer. Our network consists of a lightweight CNN backbone and a Multi-view Axial Self-Attention Mamba (MISM) module. Our lightweight model achieves superior performance with 2.87M parameters and 126.44 GFLOPs.
arXiv Detail & Related papers (2025-01-01T06:42:57Z)
- XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder [9.141615533517719]
We introduce the XLSTM-HVED model, which integrates a heteromodal encoder-decoder framework with the Vision XLSTM module to reconstruct missing MRI modalities. A key innovation of our approach is the Self-Attention Variational (SAVE) module, which improves the integration of modal features. Our experiments using the BraTS 2024 dataset demonstrate that our model significantly outperforms existing advanced methods in handling cases where modalities are missing.
arXiv Detail & Related papers (2024-12-09T09:04:02Z)
- Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z)
- Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network combining convolutional neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z)
- DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and FLOPs.
arXiv Detail & Related papers (2024-06-05T06:18:03Z)
- Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z)
- Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z)
- Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumors represent one of the most fatal cancers around the world, and are very common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.