SAMba-UNet: SAM2-Mamba UNet for Cardiac MRI in Medical Robotic Perception
- URL: http://arxiv.org/abs/2505.16304v2
- Date: Tue, 09 Sep 2025 09:33:06 GMT
- Title: SAMba-UNet: SAM2-Mamba UNet for Cardiac MRI in Medical Robotic Perception
- Authors: Guohao Huo, Ruiting Dai, Ling Shao, Hao Tang
- Abstract summary: We propose a novel dual-encoder architecture, SAMba-UNet, to address pathological feature extraction in automated cardiac MRI segmentation. SAMba-UNet attains a Dice of 0.9103 and HD95 of 1.0859 mm, notably improving boundary localization for challenging structures like the right ventricle. Its robust, high-fidelity segmentation maps are directly applicable as a perception module within intelligent medical and surgical robotic systems.
- Score: 34.79269228659671
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address complex pathological feature extraction in automated cardiac MRI segmentation, we propose SAMba-UNet, a novel dual-encoder architecture that synergistically combines the vision foundation model SAM2, the linear-complexity state-space model Mamba, and the classical UNet to achieve cross-modal collaborative feature learning. To overcome domain shifts between natural images and medical scans, we introduce a Dynamic Feature Fusion Refiner that employs multi-scale pooling and channel-spatial dual-path calibration to strengthen small-lesion and fine-structure representation. We also design a Heterogeneous Omni-Attention Convergence Module (HOACM) that fuses SAM2's local positional semantics with Mamba's long-range dependency modeling via global contextual attention and branch-selective emphasis, yielding substantial gains in both global consistency and boundary precision. On the ACDC cardiac MRI benchmark, SAMba-UNet attains a Dice of 0.9103 and an HD95 of 1.0859 mm, notably improving boundary localization for challenging structures such as the right ventricle. Its robust, high-fidelity segmentation maps are directly applicable as a perception module within intelligent medical and surgical robotic systems to support preoperative planning, intraoperative navigation, and postoperative complication screening. The code will be open-sourced to facilitate clinical translation and further validation.
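The "channel-spatial dual-path calibration" inside the Dynamic Feature Fusion Refiner is described only at a high level in the abstract. The following is a rough illustrative sketch of what such a dual-path gate can look like; the function name, the pooling/gating choices, and the plain-NumPy form are assumptions for exposition, not the paper's implementation:

```python
import numpy as np

def channel_spatial_calibration(feat):
    """Illustrative dual-path calibration (assumed design, not the paper's).

    feat: (C, H, W) feature map.
    Channel path: pool over space, derive a per-channel gate.
    Spatial path: pool over channels, derive a per-pixel gate.
    """
    # Channel path: squeeze spatial dims, gate each channel in (0, 1).
    avg_c = feat.mean(axis=(1, 2))                          # (C,)
    max_c = feat.max(axis=(1, 2))                           # (C,)
    channel_gate = 1.0 / (1.0 + np.exp(-(avg_c + max_c)))   # sigmoid
    feat = feat * channel_gate[:, None, None]

    # Spatial path: squeeze the channel dim, gate each pixel in (0, 1).
    avg_s = feat.mean(axis=0)                               # (H, W)
    max_s = feat.max(axis=0)                                # (H, W)
    spatial_gate = 1.0 / (1.0 + np.exp(-(avg_s + max_s)))
    return feat * spatial_gate[None, :, :]
```

Because both gates lie in (0, 1), the calibration only rescales features; modules of this shape are typically used to emphasize small lesions and fine structures without changing the feature layout.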
Related papers
- Toward AI Autonomous Navigation for Mechanical Thrombectomy using Hierarchical Modular Multi-agent Reinforcement Learning (HM-MARL) [57.65363326406228]
We propose a Hierarchical Modular Multi-Agent Reinforcement Learning framework for autonomous two-device navigation in vitro. HM-MARL was developed to autonomously navigate a guide catheter and guidewire from the femoral artery to the internal carotid artery (ICA). A modular multi-agent approach was used to decompose the complex navigation task into specialized subtasks, each trained using Soft Actor-Critic RL. In vitro, both HM-MARL models successfully navigated 100% of trials from the femoral artery to the right common carotid artery and 80% to the right ICA, but failed on the left-side vessel challenge.
arXiv Detail & Related papers (2026-02-20T23:50:35Z) - MedSAM-Agent: Empowering Interactive Medical Image Segmentation with Multi-turn Agentic Reinforcement Learning [53.37068897861388]
MedSAM-Agent is a framework that reformulates interactive segmentation as a multi-step autonomous decision-making process. We develop a two-stage training pipeline that integrates multi-turn, end-to-end outcome verification. Experiments across 6 medical modalities and 21 datasets demonstrate that MedSAM-Agent achieves state-of-the-art performance.
arXiv Detail & Related papers (2026-02-03T09:47:49Z) - A Hybrid Mamba-SAM Architecture for Efficient 3D Medical Image Segmentation [0.4358626952482685]
Mamba-SAM is a novel and efficient hybrid architecture that combines a frozen SAM encoder with the linear-time efficiency and long-range modeling capabilities of Mamba-based State Space Models (SSMs). We introduce Multi-Frequency Gated Convolution (MFGC), which enhances feature representation by jointly analyzing spatial and frequency-domain information via 3D discrete cosine transforms and adaptive gating. The dual-branch Mamba-SAM-Base model achieves a mean Dice score of 0.906, comparable to UNet++ (0.907), while outperforming all baselines on Myocardium (0.910) and Left Ventricle.
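The MFGC idea above (joint spatial/frequency analysis with adaptive gating) can be caricatured in a few lines. A toy sketch follows; note that the paper uses 3D discrete cosine transforms and learned gates, whereas this stand-in uses NumPy's FFT with a fixed low-pass mask and a sigmoid gate, and every name and choice here is an illustrative assumption:

```python
import numpy as np

def frequency_gated_fuse(vol, cutoff=0.3):
    """Toy frequency-gated mixing for a 3D volume (illustrative only)."""
    spec = np.fft.fftn(vol)

    # Fixed low-pass mask: keep the lowest `cutoff` fraction of
    # frequencies along each axis (a stand-in for a learned DCT filter).
    mask = np.ones_like(vol, dtype=bool)
    for ax, n in enumerate(vol.shape):
        keep = np.abs(np.fft.fftfreq(n)) <= cutoff * 0.5
        shape = [1] * vol.ndim
        shape[ax] = n
        mask &= keep.reshape(shape)
    low = np.fft.ifftn(spec * mask).real

    # Adaptive gate: per-voxel sigmoid of the spatial branch decides
    # how much of the smoothed frequency branch is blended in.
    gate = 1.0 / (1.0 + np.exp(-vol))
    return gate * vol + (1.0 - gate) * low
```

The gate keeps strong spatial evidence intact while letting low-frequency context fill in elsewhere, which is the general flavor of frequency-gated modules.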
arXiv Detail & Related papers (2026-01-31T10:51:17Z) - VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation. VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts. VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z) - HybridMamba: A Dual-domain Mamba for 3D Medical Image Segmentation [12.595264673714025]
Mamba exhibits superior performance because it addresses the limitations of CNNs in modeling long-range dependencies. We propose HybridMamba, an architecture employing dual complementary mechanisms. Experiments on MRI and CT datasets demonstrate that HybridMamba significantly outperforms state-of-the-art methods in 3D medical image segmentation.
arXiv Detail & Related papers (2025-09-18T04:32:49Z) - FaRMamba: Frequency-based learning and Reconstruction aided Mamba for Medical Segmentation [3.5790602918760586]
Vision Mamba employs one-dimensional causal state-space recurrence to efficiently model global dependencies. However, its patch tokenization and 1D serialization disrupt local pixel adjacency and impose a low-pass filtering effect. We propose FaRMamba, a novel extension that explicitly addresses LHICD and 2D-SSD through two complementary modules.
arXiv Detail & Related papers (2025-07-26T20:41:53Z) - MARL-MambaContour: Unleashing Multi-Agent Deep Reinforcement Learning for Active Contour Optimization in Medical Image Segmentation [5.389510984268956]
We introduce MARL-MambaContour, the first contour-based medical image segmentation framework based on Multi-Agent Reinforcement Learning (MARL). Our approach reframes segmentation as a multi-agent cooperation task focused on generating topologically consistent object-level contours. Experiments on five diverse medical imaging datasets demonstrate the state-of-the-art performance of MARL-MambaContour.
arXiv Detail & Related papers (2025-06-23T14:22:49Z) - ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation [20.242887183708653]
ABS-Mamba is a novel architecture for organ-aware semantic representation. CNNs preserve modality-specific edge and texture details, while Mamba's selective state-space modeling efficiently captures long- and short-range feature dependencies.
arXiv Detail & Related papers (2025-05-12T15:51:15Z) - MSV-Mamba: A Multiscale Vision Mamba Network for Echocardiography Segmentation [8.090155401012169]
Mamba, an emerging model, is one of the most cutting-edge approaches and is widely applied to diverse vision and language tasks. This paper introduces a U-shaped deep learning model incorporating a large-window multiscale Mamba module and a hierarchical feature fusion approach for echocardiographic segmentation.
arXiv Detail & Related papers (2025-01-13T08:22:10Z) - HCMA-UNet: A Hybrid CNN-Mamba UNet with Axial Self-Attention for Efficient Breast Cancer Segmentation [7.807738181550226]
This study proposes a novel hybrid segmentation network, HCMA-UNet, for lesion segmentation of breast cancer. Our network consists of a lightweight CNN backbone and a Multi-view Axial Self-Attention Mamba (MISM) module. Our lightweight model achieves superior performance with 2.87M parameters and 126.44 GFLOPs.
arXiv Detail & Related papers (2025-01-01T06:42:57Z) - XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational Encoder-Decoder [9.141615533517719]
We introduce the XLSTM-HVED model, which integrates a heteromodal encoder-decoder framework with the Vision XLSTM module to reconstruct missing MRI modalities. The key innovation of our approach is the Self-Attention Variational (SAVE) module, which improves the integration of modal features. Our experiments using the BraTS 2024 dataset demonstrate that our model significantly outperforms existing advanced methods in handling cases where modalities are missing.
arXiv Detail & Related papers (2024-12-09T09:04:02Z) - SAM-Swin: SAM-Driven Dual-Swin Transformers with Adaptive Lesion Enhancement for Laryngo-Pharyngeal Tumor Detection [12.86763797167925]
Laryngo-pharyngeal cancer (LPC) is a highly lethal malignancy in the head and neck region.
Recent advancements in tumor detection have significantly improved diagnostic accuracy by integrating global and local feature extraction.
We propose SAM-Swin, an innovative SAM-driven Dual-Swin Transformer for laryngo-pharyngeal tumor detection.
arXiv Detail & Related papers (2024-10-29T07:32:57Z) - Accelerated Multi-Contrast MRI Reconstruction via Frequency and Spatial Mutual Learning [50.74383395813782]
We propose a novel Frequency and Spatial Mutual Learning Network (FSMNet) to explore global dependencies across different modalities.
The proposed FSMNet achieves state-of-the-art performance for the Multi-Contrast MR Reconstruction task with different acceleration factors.
arXiv Detail & Related papers (2024-09-21T12:02:47Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network via the combination of convolution neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in spatio-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
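The two attention paths described above (frame attention over time, skeleton attention over joints) can be sketched with simple magnitude-based weights. This is an illustrative approximation, not the DAM implementation; the scoring scheme and all names are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dual_attention(seq):
    """Toy dual attention over skeletal data (assumed design).

    seq: (T, J, D) array of T frames, J joints, D features per joint.
    Frame attention weights each frame by its pooled feature magnitude;
    skeleton attention weights each joint the same way.
    """
    T, J, D = seq.shape
    frame_scores = np.linalg.norm(seq.reshape(T, -1), axis=1)  # (T,)
    frame_w = softmax(frame_scores)                            # sums to 1
    skel_scores = np.linalg.norm(seq, axis=(0, 2))             # (J,)
    skel_w = softmax(skel_scores)                              # sums to 1
    # Reweight along the frame and joint axes independently.
    return seq * frame_w[:, None, None] * skel_w[None, :, None]
```

Both weight vectors are softmax-normalized, so the module adds no parameters in this toy form; the real DAM learns its scoring with minimal parameters and FLOPs, per the abstract.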
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - Interpretable Spatio-Temporal Embedding for Brain Structural-Effective Network with Ordinary Differential Equation [56.34634121544929]
In this study, we first construct the brain-effective network via the dynamic causal model.
We then introduce an interpretable graph learning framework termed Spatio-Temporal Embedding ODE (STE-ODE).
This framework incorporates specifically designed directed node embedding layers, aiming at capturing the dynamic interplay between structural and effective networks.
arXiv Detail & Related papers (2024-05-21T20:37:07Z) - Joint Multimodal Transformer for Emotion Recognition in the Wild [49.735299182004404]
Multimodal emotion recognition (MMER) systems typically outperform unimodal systems.
This paper proposes an MMER method that relies on a joint multimodal transformer (JMT) for fusion with key-based cross-attention.
arXiv Detail & Related papers (2024-03-15T17:23:38Z) - Cross-modality Guidance-aided Multi-modal Learning with Dual Attention for MRI Brain Tumor Grading [47.50733518140625]
Brain tumor represents one of the most fatal cancers around the world, and is very common in children and the elderly.
We propose a novel cross-modality guidance-aided multi-modal learning with dual attention for addressing the task of MRI brain tumor grading.
arXiv Detail & Related papers (2024-01-17T07:54:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences.