ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
- URL: http://arxiv.org/abs/2505.07687v1
- Date: Mon, 12 May 2025 15:51:15 GMT
- Title: ABS-Mamba: SAM2-Driven Bidirectional Spiral Mamba Network for Medical Image Translation
- Authors: Feng Yuan, Yifan Gao, Wenbin Wu, Keqing Wu, Xiaotong Guo, Jie Jiang, Xin Gao,
- Abstract summary: ABS-Mamba is a novel architecture for organ-aware semantic representation.<n>CNNs preserve modality-specific edge and texture details.<n>Mamba's selective state-space modeling for efficient long- and short-range feature dependencies.
- Score: 20.242887183708653
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate multi-modal medical image translation requires ha-rmonizing global anatomical semantics and local structural fidelity, a challenge complicated by intermodality information loss and structural distortion. We propose ABS-Mamba, a novel architecture integrating the Segment Anything Model 2 (SAM2) for organ-aware semantic representation, specialized convolutional neural networks (CNNs) for preserving modality-specific edge and texture details, and Mamba's selective state-space modeling for efficient long- and short-range feature dependencies. Structurally, our dual-resolution framework leverages SAM2's image encoder to capture organ-scale semantics from high-resolution inputs, while a parallel CNNs branch extracts fine-grained local features. The Robust Feature Fusion Network (RFFN) integrates these epresentations, and the Bidirectional Mamba Residual Network (BMRN) models spatial dependencies using spiral scanning and bidirectional state-space dynamics. A three-stage skip fusion decoder enhances edge and texture fidelity. We employ Efficient Low-Rank Adaptation (LoRA+) fine-tuning to enable precise domain specialization while maintaining the foundational capabilities of the pre-trained components. Extensive experimental validation on the SynthRAD2023 and BraTS2019 datasets demonstrates that ABS-Mamba outperforms state-of-the-art methods, delivering high-fidelity cross-modal synthesis that preserves anatomical semantics and structural details to enhance diagnostic accuracy in clinical applications. The code is available at https://github.com/gatina-yone/ABS-Mamba
Related papers
- Multimodal Visual Surrogate Compression for Alzheimer's Disease Classification [69.87877580725768]
Multimodal Visual Surrogate Compression (MVSC) learns to compress and adapt large 3D sMRI volumes into compact 2D features.<n>MVSC has two key components: a Volume Context that captures global cross-slice context under textual guidance, and an Adaptive Slice Fusion module that aggregates slice-level information in a text-enhanced, patch-wise manner.
arXiv Detail & Related papers (2026-01-29T13:05:46Z) - StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs.<n>It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement.<n>The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z) - HyM-UNet: Synergizing Local Texture and Global Context via Hybrid CNN-Mamba Architecture for Medical Image Segmentation [3.976000861085382]
HyM-UNet is designed to synergize the local feature extraction capabilities of CNNs with the efficient global modeling capabilities of Mamba.<n>To bridge the semantic gap between the encoder and the decoder, we propose a Mamba-Guided Fusion Skip Connection.<n>The results demonstrate that HyM-UNet significantly outperforms existing state-of-the-art methods in terms of Dice coefficient and IoU.
arXiv Detail & Related papers (2025-11-22T09:02:06Z) - VesSAM: Efficient Multi-Prompting for Segmenting Complex Vessel [68.24765319399286]
We present VesSAM, a powerful and efficient framework tailored for 2D vessel segmentation.<n>VesSAM integrates (1) a convolutional adapter to enhance local texture features, (2) a multi-prompt encoder that fuses anatomical prompts, and (3) a lightweight mask decoder to reduce jagged artifacts.<n>VesSAM consistently outperforms state-of-the-art PEFT-based SAM variants by over 10% Dice and 13% IoU.
arXiv Detail & Related papers (2025-11-02T15:47:05Z) - Exploring Non-Local Spatial-Angular Correlations with a Hybrid Mamba-Transformer Framework for Light Field Super-Resolution [68.54692184478462]
Mamba-based methods have shown great potential in optimizing both computational cost and performance of light field image super-resolution.<n>We propose a Subspace Simple Scanning (Sub-SS) strategy, based on which we design the Subspace Simple Mamba Block (SSMB) to achieve more efficient and precise feature extraction.<n>We also propose a dual-stage modeling strategy to address the limitation of state space in preserving spatial-angular and disparity information.
arXiv Detail & Related papers (2025-09-05T05:50:38Z) - CSFMamba: Cross State Fusion Mamba Operator for Multimodal Remote Sensing Image Classification [12.959829835589453]
We propose Cross State Fusion Mamba (Camba) Network.<n>Specifically, we first design the preprocessing module of remote sensing image information for the needs of Mamba structure.<n> Secondly, a cross-state module based on Mamba operator is creatively designed to fully fuse the feature of the two modalities.
arXiv Detail & Related papers (2025-08-31T03:08:34Z) - DM-SegNet: Dual-Mamba Architecture for 3D Medical Image Segmentation with Global Context Modeling [0.0]
We present DM-SegNet, a Dual-Mamba architecture integrating directional state transitions with anatomy-aware hierarchical decoding.<n>The core innovations include a quadri-directional spatial Mamba module employing four-directional 3D scanning to maintain anatomical spatial coherence.<n>Extensive evaluation on two clinically significant benchmarks demonstrates the efficacy of DM-SegNet.
arXiv Detail & Related papers (2025-06-05T17:49:46Z) - SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation [6.451534509235736]
This study proposes an innovative dual-encoder architecture named SAMba-UNet.<n>The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet.<n> Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm.
arXiv Detail & Related papers (2025-05-22T06:57:03Z) - RD-UIE: Relation-Driven State Space Modeling for Underwater Image Enhancement [59.364418120895]
Underwater image enhancement (UIE) is a critical preprocessing step for marine vision applications.<n>We develop a novel relation-driven Mamba framework for effective UIE (RD-UIE)<n>Experiments on underwater enhancement benchmarks demonstrate RD-UIE outperforms the state-of-the-art approach WMamba.
arXiv Detail & Related papers (2025-05-02T12:21:44Z) - DAMamba: Vision State Space Model with Dynamic Adaptive Scan [51.81060691414399]
State space models (SSMs) have recently garnered significant attention in computer vision.<n>We propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions.<n>Based on DAS, we propose the vision backbone DAMamba, which significantly outperforms current state-of-the-art vision Mamba models in vision tasks.
arXiv Detail & Related papers (2025-02-18T08:12:47Z) - Hierarchical Spatio-Temporal State-Space Modeling for fMRI Analysis [1.89314691075208]
Experimental results demonstrate significant improvements in the proposed FST-Mamba model on various brain-based classification and regression tasks.<n>Our work reveals the substantial potential of attention-free sequence modeling in brain discovery.
arXiv Detail & Related papers (2024-08-23T13:58:14Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network via the combination of convolution neural network (CNN) and transformer layers.
The experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network superior performance than the state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - I2I-Mamba: Multi-modal medical image synthesis via selective state space modeling [8.48392350084504]
We propose a novel adversarial model for medical image synthesis, I2I-Mamba, to efficiently capture long-range context.
I2I-Mamba offers superior performance against state-of-the-art CNN- and transformer-based methods in synthesizing target-modality images.
arXiv Detail & Related papers (2024-05-22T21:55:58Z) - VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis [0.8111815974227898]
We propose the Vision Mamba DDPM (VM-DDPM) based on State Space Model (SSM)
To our best knowledge, this is the first medical image synthesis model based on the SSM-CNN hybrid architecture.
Our experimental evaluation on three datasets of different scales, i.e., ACDC, BraTS2018, and ChestXRay, demonstrate that VM-DDPM achieves state-of-the-art performance.
arXiv Detail & Related papers (2024-05-09T10:41:18Z) - SDR-Former: A Siamese Dual-Resolution Transformer for Liver Lesion
Classification Using 3D Multi-Phase Imaging [59.78761085714715]
This study proposes a novel Siamese Dual-Resolution Transformer (SDR-Former) framework for liver lesion classification.
The proposed framework has been validated through comprehensive experiments on two clinical datasets.
To support the scientific community, we are releasing our extensive multi-phase MR dataset for liver lesion analysis to the public.
arXiv Detail & Related papers (2024-02-27T06:32:56Z) - Multi-scale Transformer Network with Edge-aware Pre-training for
Cross-Modality MR Image Synthesis [52.41439725865149]
Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones.
Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective synthesis model.
We propose a Multi-scale Transformer Network (MT-Net) with edge-aware pre-training for cross-modality MR image synthesis.
arXiv Detail & Related papers (2022-12-02T11:40:40Z) - Spatial Dependency Networks: Neural Layers for Improved Generative Image
Modeling [79.15521784128102]
We introduce a novel neural network for building image generators (decoders) and apply it to variational autoencoders (VAEs)
In our spatial dependency networks (SDNs), feature maps at each level of a deep neural net are computed in a spatially coherent way.
We show that augmenting the decoder of a hierarchical VAE by spatial dependency layers considerably improves density estimation.
arXiv Detail & Related papers (2021-03-16T07:01:08Z) - Adaptive Linear Span Network for Object Skeleton Detection [56.78705071830965]
We propose adaptive linear span network (AdaLSN) to automatically configure and integrate scale-aware features for object skeleton detection.
AdaLSN substantiates its versatility by achieving significantly higher accuracy and latency trade-off.
It also demonstrates general applicability to image-to-mask tasks such as edge detection and road extraction.
arXiv Detail & Related papers (2020-11-08T12:51:14Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.