SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning
- URL: http://arxiv.org/abs/2505.15234v1
- Date: Wed, 21 May 2025 08:12:31 GMT
- Title: SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning
- Authors: Saqib Qamar, Mohd Fazil, Parvez Ahmad, Ghulam Muhammad,
- Abstract summary: We introduce SAMA-UNet, a novel architecture for medical image segmentation.<n>A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation.<n> Experiments on MRI, CT, and endoscopy images show that SAMA-UNet performs better in segmentation accuracy than current methods.
- Score: 4.790894013065453
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Medical image segmentation plays an important role in various clinical applications, but existing models often struggle with the computational inefficiencies and challenges posed by complex medical data. State Space Sequence Models (SSMs) have demonstrated promise in modeling long-range dependencies with linear computational complexity, yet their application in medical image segmentation remains hindered by incompatibilities with image tokens and autoregressive assumptions. Moreover, it is difficult to achieve a balance in capturing both local fine-grained information and global semantic dependencies. To address these challenges, we introduce SAMA-UNet, a novel architecture for medical image segmentation. A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation to prioritise the most relevant features based on local and global contexts. This approach reduces computational complexity and improves the representation of complex image features across multiple scales. We also suggest the Causal-Resonance Multi-Scale Module (CR-MSM), which enhances the flow of information between the encoder and decoder by using causal resonance learning. This mechanism allows the model to automatically adjust feature resolution and causal dependencies across scales, leading to better semantic alignment between the low-level and high-level features in U-shaped architectures. Experiments on MRI, CT, and endoscopy images show that SAMA-UNet performs better in segmentation accuracy than current methods using CNN, Transformer, and Mamba. The implementation is publicly available at GitHub.
Related papers
- MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation [11.967890140626716]
We propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion mechanism.<n>A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency.<n>Our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity.
arXiv Detail & Related papers (2025-10-04T11:25:10Z) - SpectMamba: Integrating Frequency and State Space Models for Enhanced Medical Image Detection [11.43227481199105]
We present SpectMamba, the first Mamba-based architecture designed for medical image detection.<n>A key component of SpectMamba is the Hybrid Spatial-Frequency Attention (HSFA) block, which separately learns high- and low-frequency features.<n>We show that SpectMamba achieves state-of-the-art performance while being both effective and efficient across various medical image detection tasks.
arXiv Detail & Related papers (2025-09-01T02:56:45Z) - DeSamba: Decoupled Spectral Adaptive Framework for 3D Multi-Sequence MRI Lesion Classification [0.6749750044497732]
DeSamba is a framework designed to extract decoupled representations and adaptively fuse spatial and spectral features for lesion classification.<n>DeSamba achieves 62.10% Top-1 accuracy, 63.62% F1-score, 87.71% AUC, and 93.55% Top-3 accuracy on an external validation set.
arXiv Detail & Related papers (2025-07-21T10:42:21Z) - EAGLE: An Efficient Global Attention Lesion Segmentation Model for Hepatic Echinococcosis [31.698319244945793]
We propose a U-shaped network composed of a Progressive Visual State Space (PVSS) encoder and a Hybrid Visual State Space (HVSS) decoder.<n>The network achieves state-of-the-art performance with a Dice Similarity Coefficient (DSC) of 89.76%, surpassing MSVM-UNet by 1.61%.
arXiv Detail & Related papers (2025-06-25T11:42:05Z) - MRGen: Segmentation Data Engine for Underrepresented MRI Modalities [59.61465292965639]
Training medical image segmentation models for rare yet clinically important imaging modalities is challenging due to the scarcity of annotated data.<n>This paper investigates leveraging generative models to synthesize data, for training segmentation models for underrepresented modalities.<n>We present MRGen, a data engine for controllable medical image synthesis conditioned on text prompts and segmentation masks.
arXiv Detail & Related papers (2024-12-04T16:34:22Z) - MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation [6.578088710294546]
Traditional segmentation methods struggle to address challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise.
We propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy.
Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%.
arXiv Detail & Related papers (2024-10-31T08:54:23Z) - A Unified Model for Compressed Sensing MRI Across Undersampling Patterns [69.19631302047569]
We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions.<n>Our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet) with 600$times$ faster inference than diffusion methods.
arXiv Detail & Related papers (2024-10-05T20:03:57Z) - MSVM-UNet: Multi-Scale Vision Mamba UNet for Medical Image Segmentation [3.64388407705261]
We propose a Multi-Scale Vision Mamba UNet model for medical image segmentation, termed MSVM-UNet.
Specifically, by introducing multi-scale convolutions in the VSS blocks, we can more effectively capture and aggregate multi-scale feature representations from the hierarchical features of the VMamba encoder.
arXiv Detail & Related papers (2024-08-25T06:20:28Z) - Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding.<n>Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z) - Advancing Medical Image Segmentation: Morphology-Driven Learning with Diffusion Transformer [4.672688418357066]
We propose a novel Transformer Diffusion (DTS) model for robust segmentation in the presence of noise.
Our model, which analyzes the morphological representation of images, shows better results than the previous models in various medical imaging modalities.
arXiv Detail & Related papers (2024-08-01T07:35:54Z) - Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment [0.0]
We introduce Mamba-Ahnet, a novel integration of State Space Model (SSM) and Advanced Hierarchical Network (AHNet) within the MAMBA framework.
Mamba-Ahnet combines SSM's feature extraction and comprehension with AHNet's attention mechanisms and image reconstruction, aiming to enhance segmentation accuracy and robustness.
arXiv Detail & Related papers (2024-04-26T08:15:43Z) - VM-UNet: Vision Mamba UNet for Medical Image Segmentation [2.3876474175791302]
We propose a U-shape architecture model for medical image segmentation, named Vision Mamba UNet (VM-UNet)
We conduct comprehensive experiments on the ISIC17, ISIC18, and Synapse datasets, and the results indicate that VM-UNet performs competitively in medical image segmentation tasks.
arXiv Detail & Related papers (2024-02-04T13:37:21Z) - PMFSNet: Polarized Multi-scale Feature Self-attention Network For
Lightweight Medical Image Segmentation [6.134314911212846]
Current state-of-the-art medical image segmentation methods prioritize accuracy but often at the expense of increased computational demands and larger model sizes.
We propose PMFSNet, a novel medical imaging segmentation model that balances global local feature processing while avoiding computational redundancy.
It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies.
arXiv Detail & Related papers (2024-01-15T10:26:47Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing.
Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery.
We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z) - Self-Supervised Neuron Segmentation with Multi-Agent Reinforcement
Learning [53.00683059396803]
Mask image model (MIM) has been widely used due to its simplicity and effectiveness in recovering original information from masked images.
We propose a decision-based MIM that utilizes reinforcement learning (RL) to automatically search for optimal image masking ratio and masking strategy.
Our approach has a significant advantage over alternative self-supervised methods on the task of neuron segmentation.
arXiv Detail & Related papers (2023-10-06T10:40:46Z) - Breast Ultrasound Tumor Classification Using a Hybrid Multitask
CNN-Transformer Network [63.845552349914186]
Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification.
Vision Transformers have an improved capability of capturing global contextual information but may distort the local image patterns due to the tokenization operations.
In this study, we proposed a hybrid multitask deep neural network called Hybrid-MT-ESTAN, designed to perform BUS tumor classification and segmentation.
arXiv Detail & Related papers (2023-08-04T01:19:32Z) - Scale-aware Test-time Click Adaptation for Pulmonary Nodule and Mass
Segmentation [35.381677272157866]
Pulmonary nodules and masses are crucial imaging features in lung cancer screening.
Despite the success of deep learning-based medical image segmentation, the robust performance on various sizes of lesions is still challenging.
We propose a multi-scale neural network with scale-aware test-time adaptation to address this challenge.
arXiv Detail & Related papers (2023-07-28T16:04:34Z) - 3D Medical Image Segmentation based on multi-scale MPU-Net [5.393743755706745]
This paper proposes a tumor segmentation model MPU-Net for patient volume CT images.
It is inspired by Transformer with a global attention mechanism.
Compared with the benchmark model U-Net, MPU-Net shows excellent segmentation results.
arXiv Detail & Related papers (2023-07-11T20:46:19Z) - On Sensitivity and Robustness of Normalization Schemes to Input
Distribution Shifts in Automatic MR Image Diagnosis [58.634791552376235]
Deep Learning (DL) models have achieved state-of-the-art performance in diagnosing multiple diseases using reconstructed images as input.
DL models are sensitive to varying artifacts as it leads to changes in the input data distribution between the training and testing phases.
We propose to use other normalization techniques, such as Group Normalization and Layer Normalization, to inject robustness into model performance against varying image artifacts.
arXiv Detail & Related papers (2023-06-23T03:09:03Z) - Site Generalization: Stroke Lesion Segmentation on Magnetic Resonance
Images from Unseen Sites [40.32385363670918]
Strokes are the main cause of various cerebrovascular diseases.
Deep learning-based models have been proposed for this task, but generalizing these models to unseen sites is difficult.
We propose a U-net--based segmentation network termed SG-Net to improve unseen site generalization for stroke lesion segmentation on MR images.
arXiv Detail & Related papers (2022-05-09T14:33:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.