Related papers: MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation

MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation

URL: http://arxiv.org/abs/2410.23738v1
Date: Thu, 31 Oct 2024 08:54:23 GMT
Title: MLLA-UNet: Mamba-like Linear Attention in an Efficient U-Shape Model for Medical Image Segmentation
Authors: Yufeng Jiang, Zongxi Li, Xiangyan Chen, Haoran Xie, Jing Cai,
Abstract summary: Traditional segmentation methods struggle to address challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise. We propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy. Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%.
Score: 6.578088710294546
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advancements in medical imaging have resulted in more complex and diverse images, with challenges such as high anatomical variability, blurred tissue boundaries, low organ contrast, and noise. Traditional segmentation methods struggle to address these challenges, making deep learning approaches, particularly U-shaped architectures, increasingly prominent. However, the quadratic complexity of standard self-attention makes Transformers computationally prohibitive for high-resolution images. To address these challenges, we propose MLLA-UNet (Mamba-Like Linear Attention UNet), a novel architecture that achieves linear computational complexity while maintaining high segmentation accuracy through its innovative combination of linear attention and Mamba-inspired adaptive mechanisms, complemented by an efficient symmetric sampling structure for enhanced feature processing. Our architecture effectively preserves essential spatial features while capturing long-range dependencies at reduced computational complexity. Additionally, we introduce a novel sampling strategy for multi-scale feature fusion. Experiments demonstrate that MLLA-UNet achieves state-of-the-art performance on six challenging datasets with 24 different segmentation tasks, including but not limited to FLARE22, AMOS CT, and ACDC, with an average DSC of 88.32%. These results underscore the superiority of MLLA-UNet over existing methods. Our contributions include the novel 2D segmentation architecture and its empirical validation. The code is available via https://github.com/csyfjiang/MLLA-UNet.

Related papers

MambaMIL+: Modeling Long-Term Contextual Patterns for Gigapixel Whole Slide Image [24.093388981091717]
Multiple instance learning (MIL) offers a solution by treating each WSI as a bag of patch-level instances.<n>Mamba has emerged as a promising alternative for long sequence learning, scaling linearly to thousands of tokens.<n>We propose MambaMIL+, a new MIL framework that explicitly integrates spatial context while maintaining long-range dependency modeling.
arXiv Detail & Related papers (2025-12-19T16:01:14Z)
HyM-UNet: Synergizing Local Texture and Global Context via Hybrid CNN-Mamba Architecture for Medical Image Segmentation [3.976000861085382]
HyM-UNet is designed to synergize the local feature extraction capabilities of CNNs with the efficient global modeling capabilities of Mamba.<n>To bridge the semantic gap between the encoder and the decoder, we propose a Mamba-Guided Fusion Skip Connection.<n>The results demonstrate that HyM-UNet significantly outperforms existing state-of-the-art methods in terms of Dice coefficient and IoU.
arXiv Detail & Related papers (2025-11-22T09:02:06Z)
MambaCAFU: Hybrid Multi-Scale and Multi-Attention Model with Mamba-Based Fusion for Medical Image Segmentation [11.967890140626716]
We propose a hybrid segmentation architecture featuring a three-branch encoder that integrates CNNs, Transformers, and a Mamba-based Attention Fusion mechanism.<n>A multi-scale attention-based CNN decoder reconstructs fine-grained segmentation maps while preserving contextual consistency.<n>Our approach outperforms state-of-the-art methods in accuracy and generalization, while maintaining comparable computational complexity.
arXiv Detail & Related papers (2025-10-04T11:25:10Z)
MambaOutRS: A Hybrid CNN-Fourier Architecture for Remote Sensing Image Classification [4.14360329494344]
We introduce MambaOutRS, a novel hybrid convolutional architecture for remote sensing image classification.<n>MambaOutRS builds upon stacked Gated CNN blocks for local feature extraction and introduces a novel Fourier Filter Gate (FFG) module.
arXiv Detail & Related papers (2025-06-24T12:20:11Z)
InceptionMamba: Efficient Multi-Stage Feature Enhancement with Selective State Space Model for Microscopic Medical Image Segmentation [15.666926528144202]
We propose an efficient framework for the segmentation task, named InceptionMamba, which encodes multi-stage rich features.<n>We exploit semantic cues to capture both low-frequency and high-frequency regions to enrich the multi-stage features.<n>Our model achieves state-of-the-art performance on two challenging microscopic segmentation datasets.
arXiv Detail & Related papers (2025-06-13T20:25:12Z)
SAMA-UNet: Enhancing Medical Image Segmentation with Self-Adaptive Mamba-Like Attention and Causal-Resonance Learning [4.790894013065453]
We introduce SAMA-UNet, a novel architecture for medical image segmentation.<n>A key innovation is the Self-Adaptive Mamba-like Aggregated Attention (SAMA) block, which integrates contextual self-attention with dynamic weight modulation.<n> Experiments on MRI, CT, and endoscopy images show that SAMA-UNet performs better in segmentation accuracy than current methods.
arXiv Detail & Related papers (2025-05-21T08:12:31Z)
MambaStyle: Efficient StyleGAN Inversion for Real Image Editing with State-Space Models [60.110274007388135]
MambaStyle is an efficient single-stage encoder-based approach for GAN inversion and editing.<n>We show that MambaStyle achieves a superior balance among inversion accuracy, editing quality, and computational efficiency.
arXiv Detail & Related papers (2025-05-06T20:03:47Z)
An Efficient and Mixed Heterogeneous Model for Image Restoration [71.85124734060665]
Current mainstream approaches are based on three architectural paradigms: CNNs, Transformers, and Mambas. We propose RestorMixer, an efficient and general-purpose IR model based on mixed-architecture fusion.
arXiv Detail & Related papers (2025-04-15T08:19:12Z)
Detail Matters: Mamba-Inspired Joint Unfolding Network for Snapshot Spectral Compressive Imaging [40.80197280147993]
We propose a Mamba-inspired Joint Unfolding Network (MiJUN) to overcome the inherent nonlinear and ill-posed characteristics of HSI reconstruction. We introduce an accelerated unfolding network scheme, which reduces the reliance on initial optimization stages. We refine the scanning strategy with Mamba by integrating the tensor mode-$k$ unfolding into the Mamba network.
arXiv Detail & Related papers (2025-01-02T13:56:23Z)
Mamba-SEUNet: Mamba UNet for Monaural Speech Enhancement [54.427965535613886]
Mamba, as a novel state-space model (SSM), has gained widespread application in natural language processing and computer vision. In this work, we introduce Mamba-SEUNet, an innovative architecture that integrates Mamba with U-Net for SE tasks.
arXiv Detail & Related papers (2024-12-21T13:43:51Z)
LAMA: Stable Dual-Domain Deep Reconstruction For Sparse-View CT [4.573246328161056]
We develop a Learned Alternating Minimization Algorithm (LAMA) to solve problems via two-block optimization. LAMA is naturally induced as a variational model with learnable regularizers in both data and image domains. It is demonstrated that LAMA reduces network complexity, memory efficiency, and reconstruction accuracy, stability, and interpretability.
arXiv Detail & Related papers (2024-10-28T15:13:04Z)
HRVMamba: High-Resolution Visual State Space Model for Dense Prediction [60.80423207808076]
State Space Models (SSMs) with efficient hardware-aware designs have demonstrated significant potential in computer vision tasks. These models have been constrained by three key challenges: insufficient inductive bias, long-range forgetting, and low-resolution output representation. We introduce the Dynamic Visual State Space (DVSS) block, which employs deformable convolution to mitigate the long-range forgetting problem. We also introduce High-Resolution Visual State Space Model (HRVMamba) based on the DVSS block, which preserves high-resolution representations throughout the entire process.
arXiv Detail & Related papers (2024-10-04T06:19:29Z)
MambaClinix: Hierarchical Gated Convolution and Mamba-Based U-Net for Enhanced 3D Medical Image Segmentation [6.673169053236727]
We propose MambaClinix, a novel U-shaped architecture for medical image segmentation. MambaClinix integrates a hierarchical gated convolutional network with Mamba in an adaptive stage-wise framework. Our results show that MambaClinix achieves high segmentation accuracy while maintaining low model complexity.
arXiv Detail & Related papers (2024-09-19T07:51:14Z)
PPMamba: A Pyramid Pooling Local Auxiliary SSM-Based Model for Remote Sensing Image Semantic Segmentation [1.5136939451642137]
This paper proposes a novel network called Pyramid Pooling Mamba (PPMamba), which integrates CNN and Mamba for semantic segmentation tasks. PPMamba achieves competitive performance compared to state-of-the-art models.
arXiv Detail & Related papers (2024-09-10T08:08:50Z)
Bidirectional Gated Mamba for Sequential Recommendation [56.85338055215429]
Mamba, a recent advancement, has exhibited exceptional performance in time series prediction. We introduce a new framework named Selective Gated Mamba ( SIGMA) for Sequential Recommendation. Our results indicate that SIGMA outperforms current models on five real-world datasets.
arXiv Detail & Related papers (2024-08-21T09:12:59Z)
Cross-Scan Mamba with Masked Training for Robust Spectral Imaging [51.557804095896174]
We propose the Cross-Scanning Mamba, named CS-Mamba, that employs a Spatial-Spectral SSM for global-local balanced context encoding. Experiment results show that our CS-Mamba achieves state-of-the-art performance and the masked training method can better reconstruct smooth features to improve the visual quality.
arXiv Detail & Related papers (2024-08-01T15:14:10Z)
Self-Prior Guided Mamba-UNet Networks for Medical Image Super-Resolution [7.97504951029884]
We propose a self-prior guided Mamba-UNet network (SMamba-UNet) for medical image super-resolution. Inspired by Mamba, our approach aims to learn the self-prior multi-scale contextual features under Mamba-UNet networks.
arXiv Detail & Related papers (2024-07-08T14:41:53Z)
Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification [4.389334324926174]
This study introduces the innovative Mamba-in-Mamba (MiM) architecture for HSI classification, the first attempt of deploying State Space Model (SSM) in this task. MiM model includes 1) A novel centralized Mamba-Cross-Scan (MCS) mechanism for transforming images into sequence-data, 2) A Tokenized Mamba (T-Mamba) encoder, and 3) A Weighted MCS Fusion (WMF) module. Experimental results from three public HSI datasets demonstrate that our method outperforms existing baselines and state-of-the-art approaches.
arXiv Detail & Related papers (2024-05-20T13:19:02Z)
IRSRMamba: Infrared Image Super-Resolution via Mamba-based Wavelet Transform Feature Modulation Model [7.842507196763463]
IRSRMamba is a novel framework integrating wavelet transform feature modulation for multi-scale adaptation. IRSRMamba outperforms state-of-the-art methods in PSNR, SSIM, and perceptual quality. This work establishes Mamba-based architectures as a promising direction for high-fidelity IR image enhancement.
arXiv Detail & Related papers (2024-05-16T07:49:24Z)
Real-Time Image Segmentation via Hybrid Convolutional-Transformer Architecture Search [49.81353382211113]
We address the challenge of integrating multi-head self-attention into high resolution representation CNNs efficiently. We develop a multi-target multi-branch supernet method, which fully utilizes the advantages of high-resolution features. We present a series of model via Hybrid Convolutional-Transformer Architecture Search (HyCTAS) method that searched for the best hybrid combination of light-weight convolution layers and memory-efficient self-attention layers.
arXiv Detail & Related papers (2024-03-15T15:47:54Z)
MamMIL: Multiple Instance Learning for Whole Slide Images with State Space Models [56.37780601189795]
We propose a framework named MamMIL for WSI analysis. We represent each WSI as an undirected graph. To address the problem that Mamba can only process 1D sequences, we propose a topology-aware scanning mechanism.
arXiv Detail & Related papers (2024-03-08T09:02:13Z)
Rotated Multi-Scale Interaction Network for Referring Remote Sensing Image Segmentation [63.15257949821558]
Referring Remote Sensing Image (RRSIS) is a new challenge that combines computer vision and natural language processing. Traditional Referring Image (RIS) approaches have been impeded by the complex spatial scales and orientations found in aerial imagery. We introduce the Rotated Multi-Scale Interaction Network (RMSIN), an innovative approach designed for the unique demands of RRSIS.
arXiv Detail & Related papers (2023-12-19T08:14:14Z)
Distance Weighted Trans Network for Image Completion [52.318730994423106]
We propose a new architecture that relies on Distance-based Weighted Transformer (DWT) to better understand the relationships between an image's components. CNNs are used to augment the local texture information of coarse priors. DWT blocks are used to recover certain coarse textures and coherent visual structures.
arXiv Detail & Related papers (2023-10-11T12:46:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.