FaRMamba: Frequency-based learning and Reconstruction aided Mamba for Medical Segmentation
- URL: http://arxiv.org/abs/2507.20056v1
- Date: Sat, 26 Jul 2025 20:41:53 GMT
- Title: FaRMamba: Frequency-based learning and Reconstruction aided Mamba for Medical Segmentation
- Authors: Ze Rong, ZiYue Zhao, Zhaoxin Wang, Lei Ma
- Abstract summary: Vision Mamba employs one-dimensional causal state-space recurrence to efficiently model global dependencies. Its patch tokenization and 1D serialization disrupt local pixel adjacency and impose a low-pass filtering effect. We propose FaRMamba, a novel extension that explicitly addresses Local High-frequency Information Capture Deficiency (LHICD) and two-dimensional Spatial Structure Degradation (2D-SSD) through two complementary modules.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate medical image segmentation remains challenging due to blurred lesion boundaries (LBA), loss of high-frequency details (LHD), and difficulty in modeling long-range anatomical structures (DC-LRSS). Vision Mamba employs one-dimensional causal state-space recurrence to efficiently model global dependencies, thereby substantially mitigating DC-LRSS. However, its patch tokenization and 1D serialization disrupt local pixel adjacency and impose a low-pass filtering effect, resulting in Local High-frequency Information Capture Deficiency (LHICD) and two-dimensional Spatial Structure Degradation (2D-SSD), which in turn exacerbate LBA and LHD. In this work, we propose FaRMamba, a novel extension that explicitly addresses LHICD and 2D-SSD through two complementary modules. A Multi-Scale Frequency Transform Module (MSFM) restores attenuated high-frequency cues by isolating and reconstructing multi-band spectra via wavelet, cosine, and Fourier transforms. A Self-Supervised Reconstruction Auxiliary Encoder (SSRAE) enforces pixel-level reconstruction on the shared Mamba encoder to recover full 2D spatial correlations, enhancing both fine textures and global context. Extensive evaluations on CAMUS echocardiography, MRI-based Mouse-cochlea, and Kvasir-Seg endoscopy demonstrate that FaRMamba consistently outperforms competitive CNN-Transformer hybrids and existing Mamba variants, delivering superior boundary accuracy, detail preservation, and global coherence without prohibitive computational overhead. This work provides a flexible frequency-aware framework for future segmentation models that directly mitigates core challenges in medical imaging.
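The MSFM described in the abstract works by splitting an image into frequency bands and reconstructing it. As a minimal illustration of that general idea only, assuming a single-level orthonormal 2D Haar wavelet (the abstract does not say which wavelet, DCT, or FFT settings FaRMamba uses, and its learned reconstruction is not shown), the sketch below separates one low-pass band from three high-frequency detail bands and inverts the transform exactly. The names `haar2d` and `ihaar2d` are hypothetical.

```python
# Hypothetical helpers: a generic single-level orthonormal 2D Haar transform,
# used only to illustrate band splitting; this is not FaRMamba's MSFM.

def haar2d(image):
    """Split an even-sized image (list of rows) into four half-size sub-bands.

    Returns (ll, hl, lh, hh): ll is the low-pass approximation; hl, lh and hh
    hold high-frequency detail (left-right, top-bottom and diagonal
    differences inside each 2x2 block).
    """
    h, w = len(image), len(image[0])
    if h % 2 or w % 2:
        raise ValueError("image height and width must be even")

    def empty():
        return [[0.0] * (w // 2) for _ in range(h // 2)]

    ll, hl, lh, hh = empty(), empty(), empty(), empty()
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = image[i][j], image[i][j + 1]          # top-left, top-right
            c, d = image[i + 1][j], image[i + 1][j + 1]  # bottom-left, bottom-right
            ll[i // 2][j // 2] = (a + b + c + d) / 2     # scaled local average
            hl[i // 2][j // 2] = (a - b + c - d) / 2     # left minus right
            lh[i // 2][j // 2] = (a + b - c - d) / 2     # top minus bottom
            hh[i // 2][j // 2] = (a - b - c + d) / 2     # diagonal difference
    return ll, hl, lh, hh


def ihaar2d(ll, hl, lh, hh):
    """Invert haar2d exactly, rebuilding each 2x2 block from its four coefficients."""
    h2, w2 = len(ll), len(ll[0])
    image = [[0.0] * (2 * w2) for _ in range(2 * h2)]
    for i in range(h2):
        for j in range(w2):
            s, x, y, z = ll[i][j], hl[i][j], lh[i][j], hh[i][j]
            image[2 * i][2 * j] = (s + x + y + z) / 2
            image[2 * i][2 * j + 1] = (s - x + y - z) / 2
            image[2 * i + 1][2 * j] = (s + x - y - z) / 2
            image[2 * i + 1][2 * j + 1] = (s - x - y + z) / 2
    return image


edge = [[0, 1, 1, 1], [0, 1, 1, 1]]  # sharp vertical edge between columns 0 and 1
ll, hl, lh, hh = haar2d(edge)
print(ll, hl)  # the edge appears only in the hl detail band
zeros = [[0.0] * len(ll[0]) for _ in ll]
print(ihaar2d(ll, zeros, zeros, zeros))  # detail bands dropped: the edge is smeared
```

Zeroing the three detail bands before inverting acts as a crude low-pass filter: the edge image comes back as `[[0.5, 0.5, 1.0, 1.0], [0.5, 0.5, 1.0, 1.0]]`, so the sharp transition is blurred. This is an analogy for the kind of high-frequency loss the abstract attributes to patch tokenization and 1D serialization, which MSFM is designed to counteract by restoring attenuated bands.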
Related papers
- SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation
This study proposes an innovative dual-encoder architecture named SAMba-UNet. The framework achieves cross-modal feature collaborative learning by integrating the vision foundation model SAM2, the state-space model Mamba, and the classical UNet. Experiments on the ACDC cardiac MRI dataset demonstrate that the proposed model achieves a Dice coefficient of 0.9103 and an HD95 boundary error of 1.0859 mm.
(arXiv, 2025-05-22)
- DH-Mamba: Exploring Dual-domain Hierarchical State Space Models for MRI Reconstruction
This paper explores selective state space models (Mamba) for efficient and effective MRI reconstruction. Mamba typically flattens 2D images into distinct 1D sequences along rows and columns, disrupting k-space's unique spectrum. Existing approaches adopt multi-directional lengthy scanning to unfold images at the pixel level, leading to long-range forgetting and high computational burden.
(arXiv, 2025-01-14)
- Cross-Scan Mamba with Masked Training for Robust Spectral Imaging
We propose the Cross-Scanning Mamba, named CS-Mamba, which employs a Spatial-Spectral SSM for global-local balanced context encoding. Experimental results show that CS-Mamba achieves state-of-the-art performance and that the masked training method better reconstructs smooth features, improving visual quality.
(arXiv, 2024-08-01)
- Enhanced Masked Image Modeling to Avoid Model Collapse on Multi-modal MRI Datasets
Masked image modeling (MIM) has shown promise in utilizing unlabeled data. We analyze and address two types of model collapse: complete collapse and dimensional collapse. We construct the enhanced MIM (E-MIM) with HMP and PBT modules to avoid model collapse on multi-modal MRI.
(arXiv, 2024-07-15)
- MMR-Mamba: Multi-Modal MRI Reconstruction with Mamba and Spatial-Frequency Information Fusion
We propose MMR-Mamba, a novel framework that thoroughly and efficiently integrates multi-modal features for MRI reconstruction.
Specifically, we first design a Target modality-guided Cross Mamba (TCM) module in the spatial domain.
Then, we introduce a Selective Frequency Fusion (SFF) module to efficiently integrate global information in the Fourier domain.
(arXiv, 2024-06-27)
- Dual Hyperspectral Mamba for Efficient Spectral Compressive Imaging
We propose a novel Dual Hyperspectral Mamba (DHM) to explore both global long-range dependencies and local contexts for efficient HSI reconstruction.
Specifically, our DHM consists of multiple dual hyperspectral S4 blocks (DHSBs) to restore original HSIs.
(arXiv, 2024-06-01)
- Frequency-Assisted Mamba for Remote Sensing Image Super-Resolution
We make the first attempt to integrate the Vision State Space Model (Mamba) into remote sensing image (RSI) super-resolution.
To achieve better SR reconstruction, building upon Mamba, we devise a Frequency-assisted Mamba framework, dubbed FMSR.
Our FMSR features a multi-level fusion architecture equipped with the Frequency Selection Module (FSM), Vision State Space Module (VSSM), and Hybrid Gate Module (HGM).
(arXiv, 2024-05-08)
- Enhancing Retinal Vascular Structure Segmentation in Images With a Novel Design Two-Path Interactive Fusion Module Model
We introduce Swin-Res-Net, a specialized module designed to enhance the precision of retinal vessel segmentation.
Swin-Res-Net uses the Swin Transformer, which partitions the image into shifted windows.
Our proposed architecture produces outstanding results, either meeting or surpassing those of other published models.
(arXiv, 2024-03-03)
- Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
(arXiv, 2023-12-26)
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, is devoted to learning the transformation and refinement of the phase spectrum.
(arXiv, 2023-08-14)
- Cross-Modal Causal Intervention for Medical Report Generation
Radiology Report Generation (RRG) is essential for computer-aided diagnosis and medication guidance. However, generating accurate lesion descriptions remains challenging due to spurious correlations from visual-linguistic biases. We propose a two-stage framework named CrossModal Causal Representation Learning (CMCRL). Experiments on IU-Xray and MIMIC-CXR show that our CMCRL pipeline significantly outperforms state-of-the-art methods.
(arXiv, 2023-03-16)
- AliasNet: Alias Artefact Suppression Network for Accelerated Phase-Encode MRI
Sparse reconstruction is an important aspect of MRI, helping to reduce acquisition time and improve spatial-temporal resolution.
Experiments conducted on retrospectively under-sampled brain and knee data demonstrate that combining the proposed 1D AliasNet modules with existing 2D deep learned (DL) recovery techniques improves image quality.
(arXiv, 2023-02-17)
This list is automatically generated from the titles and abstracts of the papers on this site.
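The FaRMamba abstract and the DH-Mamba summary both point to the same mechanism: flattening a 2D grid into a 1D sequence along rows or columns pulls apart positions that are adjacent in 2D. A minimal sketch, assuming a plain raster scan over an H x W grid (real Vision Mamba variants scan patch tokens and often combine several scan directions), measures how far apart 2D neighbors end up in the sequence. The helper names are hypothetical.

```python
# Hypothetical helpers: a plain raster scan over an H x W grid, used only to show
# how 1D serialization separates 2D neighbors (not any specific model's scan).

def row_major_order(height, width):
    """(row, col) visiting order when the grid is flattened row by row."""
    return [(r, c) for r in range(height) for c in range(width)]


def column_major_order(height, width):
    """(row, col) visiting order when the grid is flattened column by column."""
    return [(r, c) for c in range(width) for r in range(height)]


def sequence_gap(order, p, q):
    """Distance between two grid positions once they are placed in a 1D sequence."""
    index = {pos: i for i, pos in enumerate(order)}
    return abs(index[p] - index[q])


H, W = 4, 6
rows, cols = row_major_order(H, W), column_major_order(H, W)
pixel, right, below = (1, 2), (1, 3), (2, 2)
print(sequence_gap(rows, pixel, right), sequence_gap(rows, pixel, below))  # 1 6
print(sequence_gap(cols, pixel, right), sequence_gap(cols, pixel, below))  # 4 1
```

On this 4 x 6 grid, a row-major scan keeps horizontal neighbors 1 step apart but separates vertical neighbors by a full row (W = 6 steps); the column-major scan does the reverse (H = 4 steps for horizontal neighbors). With a causal recurrence, each position also only receives information from earlier positions, so in a single row-major pass a pixel gets nothing from the pixel directly below it. Scanning in several directions, as the DH-Mamba summary mentions, is one way to expose both adjacencies, at extra computational cost.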