Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation
- URL: http://arxiv.org/abs/2512.05494v1
- Date: Fri, 05 Dec 2025 07:39:14 GMT
- Title: Decoding with Structured Awareness: Integrating Directional, Frequency-Spatial, and Structural Attention for Medical Image Segmentation
- Authors: Fan Zhang, Zhiwei Gu, Hua Wang
- Abstract summary: This paper proposes a novel decoder framework specifically designed for medical image segmentation, comprising three core modules. First, the Adaptive Cross-Fusion Attention (ACFA) module integrates channel feature enhancement with spatial attention mechanisms to enhance responsiveness to key regions and structural orientations. Second, the Triple Feature Fusion Attention (TFFA) module fuses features from the Spatial, Fourier, and Wavelet domains, achieving a joint frequency-spatial representation while preserving local information such as edges and textures. Third, the Structural-aware Multi-scale Masking Module (SMMM) optimizes the skip connections between encoder and decoder by leveraging multi-scale context and structural saliency filtering.
- Score: 6.82200201381917
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To address the limitations of Transformer decoders in capturing edge details, recognizing local textures and modeling spatial continuity, this paper proposes a novel decoder framework specifically designed for medical image segmentation, comprising three core modules. First, the Adaptive Cross-Fusion Attention (ACFA) module integrates channel feature enhancement with spatial attention mechanisms and introduces learnable guidance in three directions (planar, horizontal, and vertical) to enhance responsiveness to key regions and structural orientations. Second, the Triple Feature Fusion Attention (TFFA) module fuses features from Spatial, Fourier and Wavelet domains, achieving joint frequency-spatial representation that strengthens global dependency and structural modeling while preserving local information such as edges and textures, making it particularly effective in complex and blurred boundary scenarios. Finally, the Structural-aware Multi-scale Masking Module (SMMM) optimizes the skip connections between encoder and decoder by leveraging multi-scale context and structural saliency filtering, effectively reducing feature redundancy and improving semantic interaction quality. Working synergistically, these modules not only address the shortcomings of traditional decoders but also significantly enhance performance in high-precision tasks such as tumor segmentation and organ boundary extraction, improving both segmentation accuracy and model generalization. Experimental results demonstrate that this framework provides an efficient and practical solution for medical image segmentation.
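To make the joint frequency-spatial idea behind TFFA concrete, here is a minimal NumPy sketch, not the authors' implementation: the single-level Haar transform and the unweighted normalized-sum fusion are simplifying assumptions standing in for the paper's learned fusion.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2-D Haar wavelet transform (LL, LH, HL, HH sub-bands)."""
    a = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4
    h = (x[0::2, 0::2] - x[0::2, 1::2] + x[1::2, 0::2] - x[1::2, 1::2]) / 4
    v = (x[0::2, 0::2] + x[0::2, 1::2] - x[1::2, 0::2] - x[1::2, 1::2]) / 4
    d = (x[0::2, 0::2] - x[0::2, 1::2] - x[1::2, 0::2] + x[1::2, 1::2]) / 4
    return a, h, v, d

def triple_domain_features(x):
    """Combine spatial, Fourier, and wavelet views of one feature map."""
    spatial = x                                  # local edges and textures
    fourier = np.abs(np.fft.fft2(x))             # global frequency content
    ll, lh, hl, hh = haar_dwt2(x)                # multi-resolution structure
    # Upsample the wavelet detail bands back to the input size and sum them.
    detail = np.kron(np.abs(lh) + np.abs(hl) + np.abs(hh), np.ones((2, 2)))
    # Parameter-free fusion: normalized sum of the three views.
    fused = (spatial / (np.abs(spatial).max() + 1e-8)
             + fourier / (fourier.max() + 1e-8)
             + detail / (detail.max() + 1e-8))
    return fused

feat = np.random.default_rng(0).standard_normal((8, 8))
out = triple_domain_features(feat)
print(out.shape)  # (8, 8)
```

In the actual module the three views would be produced per channel and fused with learned attention weights; the sketch only shows why the three domains carry complementary information.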
Related papers
- TGC-Net: A Structure-Aware and Semantically-Aligned Framework for Text-Guided Medical Image Segmentation [56.09179939570486]
We propose TGC-Net, a CLIP-based framework focusing on parameter-efficient, task-specific adaptations. TGC-Net achieves state-of-the-art performance with substantially fewer trainable parameters, including notable Dice gains on challenging benchmarks.
arXiv Detail & Related papers (2025-12-24T12:06:26Z) - Bridging spatial awareness and global context in medical image segmentation [2.750124853532831]
We propose U-CycleMLP, a novel U-shaped encoder-decoder network designed to enhance segmentation performance. The encoder learns multiscale contextual features using position attention weight blocks, dense atrous blocks, and downsampling operations. The decoder reconstructs high-resolution segmentation masks through upsampling operations, dense atrous blocks, and feature fusion mechanisms.
arXiv Detail & Related papers (2025-12-06T20:25:24Z) - Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation [60.9960601057956]
We introduce the Frequency-Aware Audio-Visual Segmentation (FAVS) framework, consisting of two key modules. The FAVS framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-09-23T12:33:48Z) - SCRNet: Spatial-Channel Regulation Network for Medical Ultrasound Image Segmentation [1.4930126157970809]
CNN-based methods tend to disregard long-range dependencies, while Transformer-based methods may overlook local contextual information. We propose a novel Feature Aggregation Module (FAM) designed to process two input features from the preceding layer. This strategy enables our module to focus concurrently on both long-range dependencies and local contextual information.
arXiv Detail & Related papers (2025-08-19T15:02:27Z) - GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z) - RL-U$^2$Net: A Dual-Branch UNet with Reinforcement Learning-Assisted Multimodal Feature Fusion for Accurate 3D Whole-Heart Segmentation [0.624829068285122]
We propose a dual-branch U-Net architecture enhanced by reinforcement learning for feature alignment. The model employs a dual-branch U-shaped network to process CT and MRI patches in parallel, and introduces a novel RL-XAlign module. Experimental results on the publicly available MM-WHS 2017 dataset demonstrate that the proposed RL-U$^2$Net outperforms existing state-of-the-art methods.
arXiv Detail & Related papers (2025-08-04T16:12:06Z) - CFMD: Dynamic Cross-layer Feature Fusion for Salient Object Detection [7.262250906929891]
Cross-layer feature pyramid networks (CFPNs) have achieved notable progress in multi-scale feature fusion and boundary detail preservation for salient object detection. To address these challenges, we propose CFMD, a novel cross-layer feature pyramid network that introduces two key innovations. First, we design a context-aware feature aggregation module (CFLMA), which incorporates the state-of-the-art Mamba architecture to construct a dynamic weight distribution mechanism. Second, we introduce an adaptive dynamic upsampling unit (CFLMD) that preserves spatial details during resolution recovery.
arXiv Detail & Related papers (2025-04-02T03:22:36Z) - AFFSegNet: Adaptive Feature Fusion Segmentation Network for Microtumors and Multi-Organ Segmentation [31.97835089989928]
Medical image segmentation is a crucial task in computer vision, supporting clinicians in diagnosis, treatment planning, and disease monitoring. We propose the Adaptive Semantic Network (ASSNet), a transformer architecture that effectively integrates local and global features for precise medical image segmentation. Tests on diverse medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, demonstrate that ASSNet achieves state-of-the-art results.
arXiv Detail & Related papers (2024-09-12T06:25:44Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, aims to recover the amplitude spectrum of the hazy images for haze removal.
The second stage, named phase-guided structure refined, devotes to learning the transformation and refinement of the phase spectrum.
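As a toy illustration of the amplitude/phase split these two stages operate on (a NumPy sketch using a plain 2-D FFT; it is not MITNet's network), the two spectra can be separated and then recombined losslessly:

```python
import numpy as np

# Decompose an image into the amplitude and phase spectra that the
# two dehazing stages would act on, then recombine them.
img = np.random.default_rng(1).random((16, 16))
spec = np.fft.fft2(img)
amplitude, phase = np.abs(spec), np.angle(spec)

# Stage 1 would predict a corrected amplitude; stage 2 a refined phase.
# Recombining the untouched components recovers the image exactly.
recon = np.fft.ifft2(amplitude * np.exp(1j * phase)).real
print(np.allclose(recon, img))  # True
```

The lossless round trip is what makes the factorization attractive: each stage can edit one spectrum while the other is held fixed.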
arXiv Detail & Related papers (2023-08-14T08:23:58Z) - Spatio-Temporal Representation Factorization for Video-based Person
Re-Identification [55.01276167336187]
We propose a Spatio-Temporal Representation Factorization (STRF) module for re-ID.
STRF is a flexible new computational unit that can be used in conjunction with most existing 3D convolutional neural network architectures for re-ID.
We empirically show that STRF improves performance of various existing baseline architectures while demonstrating new state-of-the-art results.
arXiv Detail & Related papers (2021-07-25T19:29:37Z)