U-DFA: A Unified DINOv2-Unet with Dual Fusion Attention for Multi-Dataset Medical Segmentation
- URL: http://arxiv.org/abs/2510.00585v1
- Date: Wed, 01 Oct 2025 07:06:49 GMT
- Title: U-DFA: A Unified DINOv2-Unet with Dual Fusion Attention for Multi-Dataset Medical Segmentation
- Authors: Zulkaif Sajjad, Furqan Shaukat, Junaid Mir,
- Abstract summary: We propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that integrates a novel Local-Global Fusion Adapter (LGFA) to enhance segmentation performance.<n>Our method achieves state-of-the-art performance on the Synapse and ACDC datasets with only 33% of the trainable model parameters.
- Score: 1.1724961392643483
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurate medical image segmentation plays a crucial role in overall diagnosis and is one of the most essential tasks in the diagnostic pipeline. CNN-based models, despite their extensive use, suffer from a local receptive field and fail to capture the global context. A common approach that combines CNNs with transformers attempts to bridge this gap but fails to effectively fuse the local and global features. With the recent emergence of VLMs and foundation models, they have been adapted for downstream medical imaging tasks; however, they suffer from an inherent domain gap and high computational cost. To this end, we propose U-DFA, a unified DINOv2-Unet encoder-decoder architecture that integrates a novel Local-Global Fusion Adapter (LGFA) to enhance segmentation performance. LGFA modules inject spatial features from a CNN-based Spatial Pattern Adapter (SPA) module into frozen DINOv2 blocks at multiple stages, enabling effective fusion of high-level semantic and spatial features. Our method achieves state-of-the-art performance on the Synapse and ACDC datasets with only 33\% of the trainable model parameters. These results demonstrate that U-DFA is a robust and scalable framework for medical image segmentation across multiple modalities.
Related papers
- A Semantic Segmentation Algorithm for Pleural Effusion Based on DBIF-AUNet [22.657295396752023]
Pleural effusion semantic segmentation can significantly enhance the accuracy and timeliness of clinical diagnosis and treatment.<n>Existing methods often struggle with diverse image variations and complex edges.<n>We propose the Dual-Branch Interactive Fusion Attention model (DBIF-AUNet) to address these challenges.
arXiv Detail & Related papers (2025-08-08T10:14:51Z) - CFFormer: Cross CNN-Transformer Channel Attention and Spatial Feature Fusion for Improved Segmentation of Heterogeneous Medical Images [29.68616115427831]
Medical image segmentation plays an important role in computer-aided diagnosis.<n>Due to limitations of medical imaging devices, medical images exhibit significant heterogeneity, posing challenges for segmentation.<n>We propose a hybrid CNN-Transformer model,called CFFormer, which leverages effective channel feature extraction.
arXiv Detail & Related papers (2025-01-07T08:59:20Z) - Prompting Segment Anything Model with Domain-Adaptive Prototype for Generalizable Medical Image Segmentation [49.5901368256326]
We propose a novel Domain-Adaptive Prompt framework for fine-tuning the Segment Anything Model (termed as DAPSAM) in segmenting medical images.
Our DAPSAM achieves state-of-the-art performance on two medical image segmentation tasks with different modalities.
arXiv Detail & Related papers (2024-09-19T07:28:33Z) - AFFSegNet: Adaptive Feature Fusion Segmentation Network for Microtumors and Multi-Organ Segmentation [31.97835089989928]
Medical image segmentation is a crucial task in computer vision, supporting clinicians in diagnosis, treatment planning, and disease monitoring.<n>We propose the Adaptive Semantic Network (ASSNet), a transformer architecture that effectively integrates local and global features for precise medical image segmentation.<n>Tests on diverse medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, demonstrate that ASSNet achieves state-of-the-art results.
arXiv Detail & Related papers (2024-09-12T06:25:44Z) - ASPS: Augmented Segment Anything Model for Polyp Segmentation [77.25557224490075]
The Segment Anything Model (SAM) has introduced unprecedented potential for polyp segmentation.
SAM's Transformer-based structure prioritizes global and low-frequency information.
CFA integrates a trainable CNN encoder branch with a frozen ViT encoder, enabling the integration of domain-specific knowledge.
arXiv Detail & Related papers (2024-06-30T14:55:32Z) - BEFUnet: A Hybrid CNN-Transformer Architecture for Precise Medical Image
Segmentation [0.0]
This paper proposes an innovative U-shaped network called BEFUnet, which enhances the fusion of body and edge information for precise medical image segmentation.
The BEFUnet comprises three main modules, including a novel Local Cross-Attention Feature (LCAF) fusion module, a novel Double-Level Fusion (DLF) module, and dual-branch encoder.
The LCAF module efficiently fuses edge and body features by selectively performing local cross-attention on features that are spatially close between the two modalities.
arXiv Detail & Related papers (2024-02-13T21:03:36Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image (DEC-Seg)
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Affine-Consistent Transformer for Multi-Class Cell Nuclei Detection [76.11864242047074]
We propose a novel Affine-Consistent Transformer (AC-Former), which directly yields a sequence of nucleus positions.
We introduce an Adaptive Affine Transformer (AAT) module, which can automatically learn the key spatial transformations to warp original images for local network training.
Experimental results demonstrate that the proposed method significantly outperforms existing state-of-the-art algorithms on various benchmarks.
arXiv Detail & Related papers (2023-10-22T02:27:02Z) - MCPA: Multi-scale Cross Perceptron Attention Network for 2D Medical
Image Segmentation [7.720152925974362]
We propose a 2D medical image segmentation model called Multi-scale Cross Perceptron Attention Network (MCPA)
The MCPA consists of three main components: an encoder, a decoder, and a Cross Perceptron.
We evaluate our proposed MCPA model on several publicly available medical image datasets from different tasks and devices.
arXiv Detail & Related papers (2023-07-27T02:18:12Z) - MedSegDiff-V2: Diffusion based Medical Image Segmentation with
Transformer [53.575573940055335]
We propose a novel Transformer-based Diffusion framework, called MedSegDiff-V2.
We verify its effectiveness on 20 medical image segmentation tasks with different image modalities.
arXiv Detail & Related papers (2023-01-19T03:42:36Z) - Cross-Modality Brain Tumor Segmentation via Bidirectional
Global-to-Local Unsupervised Domain Adaptation [61.01704175938995]
In this paper, we propose a novel Bidirectional Global-to-Local (BiGL) adaptation framework under a UDA scheme.
Specifically, a bidirectional image synthesis and segmentation module is proposed to segment the brain tumor.
The proposed method outperforms several state-of-the-art unsupervised domain adaptation methods by a large margin.
arXiv Detail & Related papers (2021-05-17T10:11:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.