ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation
- URL: http://arxiv.org/abs/2503.03327v2
- Date: Wed, 30 Apr 2025 06:10:54 GMT
- Title: ScaleFusionNet: Transformer-Guided Multi-Scale Feature Fusion for Skin Lesion Segmentation
- Authors: Saqib Qamar, Syed Furqan Qadri, Roobaea Alroobaea, Goram Mufarah M Alshmrani, Richard Jiang
- Abstract summary: Melanoma is a malignant tumor originating from skin cell lesions. We propose ScaleFusionNet, a segmentation model that integrates Cross-Attention Transformer Module (CATM) and AdaptiveFusionBlock. The model employs a hybrid architecture encoder that effectively captures both local and global features.
- Score: 1.6361082730202214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Melanoma is a malignant tumor originating from skin cell lesions. Accurate and efficient segmentation of skin lesions is essential for quantitative medical analysis but remains challenging. To address this, we propose ScaleFusionNet, a segmentation model that integrates Cross-Attention Transformer Module (CATM) and AdaptiveFusionBlock to enhance feature extraction and fusion. The model employs a hybrid architecture encoder that effectively captures both local and global features. We introduce CATM, which utilizes Swin Transformer Blocks and Cross Attention Fusion (CAF) to adaptively refine encoder-decoder feature fusion, reducing semantic gaps and improving segmentation accuracy. Additionally, the AdaptiveFusionBlock is improved by integrating adaptive multi-scale fusion, where Swin Transformer-based attention complements deformable convolution-based multi-scale feature extraction. This enhancement refines lesion boundaries and preserves fine-grained details. ScaleFusionNet achieves Dice scores of 92.94% and 91.65% on ISIC-2016 and ISIC-2018 datasets, respectively, demonstrating its effectiveness in skin lesion analysis. Our code implementation is publicly available at GitHub.
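The abstract reports results as Dice scores without defining the metric. As a minimal sketch, assuming the standard definition Dice = 2|A ∩ B| / (|A| + |B|) for a predicted binary mask A and a ground-truth mask B, the score can be computed as below. The function name `dice_score` and the toy masks are illustrative only; the paper's actual evaluation protocol (thresholding, per-image averaging) is not described in this listing.

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks given as nested lists of 0/1.

    Illustrative helper, not taken from the ScaleFusionNet code base.
    """
    flat_pred = [p for row in pred for p in row]
    flat_target = [t for row in target for t in row]
    if len(flat_pred) != len(flat_target):
        raise ValueError("masks must contain the same number of pixels")

    # Pixels marked as lesion in both masks.
    intersection = sum(p * t for p, t in zip(flat_pred, flat_target))
    total = sum(flat_pred) + sum(flat_target)

    # Two empty masks agree perfectly; treat that case as a score of 1.0.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Toy 3x3 masks: the prediction misses one lesion pixel.
pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 1],
          [0, 1, 1],
          [0, 0, 0]]

print(f"{dice_score(pred, target):.4f}")  # 2*3 / (3+4) -> 0.8571
```

Under this definition, a reported score of 92.94% corresponds to a ratio of 0.9294 between twice the overlap and the combined mask sizes, presumably averaged over the test images.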
Related papers
- GLoG-CSUnet: Enhancing Vision Transformers with Adaptable Radiomic Features for Medical Image Segmentation [2.294915015129229]
Vision Transformers (ViTs) have shown promise in medical image semantic segmentation (MISS). We introduce Gabor and Laplacian of Gaussian Convolutional Swin Network (GLoG-CSUnet). GLoG-CSUnet is a novel architecture enhancing Transformer-based models by incorporating learnable radiomic features.
arXiv Detail & Related papers (2025-01-06T06:07:40Z) - FIAS: Feature Imbalance-Aware Medical Image Segmentation with Dynamic Fusion and Mixing Attention [11.385231493066312]
Hybrid architectures that combine convolutional neural networks (CNNs) and transformers demonstrate competitive ability in medical image segmentation.
We propose a Feature Imbalance-Aware Segmentation (FIAS) network, which incorporates a dual-path encoder and a novel Mixing Attention (MixAtt) decoder.
arXiv Detail & Related papers (2024-11-16T20:30:44Z) - AFFSegNet: Adaptive Feature Fusion Segmentation Network for Microtumors and Multi-Organ Segmentation [31.97835089989928]
Medical image segmentation is a crucial task in computer vision, supporting clinicians in diagnosis, treatment planning, and disease monitoring. We propose the Adaptive Semantic Network (ASSNet), a transformer architecture that effectively integrates local and global features for precise medical image segmentation. Tests on diverse medical image segmentation tasks, including multi-organ, liver tumor, and bladder tumor segmentation, demonstrate that ASSNet achieves state-of-the-art results.
arXiv Detail & Related papers (2024-09-12T06:25:44Z) - Prototype Learning Guided Hybrid Network for Breast Tumor Segmentation in DCE-MRI [58.809276442508256]
We propose a hybrid network that combines convolutional neural network (CNN) and transformer layers.
Experimental results on private and public DCE-MRI datasets demonstrate that the proposed hybrid network achieves superior performance compared with state-of-the-art methods.
arXiv Detail & Related papers (2024-08-11T15:46:00Z) - Dual-scale Enhanced and Cross-generative Consistency Learning for Semi-supervised Medical Image Segmentation [49.57907601086494]
Medical image segmentation plays a crucial role in computer-aided diagnosis.
We propose a novel Dual-scale Enhanced and Cross-generative consistency learning framework for semi-supervised medical image segmentation (DEC-Seg).
arXiv Detail & Related papers (2023-12-26T12:56:31Z) - Inter-Scale Dependency Modeling for Skin Lesion Segmentation with Transformer-based Networks [0.0]
Melanoma is a dangerous form of skin cancer caused by the abnormal growth of skin cells.
FCN approaches, including the U-Net architecture, can automatically segment skin lesions to aid diagnosis.
The symmetrical U-Net model has shown outstanding results, but its use of a convolutional operation limits its ability to capture long-range dependencies.
arXiv Detail & Related papers (2023-10-20T16:20:25Z) - Skin Lesion Segmentation Improved by Transformer-based Networks with Inter-scale Dependency Modeling [0.0]
Melanoma is a dangerous type of skin cancer resulting from abnormal skin cell growth.
The symmetrical U-Net model's reliance on convolutional operations hinders its ability to capture long-range dependencies.
Several Transformer-based U-Net topologies have recently been created to overcome this limitation.
arXiv Detail & Related papers (2023-10-20T15:53:51Z) - MISSU: 3D Medical Image Segmentation via Self-distilling TransUNet [55.16833099336073]
We propose to self-distill a Transformer-based UNet for medical image segmentation.
It simultaneously learns global semantic information and local spatial-detailed features.
Our MISSU achieves the best performance over previous state-of-the-art methods.
arXiv Detail & Related papers (2022-06-02T07:38:53Z) - nnFormer: Interleaved Transformer for Volumetric Segmentation [50.10441845967601]
We introduce nnFormer, a powerful segmentation model with an interleaved architecture based on empirical combination of self-attention and convolution.
nnFormer achieves tremendous improvements over previous transformer-based methods on two commonly used datasets Synapse and ACDC.
arXiv Detail & Related papers (2021-09-07T17:08:24Z) - Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation [63.46694853953092]
Swin-Unet is a Unet-like pure Transformer for medical image segmentation.
Tokenized image patches are fed into the Transformer-based U-shaped Encoder-Decoder architecture.
arXiv Detail & Related papers (2021-05-12T09:30:26Z) - TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation [78.01570371790669]
Medical image segmentation is an essential prerequisite for developing healthcare systems.
On various medical image segmentation tasks, the u-shaped architecture, also known as U-Net, has become the de-facto standard.
We propose TransUNet, which merits both Transformers and U-Net, as a strong alternative for medical image segmentation.
arXiv Detail & Related papers (2021-02-08T16:10:50Z)