Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation
- URL: http://arxiv.org/abs/2504.03108v1
- Date: Fri, 04 Apr 2025 01:27:43 GMT
- Title: Multi-Granularity Vision Fastformer with Fusion Mechanism for Skin Lesion Segmentation
- Authors: Xuanyu Liu, Huiyun Yao, Jinggui Gao, Zhongyi Guo, Xue Zhang, Yulin Dong,
- Abstract summary: This research aims to optimize the balance between computational costs and long-range dependency modelling.<n>We propose a lightweight U-shape network that utilizes Vision Fastformer with Fusion Mechanism (VFFM-UNet)
- Score: 7.944123371140182
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Background:Convolutional Neural Networks(CNN) and Vision Transformers(ViT) are the main techniques used in Medical image segmentation. However, CNN is limited to local contextual information, and ViT's quadratic complexity results in significant computational costs. At the same time, equipping the model to distinguish lesion boundaries with varying degrees of severity is also a challenge encountered in skin lesion segmentation. Purpose:This research aims to optimize the balance between computational costs and long-range dependency modelling and achieve excellent generalization across lesions with different degrees of severity. Methods:we propose a lightweight U-shape network that utilizes Vision Fastformer with Fusion Mechanism (VFFM-UNet). We inherit the advantages of Fastformer's additive attention mechanism, combining element-wise product and matrix product for comprehensive feature extraction and channel reduction to save computational costs. In order to accurately identify the lesion boundaries with varying degrees of severity, we designed Fusion Mechanism including Multi-Granularity Fusion and Channel Fusion, which can process the feature maps in the granularity and channel levels to obtain different contextual information. Results:Comprehensive experiments on the ISIC2017, ISIC2018 and PH2 datasets demonstrate that VFFM-UNet outperforms existing state-of-the-art models regarding parameter numbers, computational complexity and segmentation performance. In short, compared to MISSFormer, our model achieves superior segmentation performance while reducing parameter and computation costs by 101x and 15x, respectively. Conclusions:Both quantitative and qualitative analyses show that VFFM-UNet sets a new benchmark by reaching an ideal balance between parameter numbers, computational complexity, and segmentation performance compared to existing state-of-the-art models.
Related papers
- TAFM-Net: A Novel Approach to Skin Lesion Segmentation Using Transformer Attention and Focal Modulation [0.0]
We develop TAFM-Net, an innovative model leveraging self-adaptive transformer attention (TA) and focal modulation (FM)
Our model integrates an EfficientNetV2B1 encoder, which employs TA to enhance spatial and channel-related saliency, while a densely connected decoder integrates FM within skip connections.
A novel dynamic loss function amalgamates region and boundary information, guiding effective model training.
arXiv Detail & Related papers (2024-11-26T16:18:48Z) - SurgeryV2: Bridging the Gap Between Model Merging and Multi-Task Learning with Deep Representation Surgery [54.866490321241905]
Model merging-based multitask learning (MTL) offers a promising approach for performing MTL by merging multiple expert models.
In this paper, we examine the merged model's representation distribution and uncover a critical issue of "representation bias"
This bias arises from a significant distribution gap between the representations of the merged and expert models, leading to the suboptimal performance of the merged MTL model.
arXiv Detail & Related papers (2024-10-18T11:49:40Z) - CSWin-UNet: Transformer UNet with Cross-Shaped Windows for Medical Image Segmentation [22.645013853519]
CSWin-UNet is a novel U-shaped segmentation method that incorporates the CSWin self-attention mechanism into the UNet.
Our empirical evaluations on diverse datasets, including synapse multi-organ CT, cardiac MRI, and skin lesions, demonstrate that CSWin-UNet maintains low model complexity while delivering high segmentation accuracy.
arXiv Detail & Related papers (2024-07-25T14:25:17Z) - Neuro-TransUNet: Segmentation of stroke lesion in MRI using transformers [0.6554326244334866]
This study introduces the Neuro-TransUNet framework, which synergizes the U-Net's spatial feature extraction with SwinUNETR's global contextual processing ability.
The proposed Neuro-TransUNet model, trained with the ATLAS v2.0 emphtraining dataset, outperforms existing deep learning algorithms and establishes a new benchmark in stroke lesion segmentation.
arXiv Detail & Related papers (2024-06-10T04:36:21Z) - DA-Flow: Dual Attention Normalizing Flow for Skeleton-based Video Anomaly Detection [52.74152717667157]
We propose a lightweight module called Dual Attention Module (DAM) for capturing cross-dimension interaction relationships in-temporal skeletal data.
It employs the frame attention mechanism to identify the most significant frames and the skeleton attention mechanism to capture broader relationships across fixed partitions with minimal parameters and flops.
arXiv Detail & Related papers (2024-06-05T06:18:03Z) - PMFSNet: Polarized Multi-scale Feature Self-attention Network For
Lightweight Medical Image Segmentation [6.134314911212846]
Current state-of-the-art medical image segmentation methods prioritize accuracy but often at the expense of increased computational demands and larger model sizes.
We propose PMFSNet, a novel medical imaging segmentation model that balances global local feature processing while avoiding computational redundancy.
It incorporates a plug-and-play PMFS block, a multi-scale feature enhancement module based on attention mechanisms, to capture long-term dependencies.
arXiv Detail & Related papers (2024-01-15T10:26:47Z) - Learning Multiscale Consistency for Self-supervised Electron Microscopy
Instance Segmentation [48.267001230607306]
We propose a pretraining framework that enhances multiscale consistency in EM volumes.
Our approach leverages a Siamese network architecture, integrating strong and weak data augmentations.
It effectively captures voxel and feature consistency, showing promise for learning transferable representations for EM analysis.
arXiv Detail & Related papers (2023-08-19T05:49:13Z) - ESSAformer: Efficient Transformer for Hyperspectral Image
Super-resolution [76.7408734079706]
Single hyperspectral image super-resolution (single-HSI-SR) aims to restore a high-resolution hyperspectral image from a low-resolution observation.
We propose ESSAformer, an ESSA attention-embedded Transformer network for single-HSI-SR with an iterative refining structure.
arXiv Detail & Related papers (2023-07-26T07:45:14Z) - Reliable Joint Segmentation of Retinal Edema Lesions in OCT Images [55.83984261827332]
In this paper, we propose a novel reliable multi-scale wavelet-enhanced transformer network.
We develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a multi-scale transformer module.
Our proposed method achieves better segmentation accuracy with a high degree of reliability as compared to other state-of-the-art segmentation approaches.
arXiv Detail & Related papers (2022-12-01T07:32:56Z) - Multi-fidelity Hierarchical Neural Processes [79.0284780825048]
Multi-fidelity surrogate modeling reduces the computational cost by fusing different simulation outputs.
We propose Multi-fidelity Hierarchical Neural Processes (MF-HNP), a unified neural latent variable model for multi-fidelity surrogate modeling.
We evaluate MF-HNP on epidemiology and climate modeling tasks, achieving competitive performance in terms of accuracy and uncertainty estimation.
arXiv Detail & Related papers (2022-06-10T04:54:13Z) - Factorizer: A Scalable Interpretable Approach to Context Modeling for
Medical Image Segmentation [6.030648996110607]
This work introduces a family of models, dubbed Factorizer, which leverages the power of low-rank matrix factorization for constructing an end-to-end segmentation model.
Specifically, we propose a linearly scalable approach to context modeling, formulating Nonnegative Matrix Factorization (NMF) as a differentiable layer integrated into a U-shaped architecture.
Factorizers compete favorably with CNNs and Transformers in terms of accuracy, scalability, and interpretability.
arXiv Detail & Related papers (2022-02-24T18:51:19Z) - Brain Image Synthesis with Unsupervised Multivariate Canonical
CSC$\ell_4$Net [122.8907826672382]
We propose to learn dedicated features that cross both intre- and intra-modal variations using a novel CSC$ell_4$Net.
arXiv Detail & Related papers (2021-03-22T05:19:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.