HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis
- URL: http://arxiv.org/abs/2506.19474v2
- Date: Mon, 30 Jun 2025 08:40:51 GMT
- Title: HMSViT: A Hierarchical Masked Self-Supervised Vision Transformer for Corneal Nerve Segmentation and Diabetic Neuropathy Diagnosis
- Authors: Xin Zhang, Liangxiu Han, Yue Shi, Yanlin Zheng, Uazman Alam, Maryam Ferdousi, Rayaz Malik
- Abstract summary: Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer for corneal nerve segmentation and DPN diagnosis. HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction. Experiments on clinical CCM datasets show that HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy.
- Score: 3.8141400767898603
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diabetic Peripheral Neuropathy (DPN) affects nearly half of diabetes patients, requiring early detection. Corneal Confocal Microscopy (CCM) enables non-invasive diagnosis, but automated methods suffer from inefficient feature extraction, reliance on handcrafted priors, and data limitations. We propose HMSViT, a novel Hierarchical Masked Self-Supervised Vision Transformer designed for corneal nerve segmentation and DPN diagnosis. Unlike existing methods, HMSViT employs pooling-based hierarchical and dual attention mechanisms with absolute positional encoding, enabling efficient multi-scale feature extraction: fine-grained local details are captured in early layers and global context is integrated in deeper layers, all at a lower computational cost. A block-masked self-supervised learning framework is designed for HMSViT to reduce reliance on labelled data and enhance feature robustness, while a multi-scale decoder fuses hierarchical features for segmentation and classification. Experiments on clinical CCM datasets show that HMSViT achieves state-of-the-art performance, with 61.34% mIoU for nerve segmentation and 70.40% diagnostic accuracy, outperforming leading hierarchical models such as the Swin Transformer and HiViT by margins of up to 6.39% in segmentation accuracy while using fewer parameters. Detailed ablation studies further reveal that integrating block-masked SSL with hierarchical multi-scale feature extraction substantially enhances performance compared to conventional supervised training. Overall, these comprehensive experiments confirm that HMSViT delivers excellent, robust, and clinically viable results, demonstrating its potential for scalable deployment in real-world diagnostic applications.
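To make the abstract's main ingredients concrete, below is a minimal, hypothetical PyTorch sketch of block-wise patch masking for self-supervised pre-training combined with a pooling-based hierarchical transformer encoder that uses absolute positional encoding. All names (MaskedHierarchicalViT, Stage, block_mask), layer choices, and hyperparameters are illustrative assumptions, not the authors' code; the paper's dual-attention design and multi-scale decoder are omitted for brevity.

```python
# Minimal, illustrative sketch (PyTorch assumed): block-wise masking of patch
# tokens plus a hierarchical encoder that pools between stages. Hypothetical,
# not the authors' implementation.
import torch
import torch.nn as nn


def block_mask(grid_h, grid_w, block=4, mask_ratio=0.5, device="cpu"):
    """Mask contiguous block x block groups of patches rather than single patches."""
    gh, gw = grid_h // block, grid_w // block
    num_blocks = gh * gw
    keep = int(num_blocks * (1 - mask_ratio))
    perm = torch.randperm(num_blocks, device=device)
    masked = torch.zeros(num_blocks, dtype=torch.bool, device=device)
    masked[perm[keep:]] = True                       # True = block is masked
    masked = masked.view(gh, gw)
    masked = masked.repeat_interleave(block, 0).repeat_interleave(block, 1)
    return masked.flatten()                           # (grid_h * grid_w,) patch mask


class Stage(nn.Module):
    """One hierarchical stage: transformer blocks, then 2x2 pooling + channel doubling."""

    def __init__(self, dim, depth, heads, downsample=True):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, depth)
        self.pool = nn.AvgPool2d(2) if downsample else nn.Identity()
        self.proj = nn.Linear(dim, dim * 2) if downsample else nn.Identity()

    def forward(self, x, hw):
        h, w = hw
        x = self.blocks(x)                            # (B, h*w, C) tokens
        feat = x                                      # kept as a multi-scale feature
        b, n, c = x.shape
        x = x.transpose(1, 2).reshape(b, c, h, w)
        x = self.pool(x)                              # halve the spatial resolution
        h, w = x.shape[-2:]
        x = self.proj(x.flatten(2).transpose(1, 2))   # double the channel width
        return x, feat, (h, w)


class MaskedHierarchicalViT(nn.Module):
    def __init__(self, img=384, patch=16, dim=96, depths=(2, 2, 6), heads=(3, 6, 12)):
        super().__init__()
        self.grid = img // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))  # absolute PE
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        self.stages = nn.ModuleList(
            Stage(dim * 2 ** i, depths[i], heads[i], downsample=i < len(depths) - 1)
            for i in range(len(depths))
        )
        # toy reconstruction head on the finest-stage tokens; the paper instead
        # fuses all stages with a multi-scale decoder
        self.recon = nn.Linear(dim, patch * patch * 3)

    def forward(self, img):
        x = self.embed(img).flatten(2).transpose(1, 2) + self.pos   # (B, N, C)
        mask = block_mask(self.grid, self.grid, device=img.device)
        x = torch.where(mask[None, :, None], self.mask_token.to(x.dtype), x)
        feats, hw = [], (self.grid, self.grid)
        for stage in self.stages:
            x, feat, hw = stage(x, hw)
            feats.append(feat)                        # fine -> coarse features
        return self.recon(feats[0]), mask, feats


# Pre-training sketch: reconstruct pixels of the masked patches only.
# `patchify` is a hypothetical helper returning (B, N, patch*patch*3) targets.
# model = MaskedHierarchicalViT()
# pred, mask, _ = model(images)
# loss = ((pred - patchify(images)) ** 2).mean(-1)[:, mask].mean()
```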
Related papers
- Towards Accurate and Interpretable Neuroblastoma Diagnosis via Contrastive Multi-scale Pathological Image Analysis [16.268045905735818]
We propose CMSwinKAN, a contrastive-learning-based multi-scale feature fusion model tailored for pathological image classification. By fusing multi-scale features and leveraging contrastive learning strategies, CMSwinKAN mimics clinicians' comprehensive approach. Results demonstrate that CMSwinKAN performs better than existing state-of-the-art pathology-specific models pre-trained on large datasets.
arXiv Detail & Related papers (2025-04-18T15:39:46Z) - TUMLS: Trustful Fully Unsupervised Multi-Level Segmentation for Whole Slide Images of Histology [41.94295877935867]
We present a trustful fully unsupervised multi-level segmentation methodology (TUMLS) for whole slide images (WSIs). TUMLS adopts an autoencoder (AE) as a feature extractor to identify the different tissue types within low-resolution training data. This solution integrates seamlessly into clinical workflows, transforming the examination of a whole WSI into a review of concise, interpretable cross-level insights.
arXiv Detail & Related papers (2025-04-17T07:48:05Z) - Comprehensive Evaluation of OCT-based Automated Segmentation of Retinal Layer, Fluid and Hyper-Reflective Foci: Impact on Diabetic Retinopathy Severity Assessment [0.0]
Diabetic retinopathy (DR) is a leading cause of vision loss, requiring early and accurate assessment to prevent irreversible damage. This study proposes an active-learning-based deep learning pipeline for automated segmentation of retinal layers.
arXiv Detail & Related papers (2025-03-03T07:23:56Z) - RURANET++: An Unsupervised Learning Method for Diabetic Macular Edema Based on SCSE Attention Mechanisms and Dynamic Multi-Projection Head Clustering [13.423253964156117]
RURANET++ is an unsupervised learning-based automated diagnostic system for Diabetic Macular Edema (DME). During feature processing, a pre-trained GoogLeNet model extracts deep features from retinal images, followed by PCA-based dimensionality reduction to 50 dimensions for computational efficiency. Experimental results demonstrate superior performance across multiple metrics, achieving maximum accuracy (0.8411), precision (0.8593), recall (0.8411), and F1-score, with exceptional clustering quality. A hypothetical sketch of this feature pipeline appears after the related-papers list.
arXiv Detail & Related papers (2025-02-27T16:06:57Z) - A Cascaded Dilated Convolution Approach for Mpox Lesion Classification [0.0]
Mpox virus presents significant diagnostic challenges due to its visual similarity to other skin lesion diseases. Deep learning-based approaches for skin lesion classification offer a promising alternative. This study introduces the Cascaded Atrous Group Attention framework to address these challenges.
arXiv Detail & Related papers (2024-12-13T12:47:30Z) - Lesion-aware network for diabetic retinopathy diagnosis [28.228110579446227]
We propose a CNN-based diabetic retinopathy (DR) diagnosis network with an attention mechanism, termed the lesion-aware network (LANet).
The proposed LANet is constructed by embedding the LAM and FPM into the CNN decoders to exploit DR-related information.
Our method outperforms the mainstream methods with an area under the curve of 0.967 in DR screening, and increases the overall average precision by 7.6%, 2.1%, and 1.2% in lesion segmentation on three datasets.
arXiv Detail & Related papers (2024-08-14T03:06:04Z) - Classification of Heart Sounds Using Multi-Branch Deep Convolutional Network and LSTM-CNN [2.7699831151653305]
This study develops and evaluates novel deep learning architectures that offer fast, accurate, and cost-effective methods for automatic diagnosis of cardiac diseases. We propose two innovative methodologies: first, a Multi-Branch Deep Convolutional Neural Network (MBDCN) that emulates human auditory processing by utilizing diverse convolutional filter sizes and power spectrum input for enhanced feature extraction; second, a Long Short-Term Memory-Convolutional Network (LSCN) model that integrates LSTM blocks with the MBDCN to improve time-domain feature extraction.
arXiv Detail & Related papers (2024-07-15T13:02:54Z) - CathFlow: Self-Supervised Segmentation of Catheters in Interventional Ultrasound Using Optical Flow and Transformers [66.15847237150909]
We introduce a self-supervised deep learning architecture to segment catheters in longitudinal ultrasound images.
The network architecture builds upon AiAReSeg, a segmentation transformer built with the Attention in Attention mechanism.
We validated our model on a test dataset consisting of unseen synthetic data and images collected from silicone aorta phantoms.
arXiv Detail & Related papers (2024-03-21T15:13:36Z) - nnUNet RASPP for Retinal OCT Fluid Detection, Segmentation and Generalisation over Variations of Data Sources [25.095695898777656]
We propose two variants of the nnUNet with consistent high performance across images from multiple device vendors.
The algorithm was validated on the MICCAI 2017 RETOUCH challenge dataset.
Experimental results show that our algorithms outperform the current state-of-the-art algorithms.
arXiv Detail & Related papers (2023-02-25T23:47:23Z) - Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms.
We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv Detail & Related papers (2022-11-12T05:39:17Z) - RetiFluidNet: A Self-Adaptive and Multi-Attention Deep Convolutional Network for Retinal OCT Fluid Segmentation [3.57686754209902]
Quantification of retinal fluids is necessary for OCT-guided treatment management.
A new convolutional neural architecture named RetiFluidNet is proposed for multi-class retinal fluid segmentation.
The model benefits from hierarchical representation learning of textural, contextual, and edge features.
arXiv Detail & Related papers (2022-09-26T07:18:00Z) - Multiple Time Series Fusion Based on LSTM: An Application to CAP A Phase Classification Using EEG [56.155331323304]
Deep-learning-based feature-level fusion of electroencephalogram channels is carried out in this work.
Channel selection, fusion, and classification procedures were optimized by two optimization algorithms.
arXiv Detail & Related papers (2021-12-18T14:17:49Z) - Hepatic vessel segmentation based on 3Dswin-transformer with inductive biased multi-head self-attention [46.46365941681487]
We propose a robust end-to-end vessel segmentation network called the Inductive BIased Multi-Head Attention Vessel Net.
We introduce the voxel-wise embedding rather than patch-wise embedding to locate precise liver vessel voxels.
On the other hand, we propose inductive biased multi-head self-attention which learns inductive biased relative positional embedding from absolute position embedding.
arXiv Detail & Related papers (2021-11-05T10:17:08Z) - Multi-Task Neural Networks with Spatial Activation for Retinal Vessel Segmentation and Artery/Vein Classification [49.64863177155927]
We propose a multi-task deep neural network with a spatial activation mechanism to segment full retinal vessels, arteries, and veins simultaneously.
The proposed network achieves pixel-wise accuracy of 95.70% for vessel segmentation, and A/V classification accuracy of 94.50%, which is the state-of-the-art performance for both tasks.
arXiv Detail & Related papers (2020-07-18T05:46:47Z)
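As referenced in the RURANET++ entry above, the following is a hypothetical sketch of the described feature pipeline: deep features from a pre-trained GoogLeNet, PCA reduction to 50 dimensions, and unsupervised clustering. KMeans stands in for the paper's dynamic multi-projection head clustering, which is not reproduced here; image handling details are assumptions.

```python
# Hypothetical sketch of the pipeline summarised for RURANET++ above:
# pre-trained GoogLeNet features -> PCA to 50 dimensions -> clustering.
import torch
from torchvision import models, transforms
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

device = "cuda" if torch.cuda.is_available() else "cpu"
backbone = models.googlenet(weights=models.GoogLeNet_Weights.IMAGENET1K_V1)
backbone.fc = torch.nn.Identity()        # expose the 1024-d pooled features
backbone.eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])


@torch.no_grad()
def extract_features(pil_images):
    """Return an (N, 1024) array of GoogLeNet features for a list of PIL images."""
    batch = torch.stack([preprocess(im) for im in pil_images]).to(device)
    return backbone(batch).cpu().numpy()


# Usage sketch (retinal_images is a hypothetical list of PIL images):
# features = extract_features(retinal_images)
# reduced = PCA(n_components=50).fit_transform(features)   # 50-d, as in the summary
# labels = KMeans(n_clusters=2, n_init=10).fit_predict(reduced)
```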