MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images
- URL: http://arxiv.org/abs/2502.17154v1
- Date: Mon, 24 Feb 2025 13:48:04 GMT
- Title: MaxGlaViT: A novel lightweight vision transformer-based approach for early diagnosis of glaucoma stages from fundus images
- Authors: Mustafa Yurdakul, Kubra Uyar, Sakir Tasdemir
- Abstract summary: This study introduces MaxGlaViT, a lightweight model based on the restructured Multi-Axis Vision Transformer (MaxViT) for early glaucoma detection. The model was evaluated on the HDV1 dataset, which contains fundus images of different glaucoma stages. MaxGlaViT outperforms experimental and state-of-the-art models, achieving 92.03% accuracy, 92.33% precision, 92.03% recall, 92.13% f1-score, and 87.12% Cohen's kappa score.
- Score: 0.0
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Glaucoma is a prevalent eye disease that progresses silently without symptoms. If not detected and treated early, it can cause permanent vision loss. Computer-assisted diagnosis systems play a crucial role in timely and efficient identification. This study introduces MaxGlaViT, a lightweight model based on the restructured Multi-Axis Vision Transformer (MaxViT) for early glaucoma detection. First, MaxViT was scaled to optimize block and channel numbers, resulting in a lighter architecture. Second, the stem was enhanced by adding attention mechanisms (CBAM, ECA, SE) after the convolution layers to improve feature learning. Third, the MBConv structures in MaxViT blocks were replaced by advanced DL blocks (ConvNeXt, ConvNeXtV2, InceptionNeXt). The model was evaluated using the HDV1 dataset, which contains fundus images of different glaucoma stages. Additionally, 40 CNN and 40 ViT models were tested on HDV1 to validate MaxGlaViT's efficiency. Among CNN models, EfficientNet-B6 achieved the highest accuracy (84.91%), while among ViT models, MaxViT-Tiny performed best (86.42%). The scaled MaxViT reached 87.93% accuracy. Adding ECA to the stem block increased accuracy to 89.01%. Replacing MBConv with ConvNeXtV2 further improved it to 89.87%. Finally, integrating ECA in the stem and ConvNeXtV2 in the MaxViT blocks resulted in 92.03% accuracy. Testing 80 DL models for glaucoma stage classification, this study presents a comprehensive and comparative analysis. MaxGlaViT outperforms experimental and state-of-the-art models, achieving 92.03% accuracy, 92.33% precision, 92.03% recall, 92.13% f1-score, and 87.12% Cohen's kappa score.
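The best-performing stem modification above, ECA, can be illustrated with a short sketch. This is a generic NumPy rendition of Efficient Channel Attention (global average pooling, a 1-D convolution across channel descriptors, and a sigmoid gate), not the authors' implementation; the kernel size and tensor shapes are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def eca(x, kernel):
    """Efficient Channel Attention on a single feature map x of shape
    (C, H, W). `kernel` holds the 1-D conv weights (odd length)
    applied across the channel descriptors, with no dimensionality
    reduction."""
    c = x.shape[0]
    k = len(kernel)
    y = x.mean(axis=(1, 2))                          # global average pooling -> (C,)
    yp = np.pad(y, k // 2)                           # same-padding for the 1-D conv
    z = np.array([kernel @ yp[i:i + k] for i in range(c)])
    w = sigmoid(z)                                   # per-channel gate in (0, 1)
    return x * w[:, None, None]                      # rescale each channel
```

In MaxGlaViT this gating is reported to sit after the stem convolutions, so here it would be applied to the stem's output feature map before the MaxViT blocks.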
Related papers
- Semiconductor Wafer Map Defect Classification with Tiny Vision Transformers [0.0]
We propose ViT-Tiny, a lightweight Vision Transformer (ViT) framework optimized for wafer defect classification. ViT-Tiny outperforms its ViT-Base counterpart and state-of-the-art (SOTA) models such as MSF-Trans and CNN-based architectures. It achieves an F1-score of 98.4%, surpassing MSF-Trans by 2.94% in four-defect classification, improving recall by 2.86% in two-defect classification, and increasing precision by 3.13% in three-defect classification.
arXiv Detail & Related papers (2025-04-03T11:18:00Z) - Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases? [15.146396276161937]
RETFound and DINOv2 models were evaluated on ocular disease detection and systemic disease prediction tasks. RETFound achieved superior performance over all DINOv2 models in predicting heart failure, infarction, and ischaemic stroke.
arXiv Detail & Related papers (2025-02-10T09:31:39Z) - Deep Learning Ensemble for Predicting Diabetic Macular Edema Onset Using Ultra-Wide Field Color Fundus Image [2.9945018168793025]
Diabetic macular edema (DME) is a severe complication of diabetes. We propose an ensemble method to predict ci-DME onset within a year.
arXiv Detail & Related papers (2024-10-09T02:16:29Z) - Brain Tumor Classification on MRI in Light of Molecular Markers [61.77272414423481]
Co-deletion of the 1p/19q gene is associated with clinical outcomes in low-grade gliomas.
This study aims to utilize a specialized MRI-based convolutional neural network for brain cancer detection.
arXiv Detail & Related papers (2024-09-29T07:04:26Z) - Comparative Performance Analysis of Transformer-Based Pre-Trained Models for Detecting Keratoconus Disease [0.0]
This study compares eight pre-trained CNNs for diagnosing keratoconus, a degenerative eye disease.
MobileNetV2 was the most accurate model in identifying keratoconus and normal cases, with few misclassifications.
arXiv Detail & Related papers (2024-08-16T20:15:24Z) - Enhancing Diabetic Retinopathy Diagnosis: A Lightweight CNN Architecture for Efficient Exudate Detection in Retinal Fundus Images [0.0]
This paper introduces a novel, lightweight convolutional neural network architecture tailored for automated exudate detection.
We have incorporated domain-specific data augmentations to enhance the model's generalizability.
Our model achieves an impressive F1 score of 90%, demonstrating its efficacy in the early detection of diabetic retinopathy through fundus imaging.
arXiv Detail & Related papers (2024-08-13T10:13:33Z) - Multi-scale Spatio-temporal Transformer-based Imbalanced Longitudinal Learning for Glaucoma Forecasting from Irregular Time Series Images [45.894671834869975]
Glaucoma is one of the major eye diseases that leads to progressive optic nerve fiber damage and irreversible blindness.
We introduce the Multi-scale Spatio-temporal Transformer Network (MST-former) based on the transformer architecture tailored for sequential image inputs.
Our method shows excellent generalization capability on the Alzheimer's Disease Neuroimaging Initiative (ADNI) MRI dataset, with an accuracy of 90.3% for mild cognitive impairment and Alzheimer's disease prediction.
arXiv Detail & Related papers (2024-02-21T02:16:59Z) - Ophthalmic Biomarker Detection Using Ensembled Vision Transformers and Knowledge Distillation [3.1487473474617125]
We train two vision transformer-based models and ensemble them at inference time.
We find MaxViT's use of convolution layers followed by strided attention to be better suited for local feature detection.
EVA-02's use of a standard attention mechanism and knowledge distillation is better for detecting global features.
arXiv Detail & Related papers (2023-10-21T13:27:07Z) - BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease Diagnosis [42.917164607812886]
Vision-and-language (V&L) models take image and text as input and learn to capture the associations between them.
BERTHop is a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities.
arXiv Detail & Related papers (2021-08-10T21:51:25Z) - Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classification, using the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for Thoracic Disease Identification [83.6017225363714]
Deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
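The K-nearest neighbor smoothing idea can be sketched generically: each sample's predicted class distribution is averaged with the predictions of its nearest neighbors in feature space. This is an illustrative sketch under assumed Euclidean distances and uniform weighting, not the paper's exact KNNS formulation; `knn_smooth` and its parameters are hypothetical names.

```python
import numpy as np

def knn_smooth(features, probs, k=3):
    """Smooth per-sample predicted distributions by averaging each
    sample's probabilities with those of its k nearest neighbours
    (Euclidean distance in feature space; the sample itself is
    included in the average)."""
    n = features.shape[0]
    # Pairwise Euclidean distance matrix, shape (n, n)
    dist = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    smoothed = np.empty_like(probs)
    for i in range(n):
        idx = np.argsort(dist[i])[:k + 1]  # self + k neighbours
        smoothed[i] = probs[idx].mean(axis=0)
    return smoothed
```

The intuition is that visually similar chest X-rays should receive similar disease scores, so averaging over neighbors damps noisy individual predictions.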
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning [91.3755431537592]
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best performing model was able to achieve an accuracy of 0.893 and a Recall score of 0.897, outperforming its baseline Recall score by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z) - A multicenter study on radiomic features from T$_2$-weighted images of a customized MR pelvic phantom setting the basis for robust radiomic models in clinics [47.187609203210705]
2D and 3D T$_2$-weighted images of a pelvic phantom were acquired on three scanners. Repeatability and repositioning of radiomic features were assessed.
arXiv Detail & Related papers (2020-05-14T09:24:48Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.