ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification
- URL: http://arxiv.org/abs/2501.17260v1
- Date: Tue, 28 Jan 2025 19:41:38 GMT
- Title: ViT-2SPN: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Networks for Retinal OCT Classification
- Authors: Mohammadreza Saraei, Igor Kozak, Eung-Joo Lee
- Abstract summary: Vision Transformer-based Dual-Stream Self-Supervised Pretraining Network (ViT-2SPN) is a novel framework designed to enhance feature extraction and improve diagnostic accuracy.
ViT-2SPN employs a three-stage workflow: Supervised Pretraining, Self-Supervised Pretraining, and Supervised Fine-Tuning.
ViT-2SPN achieves a mean AUC of 0.93, accuracy of 0.77, precision of 0.81, recall of 0.75, and an F1 score of 0.76, outperforming existing SSP-based methods.
- Abstract: Optical Coherence Tomography (OCT) is a non-invasive imaging modality essential for diagnosing various eye diseases. Despite its clinical significance, developing OCT-based diagnostic tools faces challenges, such as limited public datasets, sparse annotations, and privacy concerns. Although deep learning has made progress in automating OCT analysis, these challenges remain unresolved. To address these limitations, we introduce the Vision Transformer-based Dual-Stream Self-Supervised Pretraining Network (ViT-2SPN), a novel framework designed to enhance feature extraction and improve diagnostic accuracy. ViT-2SPN employs a three-stage workflow: Supervised Pretraining, Self-Supervised Pretraining (SSP), and Supervised Fine-Tuning. The pretraining phase leverages the OCTMNIST dataset (97,477 unlabeled images across four disease classes) with data augmentation to create dual-augmented views. A Vision Transformer (ViT-Base) backbone extracts features, while a negative cosine similarity loss aligns feature representations. Pretraining is conducted over 50 epochs with a learning rate of 0.0001 and momentum of 0.999. Fine-tuning is performed on a stratified 5.129% subset of OCTMNIST using 10-fold cross-validation. ViT-2SPN achieves a mean AUC of 0.93, accuracy of 0.77, precision of 0.81, recall of 0.75, and an F1 score of 0.76, outperforming existing SSP-based methods.
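The abstract pins down the pretraining recipe: dual augmented views of each unlabeled OCT image, a shared ViT-Base feature extractor, and a negative cosine similarity loss aligning the two streams. Below is a minimal sketch of one such pretraining step, assuming a SimSiam-style projector/predictor design; the head sizes, the timm backbone call, and the use of the quoted 0.999 as optimizer momentum are illustrative assumptions (the paper's 0.999 could instead be an EMA target-encoder momentum), not the authors' released code.

```python
# Minimal sketch of one dual-stream self-supervised pretraining step: two
# augmented views, a shared ViT-Base backbone, negative cosine similarity loss.
import torch
import torch.nn as nn
import torch.nn.functional as F
import timm

# Supervised-pretrained initialization, per the workflow's first stage;
# num_classes=0 makes timm return pooled 768-d features.
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
projector = nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, 256))
predictor = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 256))

def neg_cosine(p, z):
    # Negative cosine similarity with stop-gradient on the target stream.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def pretrain_step(view1, view2):
    # Both augmented views of the same OCT image pass through the shared backbone.
    z1, z2 = projector(backbone(view1)), projector(backbone(view2))
    p1, p2 = predictor(z1), predictor(z2)
    # Symmetrized loss aligns each stream's prediction with the other's features.
    return 0.5 * (neg_cosine(p1, z2) + neg_cosine(p2, z1))

params = (list(backbone.parameters()) + list(projector.parameters())
          + list(predictor.parameters()))
opt = torch.optim.SGD(params, lr=1e-4, momentum=0.999)  # values quoted in the abstract
```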
Related papers
- Enhancing Transfer Learning for Medical Image Classification with SMOTE: A Comparative Study
This paper explores and enhances the application of Transfer Learning (TL) for multilabel image classification in medical imaging.
Our results show that TL models excel in brain tumor classification, achieving near-optimal metrics.
We integrate the Synthetic Minority Over-sampling Technique (SMOTE) with TL and traditional machine learning (ML) methods, which improves accuracy by 1.97%, recall (sensitivity) by 5.43%, and specificity by 0.72%.
arXiv Detail & Related papers (2024-12-28T18:15:07Z)
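Since this entry reports concrete gains from pairing SMOTE with transfer-learned models, a minimal sketch of the usual pattern may help: oversample the minority class in feature space before fitting the classifier. The feature matrix, labels, and classifier below are hypothetical placeholders, not the paper's pipeline.

```python
# Minimal sketch of combining SMOTE with transfer-learned features; X and y
# are assumed to come from a pretrained backbone on the training split only.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 768))                  # hypothetical backbone features
y = rng.choice([0, 1], size=500, p=[0.9, 0.1])   # imbalanced labels

# SMOTE synthesizes minority-class samples by interpolating between a sample
# and its nearest minority neighbors; apply it to training data only.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
clf = LogisticRegression(max_iter=1000).fit(X_res, y_res)
```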
- Multi-Class Abnormality Classification Task in Video Capsule Endoscopy
This work addresses the challenge of multi-class anomaly classification in video capsule endoscopy (VCE).
The purpose is to correctly classify diverse gastrointestinal disorders, which is critical for increasing diagnostic efficiency in clinical settings.
Our team, Capsule Commandos, achieved a 7th-place ranking, with a test-set performance of mean AUC 0.7314 and balanced accuracy 0.3235.
arXiv Detail & Related papers (2024-10-25T21:22:52Z)
- Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging
This study focuses on convolutional neural networks (CNNs) for COVID-19 diagnosis from computed tomography (CT) and chest radiography (CXR).
We developed and tested multiple AI models, including 3D ResNet-like and 2D EfficientNetV2 architectures, across diverse datasets.
Models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR.
arXiv Detail & Related papers (2023-08-17T19:12:32Z)
- Bi-ViT: Pushing the Limit of Vision Transformer Quantization
Quantization of vision transformers (ViTs) offers a promising prospect for deploying large pre-trained networks on resource-limited devices.
We introduce a learnable scaling factor to reactivate the vanished gradients and illustrate its effectiveness through theoretical and experimental analyses.
We then propose a ranking-aware distillation method to rectify the disordered ranking in a teacher-student framework.
arXiv Detail & Related papers (2023-05-21T05:24:43Z)
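The learnable scaling factor that "reactivates the vanished gradients" is the key mechanism in the Bi-ViT entry above. Below is a minimal sketch of how such a factor typically enters a binarized linear layer via a straight-through estimator; it is a reconstruction of the general idea, not Bi-ViT's actual implementation.

```python
# Minimal sketch: sign-binarized weights rescaled by a trainable per-channel
# factor, with a straight-through estimator so gradients do not vanish.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinaryLinear(nn.Module):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        # Learnable scaling factor: restores dynamic range to the binary
        # weights so useful gradients flow through the quantized layer.
        self.scale = nn.Parameter(torch.ones(out_features, 1))

    def forward(self, x):
        # Straight-through estimator: forward uses sign(w), backward passes
        # the gradient through as if sign() were the identity.
        w_bin = self.weight + (torch.sign(self.weight) - self.weight).detach()
        return F.linear(x, self.scale * w_bin)
```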
- Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI
We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI).
Using a large prostate bpMRI dataset with 1500 patients, we first pretrain the CSwin transformer using multi-task self-supervised learning to improve data efficiency and network generalizability.
Five-fold cross-validation shows that the self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four comparable models.
arXiv Detail & Related papers (2023-04-30T04:40:32Z)
- Tissue Classification During Needle Insertion Using Self-Supervised Contrastive Learning and Optical Coherence Tomography
We propose a deep neural network that classifies the tissues from the phase and intensity data of complex OCT signals acquired at the needle tip.
We show that with 10% of the training set, our proposed pretraining strategy helps the model achieve an F1 score of 0.84 whereas the model achieves an F1 score of 0.60 without it.
arXiv Detail & Related papers (2023-04-26T14:11:04Z)
- Q-ViT: Accurate and Fully Quantized Low-bit Vision Transformer
We develop an information rectification module (IRM) and a distribution-guided distillation scheme for fully quantized vision transformers (Q-ViT).
Our method achieves much better performance than prior approaches.
arXiv Detail & Related papers (2022-10-13T04:00:29Z)
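As a rough illustration of what a distribution-guided distillation term can look like, the sketch below matches a quantized student's attention distributions to a full-precision teacher's with a temperature-scaled KL divergence. The tensor shapes and loss form are assumptions for illustration, not Q-ViT's published formulation.

```python
# Minimal sketch of attention-distribution distillation for a quantized ViT.
import torch
import torch.nn.functional as F

def attn_distill_loss(student_logits, teacher_logits, tau=1.0):
    # Both tensors: (batch, heads, queries, keys) attention logits.
    s = F.log_softmax(student_logits / tau, dim=-1)
    t = F.softmax(teacher_logits / tau, dim=-1)
    # KL(teacher || student), temperature-scaled as in standard distillation.
    return F.kl_div(s, t, reduction="batchmean") * tau**2
```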
- Self-supervised contrastive learning of echocardiogram videos enables label-efficient cardiac disease diagnosis
We developed EchoCLR, a self-supervised contrastive learning approach tailored to echocardiogram videos.
When fine-tuned on small portions of labeled data, EchoCLR pretraining significantly improved classification performance for left ventricular hypertrophy (LVH) and aortic stenosis (AS).
EchoCLR is unique in its ability to learn representations of medical videos and demonstrates that SSL can enable label-efficient disease classification from small labeled datasets.
arXiv Detail & Related papers (2022-07-23T19:17:26Z)
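For readers unfamiliar with the contrastive setup behind the EchoCLR entry, here is a minimal NT-Xent (SimCLR-style) loss over paired clip embeddings; EchoCLR's actual positive-pair construction for echocardiogram videos may differ, so treat this as a generic sketch.

```python
# Minimal NT-Xent contrastive loss over two batches of clip embeddings.
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    # z1[i] and z2[i] are embeddings of two clips treated as a positive pair
    # (e.g., different clips from the same echocardiogram study).
    z = F.normalize(torch.cat([z1, z2]), dim=1)   # (2N, D)
    sim = z @ z.t() / tau                         # pairwise similarities
    sim.fill_diagonal_(float("-inf"))             # exclude self-similarity
    n = z1.size(0)
    # Each row's target is its counterpart in the other batch half.
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)])
    return F.cross_entropy(sim, targets)
```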
- Vision Transformers for femur fracture classification
The Vision Transformer (ViT) was able to correctly predict 83% of the test images.
Good results were also obtained on sub-fracture classes, using the largest and richest dataset of its kind.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)
- Classification of COVID-19 in CT Scans using Multi-Source Transfer Learning
We propose the use of Multi-Source Transfer Learning to improve upon traditional Transfer Learning for the classification of COVID-19 from CT scans.
With our multi-source fine-tuning approach, our models outperformed baseline models fine-tuned with ImageNet.
Our best-performing model achieved an accuracy of 0.893 and a recall of 0.897, outperforming its baseline recall by 9.3%.
arXiv Detail & Related papers (2020-09-22T11:53:06Z)
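Multi-source fine-tuning, as the entry above describes it, amounts to chaining ordinary fine-tuning runs across several source datasets before the target task. A minimal sketch follows; the backbone choice, epoch counts, and dataset loaders are hypothetical, since the summary does not specify them.

```python
# Minimal sketch of sequential multi-source fine-tuning from ImageNet weights.
import torch
import torch.nn as nn
import torchvision

model = torchvision.models.resnet50(weights="IMAGENET1K_V2")

def finetune(model, loader, num_classes, epochs=3, lr=1e-4):
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # fresh task head
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model

# Hypothetical source datasets first, then the COVID-19 CT target task:
# for loader, n in [(source_a_loader, 2), (source_b_loader, 3), (covid_ct_loader, 2)]:
#     model = finetune(model, loader, n)
```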
- CovidDeep: SARS-CoV-2/COVID-19 Test Based on Wearable Medical Sensors and Efficient Neural Networks
The novel coronavirus (SARS-CoV-2) has led to a pandemic.
The current testing regime, based on reverse transcription-polymerase chain reaction (RT-PCR), has been unable to keep up with testing demands.
We propose a framework called CovidDeep that combines efficient DNNs with commercially available wearable medical sensors (WMSs) for pervasive testing of the virus.
arXiv Detail & Related papers (2020-07-20T21:47:28Z)