Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging
- URL: http://arxiv.org/abs/2403.05702v2
- Date: Thu, 04 Sep 2025 09:22:57 GMT
- Title: Spatial-aware Transformer-GRU Framework for Enhanced Glaucoma Diagnosis from 3D OCT Imaging
- Authors: Mona Ashtari-Majlan, David Masip
- Abstract summary: We present a novel deep learning framework that leverages the diagnostic value of 3D Optical Coherence Tomography (OCT) imaging for automated glaucoma detection. We integrate a pre-trained Vision Transformer on retinal data for rich slice-wise feature extraction and a bidirectional Gated Recurrent Unit for capturing inter-slice spatial dependencies.
- Score: 3.093890460224435
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Glaucoma, a leading cause of irreversible blindness, necessitates early detection so that timely intervention can prevent permanent vision loss. In this study, we present a novel deep learning framework that leverages the diagnostic value of 3D Optical Coherence Tomography (OCT) imaging for automated glaucoma detection. In this framework, we integrate a Vision Transformer pre-trained on retinal data for rich slice-wise feature extraction and a bidirectional Gated Recurrent Unit for capturing inter-slice spatial dependencies. This dual-component approach enables comprehensive analysis of both local nuances and global structural integrity, which is crucial for accurate glaucoma diagnosis. Experimental results on a large dataset demonstrate the superior performance of the proposed method over state-of-the-art methods, achieving an F1-score of 93.01%, a Matthews Correlation Coefficient (MCC) of 69.33%, and an AUC of 94.20%. The framework's ability to leverage the valuable information in 3D OCT data holds significant potential for enhancing clinical decision support systems and improving patient outcomes in glaucoma management.
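The two-stage design described in the abstract (slice-wise feature extraction, then a bidirectional recurrent pass over the slice axis) can be sketched in a few lines. The following is a minimal NumPy illustration of the data flow only: the random projection stands in for the pre-trained Vision Transformer, all weights are untrained, and the function names and dimensions (`extract_slice_features`, `gru_pass`, `predict`, feature size 32, hidden size 16) are assumptions for this sketch, not details from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def extract_slice_features(volume, dim=32):
    """Map each 2D B-scan to a feature vector (stand-in for the ViT)."""
    n_slices = volume.shape[0]
    W = rng.standard_normal((volume[0].size, dim)) / np.sqrt(volume[0].size)
    return volume.reshape(n_slices, -1) @ W          # (n_slices, dim)

def gru_pass(x, h_dim=16):
    """Run a minimal GRU over the slice axis in one direction."""
    d = x.shape[1]
    scale = 1.0 / np.sqrt(d + h_dim)
    Wz = rng.standard_normal((d + h_dim, h_dim)) * scale  # update gate
    Wr = rng.standard_normal((d + h_dim, h_dim)) * scale  # reset gate
    Wh = rng.standard_normal((d + h_dim, h_dim)) * scale  # candidate state
    h, outs = np.zeros(h_dim), []
    for x_t in x:
        z = sigmoid(np.concatenate([x_t, h]) @ Wz)
        r = sigmoid(np.concatenate([x_t, h]) @ Wr)
        h_tilde = np.tanh(np.concatenate([x_t, r * h]) @ Wh)
        h = (1 - z) * h + z * h_tilde
        outs.append(h)
    return np.stack(outs)

def predict(volume):
    """Return an (untrained) glaucoma probability for one OCT volume."""
    feats = extract_slice_features(volume)
    fwd = gru_pass(feats)            # first -> last slice
    bwd = gru_pass(feats[::-1])      # last -> first slice
    summary = np.concatenate([fwd[-1], bwd[-1]])  # bidirectional summary
    w = rng.standard_normal(summary.size) * 0.1   # linear classifier head
    return float(sigmoid(summary @ w))

volume = rng.standard_normal((64, 8, 8))  # toy volume: 64 B-scans of 8x8
p = predict(volume)
print(f"P(glaucoma) = {p:.3f}")
```

The bidirectional pass is what lets each slice's representation depend on structure both anterior and posterior to it in the volume; in the actual framework the slice features would come from the pre-trained Vision Transformer rather than a random projection.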
Related papers
- A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - EYE-DEX: Eye Disease Detection and EXplanation System [0.45880283710344066]
Globally, over 2.2 billion people are affected by some form of vision impairment, resulting in annual productivity losses estimated at $411 billion. In this study, we present EYE-DEX, an automated framework for classifying 10 retinal conditions. We benchmark three pre-trained Convolutional Neural Network (CNN) models--VGG16, VGG19, and ResNet50--with our fine-tuned VGG16 achieving a state-of-the-art global benchmark test accuracy of 92.36%.
arXiv Detail & Related papers (2025-09-29T00:10:02Z) - DRetNet: A Novel Deep Learning Framework for Diabetic Retinopathy Diagnosis [8.234135343778993]
Current DR detection systems struggle with poor-quality images, lack interpretability, and insufficient integration of domain-specific knowledge. We introduce a novel framework that integrates three innovative contributions. The framework achieves an accuracy of 92.7%, a precision of 92.5%, a recall of 92.6%, an F1-score of 92.5%, an AUC of 97.8%, a mAP of 0.96, and an MCC of 0.85.
arXiv Detail & Related papers (2025-09-01T02:27:16Z) - A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD). The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z) - Decentralized LoRA Augmented Transformer with Context-aware Multi-scale Feature Learning for Secured Eye Diagnosis [2.1358421658740214]
This paper proposes a novel Data-efficient Image Transformer (DeiT)-based framework that integrates context-aware multi-scale patch embedding, Low-Rank Adaptation (LoRA), knowledge distillation, and federated learning to address these challenges in a unified manner. The proposed model effectively captures both local and global retinal features by leveraging multi-scale patch representations with local and global attention mechanisms.
arXiv Detail & Related papers (2025-05-11T13:51:56Z) - Vision-Language Models for Acute Tuberculosis Diagnosis: A Multimodal Approach Combining Imaging and Clinical Data [0.0]
This study introduces a Vision-Language Model (VLM) leveraging SIGLIP and Gemma-3b architectures for automated acute tuberculosis (TB) screening.
The VLM combines visual data from chest X-rays with clinical context to generate detailed, context-aware diagnostic reports.
Key acute TB pathologies, including consolidation, cavities, and nodules, were detected with high precision and recall.
arXiv Detail & Related papers (2025-03-17T14:08:35Z) - Advancing Chronic Tuberculosis Diagnostics Using Vision-Language Models: A Multi modal Framework for Precision Analysis [0.0]
This study proposes a Vision-Language Model (VLM) to enhance automated chronic tuberculosis (TB) screening.
By integrating chest X-ray images with clinical data, the model addresses the challenges of manual interpretation.
The model demonstrated high precision (94 percent) and recall (94 percent) for detecting key chronic TB pathologies.
arXiv Detail & Related papers (2025-03-17T13:49:29Z) - Low-Resolution Chest X-ray Classification via Knowledge Distillation and Multi-task Learning [46.75992018094998]
This research addresses the challenges of diagnosing chest X-rays (CXRs) at low resolutions.
High-resolution CXR imaging is crucial for identifying small but critical anomalies, such as nodules or opacities.
This paper presents the Multilevel Collaborative Attention Knowledge (MLCAK) method.
arXiv Detail & Related papers (2024-05-22T06:10:54Z) - Super-resolution of biomedical volumes with 2D supervision [84.5255884646906]
Masked slice diffusion for super-resolution exploits the inherent equivalence in the data-generating distribution across all spatial dimensions of biological specimens.
We focus on the application of SliceR to stimulated Raman histology (SRH), characterized by its rapid acquisition of high-resolution 2D images but slow and costly optical z-sectioning.
arXiv Detail & Related papers (2024-04-15T02:41:55Z) - Weakly supervised segmentation of intracranial aneurysms using a novel 3D focal modulation UNet [0.5106162890866905]
We propose FocalSegNet, a novel 3D focal modulation UNet, to detect an aneurysm and offer an initial, coarse segmentation of it from time-of-flight MRA image patches.
We trained and evaluated our model on a public dataset, and in terms of UIA detection, our model showed a low false-positive rate of 0.21 and a high sensitivity of 0.80.
arXiv Detail & Related papers (2023-08-06T03:28:08Z) - Automatic diagnosis of knee osteoarthritis severity using Swin transformer [55.01037422579516]
Knee osteoarthritis (KOA) is a widespread condition that can cause chronic pain and stiffness in the knee joint.
We propose an automated approach that employs the Swin Transformer to predict the severity of KOA.
arXiv Detail & Related papers (2023-07-10T09:49:30Z) - nnUNet RASPP for Retinal OCT Fluid Detection, Segmentation and Generalisation over Variations of Data Sources [25.095695898777656]
We propose two variants of the nnUNet with consistently high performance across images from multiple device vendors.
The algorithm was validated on the MICCAI 2017 RETOUCH challenge dataset.
Experimental results show that our algorithms outperform current state-of-the-art algorithms.
arXiv Detail & Related papers (2023-02-25T23:47:23Z) - Feature Representation Learning for Robust Retinal Disease Detection from Optical Coherence Tomography Images [0.0]
Ophthalmic images may contain identical-looking pathologies that can cause failure in automated techniques to distinguish different retinal degenerative diseases.
In this work, we propose a robust disease detection architecture with three learning heads.
Our experimental results on two publicly available OCT datasets illustrate that the proposed model outperforms existing state-of-the-art models in terms of accuracy, interpretability, and robustness for out-of-distribution retinal disease detection.
arXiv Detail & Related papers (2022-06-24T07:59:36Z) - Geometric Deep Learning to Identify the Critical 3D Structural Features of the Optic Nerve Head for Glaucoma Diagnosis [52.06403518904579]
The optic nerve head (ONH) undergoes complex and deep 3D morphological changes during the development and progression of glaucoma.
We used PointNet and dynamic graph convolutional neural network (DGCNN) to diagnose glaucoma from 3D ONH point clouds.
Our approach may have strong potential to be used in clinical applications for the diagnosis and prognosis of a wide range of ophthalmic disorders.
arXiv Detail & Related papers (2022-04-14T12:52:10Z) - Deep Learning based Framework for Automatic Diagnosis of Glaucoma based on analysis of Focal Notching in the Optic Nerve Head [0.2580765958706854]
We propose a deep learning-based pipeline for automatic segmentation of optic disc (OD) and optic cup (OC) regions from Digital Fundus Images (DFIs).
This methodology uses focal-notch analysis of the neuroretinal rim along with cup-to-disc ratio values as classifying parameters to enhance the accuracy of computer-aided diagnosis (CAD) systems in analyzing glaucoma.
The proposed pipeline was evaluated on the freely available DRISHTI-GS dataset with a resultant accuracy of 93.33% for detecting Glaucoma from DFIs.
arXiv Detail & Related papers (2021-12-10T18:58:40Z) - The Three-Dimensional Structural Configuration of the Central Retinal Vessel Trunk and Branches as a Glaucoma Biomarker [41.97805846007449]
We trained a deep learning network to automatically segment the CRVT&B from the B-scans of the optical coherence tomography volume of the optic nerve head (ONH).
The 3D and 2D diagnostic networks were able to differentiate glaucoma from non-glaucoma subjects with accuracies of 82.7% and 83.3%, respectively.
arXiv Detail & Related papers (2021-11-07T04:41:49Z) - Assessing glaucoma in retinal fundus photographs using Deep Feature Consistent Variational Autoencoders [63.391402501241195]
Glaucoma is challenging to detect since it remains asymptomatic until late in the disease course.
Early identification of glaucoma is generally made based on functional, structural, and clinical assessments.
Deep learning methods have partially solved this dilemma by bypassing the marker identification stage and analyzing high-level information directly to classify the data.
arXiv Detail & Related papers (2021-10-04T16:06:49Z) - An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z) - Revisiting 3D Context Modeling with Supervised Pre-training for Universal Lesion Detection in CT Slices [48.85784310158493]
We propose a Modified Pseudo-3D Feature Pyramid Network (MP3D FPN) to efficiently extract 3D context enhanced 2D features for universal lesion detection in CT slices.
With the novel pre-training method, the proposed MP3D FPN achieves state-of-the-art detection performance on the DeepLesion dataset.
The proposed 3D pre-trained weights can potentially be used to boost the performance of other 3D medical image analysis tasks.
arXiv Detail & Related papers (2020-12-16T07:11:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.