Related papers: Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model

URL: http://arxiv.org/abs/2510.02403v1
Date: Wed, 01 Oct 2025 22:37:28 GMT
Title: Glaucoma Detection and Structured OCT Report Generation via a Fine-tuned Multimodal Large Language Model
Authors: Jalil Jalili, Yashraj Gavhane, Evan Walker, Anna Heinke, Christopher Bowd, Akram Belghith, Massimo A. Fazio, Christopher A. Girkin, C. Gustavo De Moraes, Jeffrey M. Liebmann, Sally L. Baxter, Robert N. Weinreb, Linda M. Zangwill, Mark Christopher
Abstract summary: The model was evaluated on a held-out test set for three tasks: quality assessment, glaucoma detection, and RNFL thinning classification. The model achieved high accuracy in identifying image quality issues and detecting glaucoma.
Score: 1.0925680160683622
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Objective: To develop an explainable multimodal large language model (MM-LLM) that (1) screens optic nerve head (ONH) OCT circle scans for quality and (2) generates structured clinical reports that include glaucoma diagnosis and sector-wise retinal nerve fiber layer (RNFL) thinning assessments. Design: Retrospective cohort study of 1,310 subjects contributing 43,849 Spectralis ONH OCT circle scans (1,331 glaucomatous and 867 healthy eyes) from the DIGS and ADAGES cohorts. Methods: A MM-LLM (Llama 3.2 Vision-Instruct model) was fine-tuned to generate clinical descriptions of OCT imaging data. Training data included paired OCT images and automatically generated, structured clinical reports that described global and sectoral RNFL thinning. Poor-quality scans were labeled as unusable and paired with a fixed refusal statement. The model was evaluated on a held-out test set for three tasks: quality assessment, glaucoma detection, and RNFL thinning classification across seven anatomical sectors. Evaluation metrics included accuracy, sensitivity, specificity, precision, and F1-score. Model description quality was also evaluated using standard text evaluation metrics. Results: The model achieved 0.90 accuracy and 0.98 specificity for quality triage. For glaucoma detection, accuracy was 0.86 (sensitivity 0.91, specificity 0.73, F1-score 0.91). RNFL thinning prediction accuracy ranged from 0.83 to 0.94, with highest performance in global and temporal sectors. Text generation scores showed strong alignment with reference reports (BLEU: 0.82; ROUGE-1: 0.94; ROUGE-2: 0.87; ROUGE-L: 0.92; BERTScore-F1: 0.99). Conclusions: The fine-tuned MM-LLM generated accurate clinical descriptions based on OCT imaging. The model achieved high accuracy in identifying image quality issues and detecting glaucoma. The model also provided sectoral descriptions of RNFL thinning to help support clinical OCT evaluation.
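The abstract reports accuracy, sensitivity, specificity, precision, and F1-score for the binary tasks. As a reading aid, the sketch below shows how these metrics are conventionally derived from confusion-matrix counts. It is not the authors' evaluation code: the function name `binary_metrics` and the counts in the usage example are hypothetical and were chosen only for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    sensitivity: float
    specificity: float
    precision: float
    f1: float


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> BinaryMetrics:
    """Compute standard binary classification metrics from confusion counts.

    Hypothetical helper for illustration; not part of the paper's code.
    """
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one sample is required")

    def ratio(num: float, den: float) -> float:
        # Return 0.0 when the denominator is zero (metric undefined).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # true positive rate (recall)
    specificity = ratio(tn, tn + fp)  # true negative rate
    precision = ratio(tp, tp + fp)    # positive predictive value
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return BinaryMetrics(
        accuracy=(tp + tn) / total,
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        f1=f1,
    )


# Made-up counts (NOT from the paper): 100 diseased and 100 healthy eyes.
m = binary_metrics(tp=90, fp=30, tn=70, fn=10)
print(m)
```

Because F1 depends only on precision and sensitivity, it ignores true negatives. On a test set where the positive class dominates, F1 can therefore exceed accuracy.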

Related papers

Robust Multicentre Detection and Classification of Colorectal Liver Metastases on CT: Application of Foundation Models [11.274035647041762]
We developed a foundation model-based AI pipeline for patient-level classification and lesion-level detection of CRLM on CT. UMedPT achieved the best performance and was fine-tuned with a head for classification and an FCOS-based head for lesion detection.
arXiv Detail & Related papers (2026-01-12T14:35:29Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z)
A Novel Attention-Augmented Wavelet YOLO System for Real-time Brain Vessel Segmentation on Transcranial Color-coded Doppler [49.03919553747297]
We propose an AI-powered, real-time CoW auto-segmentation system capable of efficiently capturing cerebral arteries. No prior studies have explored AI-driven cerebrovascular segmentation using Transcranial Color-coded Doppler (TCCD). The proposed AAW-YOLO demonstrated strong performance in segmenting both ipsilateral and contralateral CoW vessels.
arXiv Detail & Related papers (2025-08-19T14:41:22Z)
A Clinically-Grounded Two-Stage Framework for Renal CT Report Generation [4.408787333571913]
We propose a framework for automatic renal CT report generation. In Stage 1, a multi-task learning model detects structured clinical features from each 2D image. In Stage 2, a vision-language model generates free-text reports conditioned on the image and the detected features.
arXiv Detail & Related papers (2025-06-30T07:45:02Z)
Explainable Anatomy-Guided AI for Prostate MRI: Foundation Models and In Silico Clinical Trials for Virtual Biopsy-based Risk Assessment [3.5408411348831232]
We present a fully automated, anatomically guided deep learning pipeline for prostate cancer (PCa) risk stratification using routine MRI. The pipeline integrates three key components: an nnU-Net module for segmenting the prostate gland and its zones on axial T2-weighted MRI; a classification module based on the DiceedPT Swin Transformer foundation model, fine-tuned on 3D patches with optional anatomical priors and clinical data; and a VAE-GAN framework for generating counterfactual heatmaps that localize decision-driving image regions.
arXiv Detail & Related papers (2025-05-23T14:40:09Z)
Metrics that matter: Evaluating image quality metrics for medical image generation [48.85783422900129]
This study comprehensively assesses commonly used no-reference image quality metrics using brain MRI data. We evaluate metric sensitivity to a range of challenges, including noise, distribution shifts, and, critically, morphological alterations designed to mimic clinically relevant inaccuracies.
arXiv Detail & Related papers (2025-05-12T01:57:25Z)
ThyroidEffi 1.0: A Cost-Effective System for High-Performance Multi-Class Thyroid Carcinoma Classification [0.0]
We develop and validate a deep learning system for multi-class thyroid FNAB image classification. Benign, Indeterminate/Suspicious, and Malignant are three key categories directly guiding post-biopsy treatment. The system processed 1000 cases in 30 seconds, demonstrating feasibility on widely accessible hardware.
arXiv Detail & Related papers (2025-04-19T02:13:07Z)
Is an Ultra Large Natural Image-Based Foundation Model Superior to a Retina-Specific Model for Detecting Ocular and Systemic Diseases? [19.8132297355024]
RETFound and DINOv2 models were evaluated for ocular disease detection and systemic disease prediction tasks. RETFound achieved superior performance over all DINOv2 models in predicting heart failure, infarction, and ischaemic stroke.
arXiv Detail & Related papers (2025-02-10T09:31:39Z)
Improving Disease Classification Performance and Explainability of Deep Learning Models in Radiology with Heatmap Generators [0.0]
Three experiment sets were conducted with a U-Net architecture to improve the classification performance. The greatest improvements were for the "pneumonia" and "CHF" classes, which the baseline model struggled most to classify.
arXiv Detail & Related papers (2022-06-28T13:03:50Z)
3D Structural Analysis of the Optic Nerve Head to Robustly Discriminate Between Papilledema and Optic Disc Drusen [44.754910718620295]
We developed a deep learning algorithm to identify major tissue structures of the optic nerve head (ONH) in 3D optical coherence tomography (OCT) scans. A classification algorithm was designed using 150 OCT volumes to perform 3-class classifications (1: ODD, 2: papilledema, 3: healthy) strictly from their drusen and prelamina swelling scores. Our AI approach accurately discriminated ODD from papilledema, using a single OCT scan.
arXiv Detail & Related papers (2021-12-18T17:05:53Z)
The Report on China-Spain Joint Clinical Testing for Rapid COVID-19 Risk Screening by Eye-region Manifestations [59.48245489413308]
We developed and tested a COVID-19 rapid prescreening model using the eye-region images captured in China and Spain with cellphone cameras. The performance was measured using area under receiver-operating-characteristic curve (AUC), sensitivity, specificity, accuracy, and F1.
arXiv Detail & Related papers (2021-09-18T02:28:01Z)
Systematic Clinical Evaluation of A Deep Learning Method for Medical Image Segmentation: Radiosurgery Application [48.89674088331313]
We systematically evaluate a Deep Learning (DL) method in a 3D medical image segmentation task. Our method is integrated into the radiosurgery treatment process and directly impacts the clinical workflow.
arXiv Detail & Related papers (2021-08-21T16:15:40Z)
Vision Transformers for femur fracture classification [59.99241204074268]
The Vision Transformer (ViT) was able to correctly predict 83% of the test images. Good results were obtained in sub-fractures with the largest and richest dataset ever.
arXiv Detail & Related papers (2021-08-07T10:12:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.