XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
- URL: http://arxiv.org/abs/2508.15168v1
- Date: Thu, 21 Aug 2025 02:14:46 GMT
- Title: XDR-LVLM: An Explainable Vision-Language Large Model for Diabetic Retinopathy Diagnosis
- Authors: Masato Ito, Kaito Tanaka, Keisuke Matsuda, Aya Nakayama
- Abstract summary: We propose XDR-LVLM (eXplainable Diabetic Retinopathy Diagnosis with LVLM), a novel framework that leverages Vision-Language Large Models (LVLMs) for high-precision DR diagnosis. XDR-LVLM integrates a specialized Medical Vision Encoder and an LVLM Core, and employs Multi-task Prompt Engineering and Multi-stage Fine-tuning. It achieves state-of-the-art performance, with a Balanced Accuracy of 84.55% and an F1 Score of 79.92% for disease diagnosis, and superior results for concept detection.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diabetic Retinopathy (DR) is a major cause of global blindness, necessitating early and accurate diagnosis. While deep learning models have shown promise in DR detection, their black-box nature often hinders clinical adoption due to a lack of transparency and interpretability. To address this, we propose XDR-LVLM (eXplainable Diabetic Retinopathy Diagnosis with LVLM), a novel framework that leverages Vision-Language Large Models (LVLMs) for high-precision DR diagnosis coupled with natural language-based explanations. XDR-LVLM integrates a specialized Medical Vision Encoder, an LVLM Core, and employs Multi-task Prompt Engineering and Multi-stage Fine-tuning to deeply understand pathological features within fundus images and generate comprehensive diagnostic reports. These reports explicitly include DR severity grading, identification of key pathological concepts (e.g., hemorrhages, exudates, microaneurysms), and detailed explanations linking observed features to the diagnosis. Extensive experiments on the Diabetic Retinopathy (DDR) dataset demonstrate that XDR-LVLM achieves state-of-the-art performance, with a Balanced Accuracy of 84.55% and an F1 Score of 79.92% for disease diagnosis, and superior results for concept detection (77.95% BACC, 66.88% F1). Furthermore, human evaluations confirm the high fluency, accuracy, and clinical utility of the generated explanations, showcasing XDR-LVLM's ability to bridge the gap between automated diagnosis and clinical needs by providing robust and interpretable insights.
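The two metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes the usual definitions: Balanced Accuracy as the mean of per-class recall, and F1 as the macro average of per-class F1. The listing does not say which F1 averaging the authors used, so macro averaging is an assumption here, and the labels below are made-up toy data, not results from the paper.

```python
from collections import Counter


def _counts(y_true, y_pred):
    """Per-class true positives, false positives and false negatives."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # a sample of class t was missed
    return labels, tp, fp, fn


def balanced_accuracy(y_true, y_pred):
    """Mean recall over the classes that occur in y_true."""
    labels, tp, _, fn = _counts(y_true, y_pred)
    recalls = [tp[c] / (tp[c] + fn[c]) for c in labels if tp[c] + fn[c] > 0]
    return sum(recalls) / len(recalls)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    labels, tp, fp, fn = _counts(y_true, y_pred)
    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels for three severity grades (0, 1, 2).
y_true = [0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 1, 1, 1, 2, 0]

bacc = balanced_accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"BACC={bacc:.4f}  macro-F1={f1:.4f}")
```

Because Balanced Accuracy weights every class equally, it is less flattering than plain accuracy on imbalanced grading data, where the healthy class typically dominates.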
Related papers
- An Explainable Hybrid AI Framework for Enhanced Tuberculosis and Symptom Detection [55.35661671061754]
Tuberculosis remains a critical global health issue, particularly in resource-limited and remote areas. We propose a framework which enhances disease and symptom detection on chest X-rays by integrating two supervised heads and a self-supervised head. Our model achieves an accuracy of 98.85% for distinguishing between COVID-19, tuberculosis, and normal cases, and a macro-F1 score of 90.09% for multilabel symptom detection.
arXiv Detail & Related papers (2025-10-21T17:18:55Z) - Hybrid Deep Learning Framework for Enhanced Diabetic Retinopathy Detection: Integrating Traditional Features with AI-driven Insights [0.0]
Diabetic Retinopathy (DR), a vision-threatening complication of Diabetes Mellitus (DM), is a major global concern, particularly in India. Fundus imaging aids precise diagnosis by detecting subtle retinal lesions. This paper introduces a hybrid diagnostic framework combining traditional feature extraction and deep learning (DL) to enhance DR detection.
arXiv Detail & Related papers (2025-10-21T09:50:16Z) - RAD: Towards Trustworthy Retrieval-Augmented Multi-modal Clinical Diagnosis [56.373297358647655]
Retrieval-Augmented Diagnosis (RAD) is a novel framework that injects external knowledge into multimodal models directly on downstream tasks. RAD operates through three key mechanisms: retrieval and refinement of disease-centered knowledge from multiple medical sources, a guideline-enhanced contrastive loss transformer, and a dual decoder.
arXiv Detail & Related papers (2025-09-24T10:36:14Z) - A Disease-Centric Vision-Language Foundation Model for Precision Oncology in Kidney Cancer [54.58205672910646]
RenalCLIP is a visual-language foundation model for the characterization, diagnosis, and prognosis of renal masses. It achieved better performance and superior generalizability across 10 core tasks spanning the full clinical workflow of kidney cancer.
arXiv Detail & Related papers (2025-08-22T17:48:19Z) - Design and Validation of a Responsible Artificial Intelligence-based System for the Referral of Diabetic Retinopathy Patients [65.57160385098935]
Early detection of Diabetic Retinopathy can reduce the risk of vision loss by up to 95%. We developed RAIS-DR, a Responsible AI System for DR screening that incorporates ethical principles across the AI lifecycle. We evaluated RAIS-DR against the FDA-approved EyeArt system on a local dataset of 1,046 patients, unseen by both systems.
arXiv Detail & Related papers (2025-08-17T21:54:11Z) - X-Ray-CoT: Interpretable Chest X-ray Diagnosis with Vision-Language Models via Chain-of-Thought Reasoning [0.0]
We propose X-Ray-CoT (Chest X-Ray Chain-of-Thought), a novel framework for intelligent chest X-ray diagnosis and interpretable report generation. X-Ray-CoT simulates human radiologists' "chain-of-thought" by first extracting multi-modal features and visual concepts. It achieves competitive quantitative performance, with a Balanced Accuracy of 80.52% and F1 score of 78.65% for disease diagnosis.
arXiv Detail & Related papers (2025-08-17T18:00:41Z) - VL-MedGuide: A Visual-Linguistic Large Model for Intelligent and Explainable Skin Disease Auxiliary Diagnosis [3.7978950713339215]
This study introduces VL-MedGuide, a novel framework leveraging the powerful multi-modal understanding and reasoning capabilities of Visual-Language Large Models (LVLMs). Experiments on the Derm7pt dataset demonstrate that VL-MedGuide achieves state-of-the-art performance in both disease diagnosis and concept detection. Human evaluations confirm the high clarity, completeness, and trustworthiness of its generated explanations.
arXiv Detail & Related papers (2025-08-08T18:13:34Z) - RadFabric: Agentic AI System with Reasoning Capability for Radiology [61.25593938175618]
RadFabric is a multi-agent, multimodal reasoning framework that unifies visual and textual analysis for comprehensive CXR interpretation. The system employs specialized CXR agents for pathology detection, an Anatomical Interpretation Agent to map visual findings to precise anatomical structures, and a Reasoning Agent powered by large multimodal reasoning models to synthesize visual, anatomical, and clinical data into transparent and evidence-based diagnoses.
arXiv Detail & Related papers (2025-06-17T03:10:33Z) - Vision-Language Models for Acute Tuberculosis Diagnosis: A Multimodal Approach Combining Imaging and Clinical Data [0.0]
This study introduces a Vision-Language Model (VLM) leveraging SIGLIP and Gemma-3b architectures for automated acute tuberculosis (TB) screening. The VLM combines visual data from chest X-rays with clinical context to generate detailed, context-aware diagnostic reports. Key acute TB pathologies, including consolidation, cavities, and nodules, were detected with high precision and recall.
arXiv Detail & Related papers (2025-03-17T14:08:35Z) - Efficient and Comprehensive Feature Extraction in Large Vision-Language Model for Pathology Analysis [37.11302829771659]
Large vision-language models (LVLMs) are limited by input resolution constraints, hindering their efficiency and accuracy in pathology image analysis. We propose two innovative strategies: the mixed task-guided feature enhancement, and the prompt-guided detail feature completion. We trained the pathology-specialized LVLM, OmniPath, which significantly outperforms existing methods in diagnostic accuracy and efficiency.
arXiv Detail & Related papers (2024-12-12T18:07:23Z) - Convolutional Neural Network Model for Diabetic Retinopathy Feature Extraction and Classification [6.236743421605786]
We create a novel CNN model that identifies the severity of Diabetic Retinopathy from fundus image input.
We classified 4 known DR features, including micro-aneurysms, cotton wool spots, exudates, and hemorrhages, through convolutional layers.
Our contribution is an interpretable model with similar accuracy to more complex models.
arXiv Detail & Related papers (2023-10-16T20:09:49Z) - Interpretable Vertebral Fracture Diagnosis [69.68641439851777]
Black-box neural network models learn clinically relevant features for fracture diagnosis.
This work identifies the concepts networks use for vertebral fracture diagnosis in CT images.
arXiv Detail & Related papers (2022-03-30T13:07:41Z) - An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z) - A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR).
Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading.
This dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv Detail & Related papers (2020-08-22T07:48:04Z) - Diagnosis of Coronavirus Disease 2019 (COVID-19) with Structured Latent Multi-View Representation Learning [48.05232274463484]
Recently, the outbreak of Coronavirus Disease 2019 (COVID-19) has spread rapidly across the world.
Due to the large number of affected patients and the heavy workload for doctors, computer-aided diagnosis with machine learning algorithms is urgently needed.
In this study, we propose to conduct the diagnosis of COVID-19 with a series of features extracted from CT images.
arXiv Detail & Related papers (2020-05-06T15:19:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.