TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading
- URL: http://arxiv.org/abs/2512.13008v1
- Date: Mon, 15 Dec 2025 06:08:16 GMT
- Title: TWLR: Text-Guided Weakly-Supervised Lesion Localization and Severity Regression for Explainable Diabetic Retinopathy Grading
- Authors: Xi Luo, Shixin Xu, Ying Xie, JianZhong Hu, Yuwei He, Yuhui Deng, Huaxiong Huang
- Abstract summary: We propose TWLR, a two-stage framework for interpretable diabetic retinopathy (DR) assessment. In the first stage, a vision-language model integrates domain-specific ophthalmological knowledge into text embeddings to jointly perform DR grading and lesion classification. The second stage introduces an iterative severity regression framework based on weakly-supervised semantic segmentation.
- Score: 9.839282449612513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Accurate medical image analysis can greatly assist clinical diagnosis, but its effectiveness relies on high-quality expert annotations. Obtaining pixel-level labels for medical images, particularly fundus images, remains costly and time-consuming. Meanwhile, despite the success of deep learning in medical imaging, the lack of interpretability limits its clinical adoption. To address these challenges, we propose TWLR, a two-stage framework for interpretable diabetic retinopathy (DR) assessment. In the first stage, a vision-language model integrates domain-specific ophthalmological knowledge into text embeddings to jointly perform DR grading and lesion classification, effectively linking semantic medical concepts with visual features. The second stage introduces an iterative severity regression framework based on weakly-supervised semantic segmentation. Lesion saliency maps generated through iterative refinement direct a progressive inpainting mechanism that systematically eliminates pathological features, effectively downgrading disease severity toward healthier fundus appearances. Critically, this severity regression approach achieves dual benefits: accurate lesion localization without pixel-level supervision and an interpretable visualization of disease-to-healthy transformations. Experimental results on FGADR, DDR, and a private dataset demonstrate that TWLR achieves competitive performance in both DR classification and lesion segmentation, offering a more explainable and annotation-efficient solution for automated retinal image analysis.
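The second-stage loop described above (grade, localize lesions via saliency, inpaint, re-grade until the image appears healthy) can be sketched in minimal form. The following is an illustrative toy, not the authors' implementation: the grading model, saliency extractor, and inpainter are stand-ins (thresholded intensities and median background fill), and all function names are hypothetical.

```python
import numpy as np

def severity(img, thresh=0.5):
    """Toy severity score: fraction of 'lesion' pixels (stand-in for a DR grading model)."""
    return float((img > thresh).mean())

def saliency_map(img, thresh=0.5):
    """Toy lesion saliency: marks the pixels driving the severity score."""
    return (img > thresh).astype(float)

def inpaint(img, mask, frac=0.5):
    """Progressively inpaint the most salient pixels toward the healthy background median."""
    out = img.copy()
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return out
    k = max(1, int(frac * len(ys)))               # erase a fraction of lesion pixels per step
    order = np.argsort(-img[ys, xs])[:k]          # most salient first
    background = img[mask == 0]
    out[ys[order], xs[order]] = np.median(background) if background.size else 0.0
    return out

def severity_regression(img, max_iters=20):
    """Iterate grade -> saliency -> inpaint until the image grades as healthy."""
    traj = [img]
    for _ in range(max_iters):
        if severity(traj[-1]) == 0.0:
            break
        traj.append(inpaint(traj[-1], saliency_map(traj[-1])))
    return traj

rng = np.random.default_rng(0)
fundus = rng.uniform(0.0, 0.4, size=(32, 32))     # synthetic healthy background
fundus[5:8, 5:8] = 0.9                            # synthetic lesion patch
traj = severity_regression(fundus)
print(len(traj), severity(traj[0]), severity(traj[-1]))
```

The trajectory `traj` is the interpretable output: a sequence of images in which lesions are progressively removed, so the difference between consecutive frames localizes pathology without any pixel-level labels.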
Related papers
- Beyond CLIP: Knowledge-Enhanced Multimodal Transformers for Cross-Modal Alignment in Diabetic Retinopathy Diagnosis [7.945705180020063]
We propose a knowledge-enhanced joint embedding framework that integrates retinal fundus images, clinical text, and structured patient data. Our framework achieves near-perfect text-to-image retrieval performance, with Recall@1 of 99.94% compared to fine-tuned CLIP's 1.29%.
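The reported Recall@1 is the standard cross-modal retrieval metric: the fraction of text queries whose top-ranked image is the ground-truth match. A minimal sketch, assuming a similarity matrix whose diagonal holds the true text-image pairs (the matrix values below are made up for illustration):

```python
import numpy as np

def recall_at_1(sim):
    """Recall@1 for text->image retrieval: row i's ground-truth match is image i."""
    preds = sim.argmax(axis=1)                    # top-ranked image per text query
    return float((preds == np.arange(sim.shape[0])).mean())

# Three text queries vs. three images; diagonal entries are the true pairs.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.2, 0.4, 0.7]])
print(recall_at_1(sim))  # 1.0: every query ranks its true image first
```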
arXiv Detail & Related papers (2025-12-22T18:41:45Z) - Quadrant Segmentation VLM with Few-Shot Adaptation and OCT Learning-based Explainability Methods for Diabetic Retinopathy [0.0]
Diabetic Retinopathy (DR) is a leading cause of vision loss worldwide, requiring early detection to preserve sight. Current AI models utilize lesion segmentation for interpretability; however, manually annotating lesions is impractical for clinicians. This paper presents a novel multimodal explainability model utilizing a VLM with few-shot learning, which mimics an ophthalmologist's reasoning by analyzing lesion distributions within retinal quadrants for fundus images.
arXiv Detail & Related papers (2025-12-20T17:45:33Z) - A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv Detail & Related papers (2025-12-15T10:22:43Z) - MIRNet: Integrating Constrained Graph-Based Reasoning with Pre-training for Diagnostic Medical Imaging [67.74482877175797]
MIRNet is a novel framework that integrates self-supervised pre-training with constrained graph-based reasoning. We introduce TongueAtlas-4K, a benchmark comprising 4,000 images annotated with 22 diagnostic labels.
arXiv Detail & Related papers (2025-11-13T06:30:41Z) - Self-Supervised Anatomical Consistency Learning for Vision-Grounded Medical Report Generation [61.350584471060756]
Vision-grounded medical report generation aims to produce clinically accurate descriptions of medical images. We propose Self-Supervised Anatomical Consistency Learning (SS-ACL) to align generated reports with corresponding anatomical regions. SS-ACL constructs a hierarchical anatomical graph inspired by the invariant top-down inclusion structure of human anatomy.
arXiv Detail & Related papers (2025-09-30T08:59:06Z) - Interpretable Few-Shot Retinal Disease Diagnosis with Concept-Guided Prompting of Vision-Language Models [11.076403908252754]
We implement two key strategies to extract interpretable concepts of retinal diseases from fundus images. Our method improves retinal disease classification and enriches few-shot and zero-shot detection. Our method marks a pivotal step towards interpretable and efficient retinal disease recognition for real-world clinical applications.
arXiv Detail & Related papers (2025-03-04T12:03:42Z) - Efficient Few-Shot Medical Image Analysis via Hierarchical Contrastive Vision-Language Learning [44.99833362998488]
We propose Adaptive Vision-Language Fine-tuning with Hierarchical Contrastive Alignment (HiCA) for medical image analysis. HiCA combines domain-specific pretraining and hierarchical contrastive learning to align visual and textual representations at multiple levels. We evaluate our approach on two benchmark datasets, Chest X-ray and Breast Ultrasound.
arXiv Detail & Related papers (2025-01-16T05:01:30Z) - XVertNet: Unsupervised Contrast Enhancement of Vertebral Structures with Dynamic Self-Tuning Guidance and Multi-Stage Analysis [2.1987431057890467]
Chest X-rays remain the primary diagnostic tool in emergency medicine, yet their limited ability to capture fine anatomical details can result in missed or delayed diagnoses. We introduce XVertNet, a novel deep-learning framework designed to significantly enhance vertebral structure visualization in X-ray images.
arXiv Detail & Related papers (2023-06-06T19:36:11Z) - OTRE: Where Optimal Transport Guided Unpaired Image-to-Image Translation Meets Regularization by Enhancing [4.951748109810726]
Optimal retinal image quality is mandated for accurate medical diagnoses and automated analyses.
We propose an unpaired image-to-image translation scheme for mapping low-quality retinal CFPs to high-quality counterparts.
We validated the integrated framework, OTRE, on three publicly available retinal image datasets.
arXiv Detail & Related papers (2023-02-06T18:39:40Z) - An Interpretable Multiple-Instance Approach for the Detection of Referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z) - Towards Unbiased COVID-19 Lesion Localisation and Segmentation via Weakly Supervised Learning [66.36706284671291]
We propose a data-driven framework supervised by only image-level labels to support unbiased lesion localisation.
The framework can explicitly separate potential lesions from original images, with the help of a generative adversarial network and a lesion-specific decoder.
arXiv Detail & Related papers (2021-03-01T06:05:49Z) - A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR).
Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading.
This dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv Detail & Related papers (2020-08-22T07:48:04Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.