Related papers: RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions

URL: http://arxiv.org/abs/2505.12887v1
Date: Mon, 19 May 2025 09:18:11 GMT
Title: RetinaLogos: Fine-Grained Synthesis of High-Resolution Retinal Images Through Captions
Authors: Junzhi Ning, Cheng Tang, Kaijin Zhou, Diping Song, Lihao Liu, Ming Hu, Wei Li, Yanzhou Su, Tianbing Li, Jiyao Liu, Yejin, Sheng Zhang, Yuanfeng Ji, Junjun He
Abstract summary: RetinaLogos-1400k is a large-scale, synthetic Caption-CFP dataset comprising 1.4 million entries. We employ a novel three-step training framework, called RetinaLogos, which enables fine-grained semantic control over retinal images. Experiments demonstrate state-of-the-art performance across multiple datasets, with 62.07% of text-driven synthetic images indistinguishable from real ones by ophthalmologists.
Score: 15.499798559622528
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The scarcity of high-quality, labelled retinal imaging data presents a significant challenge for developing machine learning models in ophthalmology and hinders progress in the field. Existing methods for synthesising Colour Fundus Photographs (CFPs) rely primarily on predefined disease labels, which limits them: they fail to generate images for broader categories with diverse and fine-grained anatomical structures. To overcome these challenges, we first introduce an innovative pipeline that creates a large-scale, synthetic Caption-CFP dataset comprising 1.4 million entries, called RetinaLogos-1400k. Specifically, RetinaLogos-1400k uses large language models (LLMs) to describe retinal conditions and key structures, such as optic disc configuration, vascular distribution, nerve fibre layers, and pathological features. Building on this dataset, we employ a novel three-step training framework, called RetinaLogos, which enables fine-grained semantic control over retinal images and accurately captures different stages of disease progression, subtle anatomical variations, and specific lesion types. Extensive experiments demonstrate state-of-the-art performance across multiple datasets, with 62.07% of text-driven synthetic images indistinguishable from real ones by ophthalmologists. Moreover, the synthetic data improves accuracy by 10%-25% in diabetic retinopathy grading and glaucoma detection, thereby providing a scalable solution to augment ophthalmic datasets.
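Each RetinaLogos-1400k entry pairs a CFP with a caption describing structures such as the optic disc, vascular distribution, nerve fibre layer and pathological features. The paper generates these captions with LLMs; the sketch below substitutes a fixed template, purely to illustrate what a structured Caption-CFP pair might look like. The classes `RetinalFindings` and `CaptionCFPPair`, the function `build_caption`, and all field names are hypothetical, not the authors' data format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RetinalFindings:
    """Hypothetical structured findings for one CFP; field names are illustrative."""
    optic_disc: str
    vascular_distribution: str
    nerve_fibre_layer: str
    lesions: List[str] = field(default_factory=list)
    dr_grade: Optional[int] = None  # diabetic retinopathy grade, if known


@dataclass
class CaptionCFPPair:
    """One Caption-CFP entry: a fundus photograph path plus its text caption."""
    image_path: str
    caption: str


def build_caption(f: RetinalFindings) -> str:
    """Fixed-template stand-in for the LLM captioning step (not the paper's pipeline)."""
    parts = [
        f"Optic disc: {f.optic_disc}.",
        f"Vessels: {f.vascular_distribution}.",
        f"Nerve fibre layer: {f.nerve_fibre_layer}.",
    ]
    if f.lesions:
        parts.append("Lesions: " + ", ".join(f.lesions) + ".")
    else:
        parts.append("No visible lesions.")
    if f.dr_grade is not None:
        parts.append(f"Diabetic retinopathy grade {f.dr_grade}.")
    return " ".join(parts)
```

A template only reproduces the fields it is given; the appeal of LLM-written captions, as described in the abstract, is richer and more varied wording over the same structures.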

Related papers

Quasi-multimodal-based pathophysiological feature learning for retinal disease diagnosis [4.437523386839875]
A unified framework that integrates multimodal data synthesis and fusion is proposed for retinal disease classification and grading. The proposed learning system is thoroughly interpreted through visualizations in both image and feature spaces. This work not only enhances the accuracy and efficiency of retinal disease screening but also offers a scalable framework for data augmentation across various medical imaging modalities.
arXiv (2026-02-03T15:13:57Z)
A Semantically Enhanced Generative Foundation Model Improves Pathological Image Synthesis [82.01597026329158]
We introduce a Correlation-Regulated Alignment Framework for Tissue Synthesis (CRAFTS) for pathology-specific text-to-image synthesis. CRAFTS incorporates a novel alignment mechanism that suppresses semantic drift to ensure biological accuracy. This model generates diverse pathological images spanning 30 cancer types, with quality rigorously validated by objective metrics and pathologist evaluations.
arXiv (2025-12-15T10:22:43Z)
Synthetic Vasculature and Pathology Enhance Vision-Language Model Reasoning [39.96133625333846]
We introduce Synthetic Vasculature Reasoning (SVR), a framework that controllably synthesizes images and corresponding text. Based on this, we curate OCTA-100K-SVR, an OCTA image-reasoning dataset with 100,000 pairs. Our experiments show that a general-purpose VLM trained on the dataset achieves a zero-shot balanced classification accuracy of 89.67% on real OCTA images.
arXiv (2025-12-11T19:19:39Z)
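The SVR entry reports a balanced classification accuracy. Balanced accuracy is the mean of per-class recall, so a classifier cannot inflate it by always predicting the majority class. A minimal standard-library sketch of the usual definition; the function name `balanced_accuracy` is ours and the paper's exact evaluation code is not shown in the summary.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def balanced_accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("inputs must be non-empty")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    recalls = [correct[c] / total[c] for c in total]
    return sum(recalls) / len(recalls)
```

For example, with 8 healthy and 2 diseased images and a model that always answers "healthy", plain accuracy is 0.8 but balanced accuracy is (1.0 + 0.0) / 2 = 0.5.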
An Explainable Transformer Model for Alzheimer's Disease Detection Using Retinal Imaging [5.3785187022022845]
Alzheimer's disease (AD) is a neurodegenerative disorder that affects millions worldwide. In this study, we propose Retformer, a novel transformer-based architecture for detecting AD using retinal imaging modalities.
arXiv (2025-07-06T06:40:42Z)
PixCell: A generative foundation model for digital histopathology images [49.00921097924924]
We introduce PixCell, the first diffusion-based generative foundation model for histopathology. We train PixCell on PanCan-30M, a vast, diverse dataset derived from 69,184 H&E-stained whole slide images covering various cancer types.
arXiv (2025-06-05T15:14:32Z)
Diverse Image Generation with Diffusion Models and Cross Class Label Learning for Polyp Classification [4.747649393635696]
We develop a novel model, PathoPolyp-Diff, that generates text-controlled synthetic images with diverse characteristics. We introduce cross-class label learning to make the model learn features from other classes, reducing the burdensome task of data annotation.
arXiv (2025-02-08T04:26:20Z)
Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data [9.343430674144976]
We propose a novel latent diffusion model-based framework to overcome the challenge of limited medical data. Our framework achieves state-of-the-art results compared to existing methods, offering significant potential to enhance ophthalmic diagnostics and patient care.
arXiv (2024-12-17T10:37:46Z)
EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis [7.884451100342276]
EyeDiff is a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts. EyeDiff is trained on eight large-scale datasets and is adapted to ten multi-country external datasets.
arXiv (2024-11-15T07:30:53Z)
Neurovascular Segmentation in sOCT with Deep Learning and Synthetic Training Data [4.5276169699857505]
This study demonstrates a synthesis engine for neurovascular segmentation in serial-section optical coherence tomography images. Our approach comprises two phases: label synthesis and label-to-image transformation. We demonstrate the efficacy of the former by comparing it to several more realistic sets of training labels, and the latter by an ablation study of synthetic noise and artifact models.
arXiv (2024-07-01T16:09:07Z)
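The sOCT entry splits synthesis into label synthesis followed by label-to-image transformation with synthetic noise. The toy sketch below only illustrates that two-phase structure on a tiny 2D grid: a vessel label drawn as a random walk, then rendered by assigning intensities and adding Gaussian noise. It is not the authors' engine; `synthesize_label`, `label_to_image`, and the intensity and noise values are hypothetical choices.

```python
import random
from typing import List

Label = List[List[int]]
Image = List[List[float]]


def synthesize_label(h: int, w: int, rng: random.Random) -> Label:
    """Phase 1 (toy): draw a single vessel as a left-to-right random walk on an h x w grid."""
    label = [[0] * w for _ in range(h)]
    row = rng.randrange(h)
    for col in range(w):
        label[row][col] = 1
        # Move up, down, or stay, clamped to the grid.
        row = min(h - 1, max(0, row + rng.choice((-1, 0, 1))))
    return label


def label_to_image(label: Label, rng: random.Random, vessel: float = 0.2,
                   background: float = 0.8, noise_sd: float = 0.05) -> Image:
    """Phase 2 (toy): map labels to intensities, then add Gaussian noise as a crude artifact model."""
    return [
        [(vessel if v else background) + rng.gauss(0.0, noise_sd) for v in row]
        for row in label
    ]
```

Keeping the phases separate means the label generator and the appearance model can be varied independently, which is what lets the entry evaluate each phase on its own.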
InceptionCaps: A Performant Glaucoma Classification Model for Data-scarce Environment [0.0]
Glaucoma is an irreversible ocular disease and the second leading cause of visual disability worldwide. This work reviews existing state-of-the-art models and proposes InceptionCaps, a novel capsule network (CapsNet)-based deep learning model with a pre-trained InceptionV3 as its convolutional base, for automatic glaucoma classification. InceptionCaps achieved an accuracy of 0.956, a specificity of 0.96, and an AUC of 0.9556, surpassing the performance of several state-of-the-art deep learning models on the RIM-ONE v2 dataset.
arXiv (2023-11-24T11:58:11Z)
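InceptionCaps is evaluated by accuracy, specificity and AUC. For a binary glaucoma classifier, the first two follow directly from confusion-matrix counts, and AUC equals the probability that a randomly chosen positive is scored above a randomly chosen negative. A minimal sketch of these standard definitions; `BinaryCounts` and `auc` are our own names, and any counts you plug in are illustrative, not the paper's results.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BinaryCounts:
    tp: int  # glaucoma cases predicted as glaucoma
    tn: int  # healthy eyes predicted as healthy
    fp: int  # healthy eyes predicted as glaucoma
    fn: int  # glaucoma cases predicted as healthy

    def accuracy(self) -> float:
        return (self.tp + self.tn) / (self.tp + self.tn + self.fp + self.fn)

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With made-up counts `BinaryCounts(tp=45, tn=47, fp=3, fn=5)`, accuracy is 92/100 and specificity is 47/50. The pairwise AUC is O(n·m); a rank-based formula is preferable for large test sets.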
Affinity Feature Strengthening for Accurate, Complete and Robust Vessel Segmentation [48.638327652506284]
Vessel segmentation is crucial in many medical image applications, such as detecting coronary stenoses, retinal vessel diseases and brain aneurysms. We present a novel approach, the affinity feature strengthening network (AFN), which jointly models geometry and refines pixel-wise segmentation features using a contrast-insensitive, multiscale affinity approach.
arXiv (2022-11-12T05:39:17Z)
Multi-modal Retinal Image Registration Using a Keypoint-Based Vessel Structure Aligning Network [9.988115865060589]
We propose an end-to-end trainable deep learning method for multi-modal retinal image registration. Our method extracts convolutional features from the vessel structure for keypoint detection and description. The keypoint detection and description network and graph neural network are jointly trained in a self-supervised manner.
arXiv (2022-07-21T14:36:51Z)
OADAT: Experimental and Synthetic Clinical Optoacoustic Data for Standardized Image Processing [62.993663757843464]
Optoacoustic (OA) imaging is based on excitation of biological tissues with nanosecond-duration laser pulses, followed by detection of ultrasound waves generated via light-absorption-mediated thermoelastic expansion. OA imaging features a powerful combination of rich optical contrast and high resolution in deep tissues. However, no standardized datasets generated with different types of experimental set-up and associated processing methods are available to facilitate advances in broader applications of OA in clinical settings.
arXiv (2022-06-17T08:11:26Z)
Assessing glaucoma in retinal fundus photographs using Deep Feature Consistent Variational Autoencoders [63.391402501241195]
Glaucoma is challenging to detect since it remains asymptomatic until the symptoms are severe. Early identification of glaucoma is generally made based on functional, structural, and clinical assessments. Deep learning methods have partially solved this dilemma by bypassing the marker identification stage and analyzing high-level information directly to classify the data.
arXiv (2021-10-04T16:06:49Z)
A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR). Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading. The proposed dataset has 1,842 images with pixel-level DR-related lesion annotations, and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv (2020-08-22T07:48:04Z)
NuI-Go: Recursive Non-Local Encoder-Decoder Network for Retinal Image Non-Uniform Illumination Removal [96.12120000492962]
The quality of retinal images is often clinically unsatisfactory due to eye lesions and an imperfect imaging process. One of the most challenging quality degradation issues in retinal images is non-uniform illumination. We propose a non-uniform illumination removal network for retinal images, called NuI-Go.
arXiv (2020-08-07T04:31:33Z)
Modeling and Enhancing Low-quality Retinal Fundus Images [167.02325845822276]
Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis. We propose a clinically oriented fundus enhancement network (cofe-Net) to suppress global degradation factors. Experiments on both synthetic and real images demonstrate that our algorithm effectively corrects low-quality fundus images without losing retinal details.
arXiv (2020-05-12T08:01:16Z)
Retinopathy of Prematurity Stage Diagnosis Using Object Segmentation and Convolutional Neural Networks [68.96150598294072]
Retinopathy of Prematurity (ROP) is an eye disorder primarily affecting premature infants with low birth weight. It causes proliferation of vessels in the retina and can result in vision loss and, eventually, retinal detachment, leading to blindness. In recent years, there has been a significant effort to automate the diagnosis using deep learning. This paper builds upon the success of previous models and develops a novel architecture, which combines object segmentation and convolutional neural networks (CNN). Our proposed system first trains an object segmentation model to identify the demarcation line at a pixel level and adds the resulting mask as an additional "color" channel in
arXiv (2020-04-03T14:07:41Z)
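The ROP entry describes feeding the segmentation model's demarcation-line mask to the CNN as an additional "color" channel. A minimal pure-Python sketch of that input construction, assuming a channels-last H×W×3 RGB image and an H×W mask stored as nested lists (a real pipeline would use an array library). The function name `add_mask_channel` and the rescaling of the mask to the 0-255 intensity range are our assumptions, not details from the paper.

```python
from typing import List

Pixel = List[float]
Image = List[List[Pixel]]  # H x W x C, channels-last
Mask = List[List[float]]   # H x W, values in [0, 1]


def add_mask_channel(image: Image, mask: Mask, scale: float = 255.0) -> Image:
    """Return a new H x W x 4 image: the RGB channels plus the mask as a 4th channel."""
    if len(image) != len(mask) or any(len(r) != len(m) for r, m in zip(image, mask)):
        raise ValueError("image and mask must have the same height and width")
    out: Image = []
    for img_row, mask_row in zip(image, mask):
        # Copy each pixel (leaving the input untouched) and append the rescaled mask value.
        out.append([list(px) + [m * scale] for px, m in zip(img_row, mask_row)])
    return out
```

The only architectural consequence is that the CNN's first convolution must accept four input channels instead of three.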
Synergic Adversarial Label Learning for Grading Retinal Diseases via Knowledge Distillation and Multi-task Learning [29.46896757506273]
Images annotated by well-qualified doctors are very expensive, and only a limited amount of data is available for various retinal diseases. Some studies show that age-related macular degeneration (AMD) and DR share common features such as hemorrhagic points and exudation, but most classification algorithms train these disease models independently. We propose a method called synergic adversarial label learning (SALL), which leverages relevant retinal disease labels in both semantic and feature space as additional signals and trains the model in a collaborative manner.
arXiv (2020-03-24T01:32:04Z)

This list is automatically generated from the titles and abstracts of the papers on this site.