EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis
- URL: http://arxiv.org/abs/2411.10004v1
- Date: Fri, 15 Nov 2024 07:30:53 GMT
- Title: EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis
- Authors: Ruoyu Chen, Weiyi Zhang, Bowen Liu, Xiaolan Chen, Pusheng Xu, Shunming Liu, Mingguang He, Danli Shi,
- Abstract summary: EyeDiff is a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts.
EyeDiff is trained on eight large-scale datasets and is adapted to ten multi-country external datasets.
- Score: 7.884451100342276
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The rising prevalence of vision-threatening retinal diseases poses a significant burden on the global healthcare systems. Deep learning (DL) offers a promising solution for automatic disease screening but demands substantial data. Collecting and labeling large volumes of ophthalmic images across various modalities encounters several real-world challenges, especially for rare diseases. Here, we introduce EyeDiff, a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts and evaluate its applicability in diagnosing common and rare diseases. EyeDiff is trained on eight large-scale datasets using the advanced latent diffusion model, covering 14 ophthalmic image modalities and over 80 ocular diseases, and is adapted to ten multi-country external datasets. The generated images accurately capture essential lesional characteristics, achieving high alignment with text prompts as evaluated by objective metrics and human experts. Furthermore, integrating generated images significantly enhances the accuracy of detecting minority classes and rare eye diseases, surpassing traditional oversampling methods in addressing data imbalance. EyeDiff effectively tackles the issue of data imbalance and insufficiency typically encountered in rare diseases and addresses the challenges of collecting large-scale annotated images, offering a transformative solution to enhance the development of expert-level diseases diagnosis models in ophthalmic field.
Related papers
- Causal Disentanglement for Robust Long-tail Medical Image Generation [80.15257897500578]
We propose a novel medical image generation framework, which generates independent pathological and structural features.
We leverage a diffusion model guided by pathological findings to model pathological features, enabling the generation of diverse counterfactual images.
arXiv Detail & Related papers (2025-04-20T01:54:18Z) - EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare.
Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model.
We propose the Eyecare Kit, which tackles the aforementioned three key challenges with the tailored dataset, benchmark and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z) - Enhancing Fundus Image-based Glaucoma Screening via Dynamic Global-Local Feature Integration [26.715346685730484]
We propose a self-adaptive attention window that autonomously determines optimal boundaries for enhanced feature extraction.
We also introduce a multi-head attention mechanism to effectively fuse global and local features via feature linear readout.
Experimental results demonstrate that our method achieves superior accuracy and robustness in glaucoma classification.
arXiv Detail & Related papers (2025-04-01T05:28:14Z) - Retinal Fundus Multi-Disease Image Classification using Hybrid CNN-Transformer-Ensemble Architectures [0.3277163122167434]
Our research is motivated by the urgent global issue of a large population affected by retinal diseases.
Our primary objective is to develop a comprehensive diagnostic system capable of accurately predicting retinal diseases.
arXiv Detail & Related papers (2025-03-27T12:55:07Z) - Latent Drifting in Diffusion Models for Counterfactual Medical Image Synthesis [55.959002385347645]
Latent Drifting enables diffusion models to be conditioned for medical images fitted for the complex task of counterfactual image generation.
We evaluate our method on three public longitudinal benchmark datasets of brain MRI and chest X-rays for counterfactual image generation.
arXiv Detail & Related papers (2024-12-30T01:59:34Z) - Rethinking Diffusion-Based Image Generators for Fundus Fluorescein Angiography Synthesis on Limited Data [9.343430674144976]
We propose a novel latent diffusion model-based framework to overcome the challenge of limited medical data.
Our framework achieves state-of-the-art results compared to existing methods, offering significant potential to enhance ophthalmic diagnostics and patient care.
arXiv Detail & Related papers (2024-12-17T10:37:46Z) - EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis [20.318178211934985]
We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million ophthalmology images with partial text data.
EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases.
arXiv Detail & Related papers (2024-09-10T17:00:19Z) - A Disease-Specific Foundation Model Using Over 100K Fundus Images: Release and Validation for Abnormality and Multi-Disease Classification on Downstream Tasks [0.0]
We developed a Fundus-Specific Pretrained Model (Image+Fundus), a supervised artificial intelligence model trained to detect abnormalities in fundus images.
A total of 57,803 images were used to develop this pretrained model, which achieved superior performance across various downstream tasks.
arXiv Detail & Related papers (2024-08-16T15:03:06Z) - Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study evaluated the performance of the Gemini, GPT-4, and 4 popular large models for an exhaustive evaluation across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z) - EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging [13.88319807760491]
We present EyeFound, a multimodal foundation model for ophthalmic images.
It learns generalizable representations from unlabeled multimodal retinal images.
It is trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities.
arXiv Detail & Related papers (2024-05-18T17:03:39Z) - Diagnosis of Multiple Fundus Disorders Amidst a Scarcity of Medical Experts Via Self-supervised Machine Learning [13.174267261284733]
Fundus diseases are major causes of visual impairment and blindness worldwide.
We propose a general self-supervised machine learning framework that can handle diverse fundus diseases from unlabeled fundus images.
arXiv Detail & Related papers (2024-04-20T14:15:25Z) - Reliable Multimodality Eye Disease Screening via Mixture of Student's t
Distributions [49.4545260500952]
We introduce a novel multimodality evidential fusion pipeline for eye disease screening, EyeMoSt.
Our model estimates both local uncertainty for unimodality and global uncertainty for the fusion modality to produce reliable classification results.
Our experimental findings on both public and in-house datasets show that our model is more reliable than current methods.
arXiv Detail & Related papers (2023-03-17T06:18:16Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using a conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Assessing glaucoma in retinal fundus photographs using Deep Feature
Consistent Variational Autoencoders [63.391402501241195]
glaucoma is challenging to detect since it remains asymptomatic until the symptoms are severe.
Early identification of glaucoma is generally made based on functional, structural, and clinical assessments.
Deep learning methods have partially solved this dilemma by bypassing the marker identification stage and analyzing high-level information directly to classify the data.
arXiv Detail & Related papers (2021-10-04T16:06:49Z) - An Interpretable Multiple-Instance Approach for the Detection of
referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z) - Many-to-One Distribution Learning and K-Nearest Neighbor Smoothing for
Thoracic Disease Identification [83.6017225363714]
deep learning has become the most powerful computer-aided diagnosis technology for improving disease identification performance.
For chest X-ray imaging, annotating large-scale data requires professional domain knowledge and is time-consuming.
In this paper, we propose many-to-one distribution learning (MODL) and K-nearest neighbor smoothing (KNNS) methods to improve a single model's disease identification performance.
arXiv Detail & Related papers (2021-02-26T02:29:30Z) - Improving Robustness using Joint Attention Network For Detecting Retinal
Degeneration From Optical Coherence Tomography Images [0.0]
We propose the use of disease-specific feature representation as a novel architecture comprised of two joint networks.
Our experimental results on publicly available datasets show the proposed joint-network significantly improves the accuracy and robustness of state-of-the-art retinal disease classification networks on unseen datasets.
arXiv Detail & Related papers (2020-05-16T20:32:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.