LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
- URL: http://arxiv.org/abs/2410.01620v5
- Date: Wed, 05 Feb 2025 18:36:42 GMT
- Title: LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models
- Authors: Zhenyue Qin, Yu Yin, Dylan Campbell, Xuansheng Wu, Ke Zou, Yih-Chung Tham, Ninghao Liu, Xiuzhen Zhang, Qingyu Chen
- Abstract summary: Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans.
We benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains.
The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains.
- Score: 38.78576472811659
- Abstract: The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans, thereby reducing the burden on clinicians and improving access to eye care. However, limited benchmarks are available to assess LVLMs' performance in ophthalmology-specific applications. In this study, we introduce LMOD, a large-scale multimodal ophthalmology benchmark consisting of 21,993 instances across (1) five ophthalmic imaging modalities: optical coherence tomography, color fundus photographs, scanning laser ophthalmoscopy, lens photographs, and surgical scenes; (2) free-text, demographic, and disease biomarker information; and (3) primary ophthalmology-specific applications such as anatomical information understanding, disease diagnosis, and subgroup analysis. In addition, we benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains. The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains. Systematic error analysis further identified six major failure modes: misclassification, failure to abstain, inconsistent reasoning, hallucination, assertions without justification, and lack of domain-specific knowledge. In contrast, supervised neural networks specifically trained on these tasks as baselines demonstrated high accuracy. These findings underscore the pressing need for benchmarks in the development and validation of ophthalmology-specific LVLMs.
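To make the benchmarking setup concrete, below is a minimal sketch of an evaluation loop for such a benchmark. The file name, JSONL field names, and the query_model callable are illustrative assumptions, not LMOD's actual release format or API.

```python
# Illustrative sketch of an LVLM benchmark evaluation loop in the spirit
# of LMOD. The JSONL layout, field names, and query_model() interface are
# assumptions for illustration; the actual LMOD release may differ.
import json
from collections import defaultdict

def evaluate(instances, query_model):
    """Score a model on question-answering instances, with accuracy
    broken down by imaging modality (OCT, fundus, SLO, ...)."""
    correct, total = defaultdict(int), defaultdict(int)
    for ex in instances:
        pred = query_model(ex["image_path"], ex["question"])
        total[ex["modality"]] += 1
        if pred.strip().lower() == ex["answer"].strip().lower():
            correct[ex["modality"]] += 1
    return {m: correct[m] / total[m] for m in total}

if __name__ == "__main__":
    # Hypothetical layout: one JSON object per line with image_path,
    # modality, question, and answer fields.
    with open("lmod_instances.jsonl") as f:
        data = [json.loads(line) for line in f]
    # Stub model that always abstains; swap in a real LVLM API call.
    print(evaluate(data, lambda image_path, question: "I cannot tell"))
```

Computing accuracy per modality mirrors the paper's subgroup analysis; the same loop can also log raw responses for the kind of systematic error analysis described above.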
Related papers
- EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis [20.318178211934985]
We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million ophthalmology images with partial text data.
EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases.
arXiv Detail & Related papers (2024-09-10T17:00:19Z)
- Potential of Multimodal Large Language Models for Data Mining of Medical Images and Free-text Reports [51.45762396192655]
Multimodal large language models (MLLMs) have recently transformed many domains, significantly affecting the medical field. Notably, Gemini-Vision-series (Gemini) and GPT-4-series (GPT-4) models have epitomized a paradigm shift in Artificial General Intelligence for computer vision.
This study exhaustively evaluated Gemini, GPT-4, and four popular large models across 14 medical imaging datasets.
arXiv Detail & Related papers (2024-07-08T09:08:42Z)
- Ophtha-LLaMA2: A Large Language Model for Ophthalmology [31.39653268440651]
Large language models (LLMs) have achieved tremendous success in the field of Natural Language Processing (NLP).
In this study, we build an LLM termed the "Ophtha-LLaMA2" specifically tailored for ophthalmic disease diagnosis.
Inference test results show that even with a smaller fine-tuning dataset, Ophtha-LLaMA2 performs significantly better in ophthalmic diagnosis.
arXiv Detail & Related papers (2023-12-08T08:43:46Z)
- VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [27.92420837559191]
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate skill levels in jointly diagnosing 12 common ophthalmic diseases.
arXiv Detail & Related papers (2023-10-08T03:40:14Z)
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into a large language model to build an ophthalmic large language-and-vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images [51.27125547308154]
We organized the "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022).
The challenge consists of three tasks: segmentation of DR lesions, image quality assessment, and DR grading.
This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
arXiv Detail & Related papers (2023-04-05T12:04:55Z)
- Efficient Screening of Diseased Eyes based on Fundus Autofluorescence Images using Support Vector Machine [0.12189422792863448]
A variety of vision ailments are associated with geographic atrophy (GA) in the foveal region of the eye.
In current clinical practice, the ophthalmologist manually checks for the potential presence of GA in fundus autofluorescence (FAF) images.
We propose a screening step in which healthy and diseased eyes are algorithmically differentiated with limited input from optometrists alone, as illustrated in the sketch below.
arXiv Detail & Related papers (2021-04-17T11:54:34Z)
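As a rough sketch of what such an SVM-based screening step looks like, the snippet below trains a binary classifier with scikit-learn on synthetic stand-in features; the paper's actual FAF feature extraction and data are not reproduced here.

```python
# A minimal sketch of SVM-based screening of healthy vs. diseased eyes.
# Features and labels are synthetic placeholders; the paper's actual
# feature extraction from FAF images is not reproduced here.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 32))          # stand-in FAF image descriptors
y = rng.integers(0, 2, size=200)        # 0 = healthy, 1 = diseased

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# RBF-kernel SVM behind feature standardization, a common default setup.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print(f"screening accuracy: {clf.score(X_test, y_test):.2f}")
```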
- An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism (sketched below), our system achieves high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
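To make the patch-attention idea above concrete, here is a minimal sketch of attention-based multiple-instance pooling in PyTorch. The layer sizes and shapes are illustrative assumptions, not the authors' architecture.

```python
# Minimal sketch of attention-based multiple-instance pooling, in the
# spirit of the referable-DR paper above. Dimensions are illustrative
# assumptions, not the authors' actual architecture.
import torch
import torch.nn as nn

class AttentionMILPooling(nn.Module):
    """Combine per-patch embeddings into one image-level embedding,
    weighting each patch by a learned attention score."""
    def __init__(self, embed_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.attn = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, patches: torch.Tensor):
        # patches: (num_patches, embed_dim) for a single fundus image
        scores = self.attn(patches)                # (num_patches, 1)
        weights = torch.softmax(scores, dim=0)     # attention over patches
        bag = (weights * patches).sum(dim=0)       # (embed_dim,)
        return bag, weights.squeeze(-1)

# Usage: 64 patch embeddings of dimension 256 -> one image embedding.
pool = AttentionMILPooling(embed_dim=256)
bag, weights = pool(torch.randn(64, 256))
classifier = nn.Linear(256, 1)                     # referable vs. not
logit = classifier(bag)
```

The per-patch weights double as an interpretability signal, indicating which regions of the fundus image drove the image-level prediction.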
- Modeling and Enhancing Low-quality Retinal Fundus Images [167.02325845822276]
Low-quality fundus images increase uncertainty in clinical observation and lead to the risk of misdiagnosis.
We propose a clinically oriented fundus enhancement network (cofe-Net) to suppress global degradation factors.
Experiments on both synthetic and real images demonstrate that our algorithm effectively corrects low-quality fundus images without losing retinal details.
arXiv Detail & Related papers (2020-05-12T08:01:16Z)