EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
- URL: http://arxiv.org/abs/2504.13650v1
- Date: Fri, 18 Apr 2025 12:09:15 GMT
- Title: EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model
- Authors: Sijing Li, Tianwei Lin, Lingshuai Lin, Wenqiao Zhang, Jiang Liu, Xiaoda Yang, Juncheng Li, Yucheng He, Xiaohui Song, Jun Xiao, Yueting Zhuang, Beng Chin Ooi
- Abstract summary: Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model. We propose the Eyecare Kit, which tackles these three key challenges with a tailored dataset, benchmark, and model.
- Score: 51.66031028717933
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare, but their reliance on general medical data and coarse-grained global visual understanding limits them in intelligent ophthalmic diagnosis. Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data: the lack of deeply annotated, high-quality, multi-modal ophthalmic visual instruction data; (ii) Benchmark: the absence of a comprehensive and systematic benchmark for evaluating diagnostic performance; (iii) Model: the difficulty of adapting holistic visual architectures to fine-grained, region-specific ophthalmic lesion identification. In this paper, we propose the Eyecare Kit, which systematically tackles these three key challenges with a tailored dataset, benchmark, and model: First, we construct a multi-agent data engine with real-life ophthalmology data to produce Eyecare-100K, a high-quality ophthalmic visual instruction dataset. Subsequently, we design Eyecare-Bench, a benchmark that comprehensively evaluates the overall performance of LVLMs on intelligent ophthalmic diagnosis tasks across multiple dimensions. Finally, we develop EyecareGPT, thoroughly optimized for fine-grained ophthalmic visual understanding, which incorporates an adaptive resolution mechanism and a layer-wise dense connector. Extensive experimental results indicate that EyecareGPT achieves state-of-the-art performance on a range of ophthalmic tasks, underscoring its significant potential for advancing open research in intelligent ophthalmic diagnosis. Our project is available at https://github.com/DCDmllm/EyecareGPT.
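The abstract does not spell out the layer-wise dense connector, but as a rough illustration, below is a minimal PyTorch sketch of one plausible design, where hidden states tapped from several vision-encoder layers are concatenated and projected into the language model's embedding space. The module name, dimensions, and fusion scheme are all assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LayerwiseDenseConnector(nn.Module):
    """Hypothetical sketch: densely fuse hidden states tapped from several
    vision-encoder layers, then project them into the LLM embedding space."""

    def __init__(self, vit_dim=1024, llm_dim=4096, num_tapped_layers=4):
        super().__init__()
        self.proj = nn.Linear(vit_dim * num_tapped_layers, llm_dim)
        self.norm = nn.LayerNorm(llm_dim)

    def forward(self, hidden_states):
        # hidden_states: list of [batch, patches, vit_dim] tensors, one per
        # tapped layer (e.g., shallow, middle, and deep ViT blocks).
        dense = torch.cat(hidden_states, dim=-1)  # [B, P, vit_dim * L]
        return self.norm(self.proj(dense))        # [B, P, llm_dim]

# Example: tap 4 layers of a ViT (hidden size 1024), 256 patches, batch of 2.
taps = [torch.randn(2, 256, 1024) for _ in range(4)]
visual_tokens = LayerwiseDenseConnector()(taps)  # [2, 256, 4096]
```

Fusing shallow and deep layers in this way is a common strategy for preserving the fine-grained spatial detail that region-specific lesion identification needs, which a single deep-layer feature map tends to wash out.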
Related papers
- LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models [38.78576472811659]
Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans.
We benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains.
The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains.
arXiv Detail & Related papers (2024-10-02T14:57:58Z)
- EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis [20.318178211934985]
We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million ophthalmology images with partial text data.
EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases.
arXiv Detail & Related papers (2024-09-10T17:00:19Z)
- VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge [26.93106207758859]
We introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge.
VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs.
Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro.
arXiv Detail & Related papers (2024-08-05T23:31:07Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data to better align medical visual and textual features.
We evaluate downstream image classification and image-text retrieval tasks on four medical datasets.
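As a rough illustration of gaze-guided alignment, the sketch below weights image patches by a gaze heatmap before computing a CLIP-style contrastive loss; this formulation is an assumption for illustration, not EGMA's actual objective.

```python
import torch
import torch.nn.functional as F

def gaze_guided_alignment_loss(patch_feats, text_feats, gaze_maps, tau=0.07):
    """Toy sketch: pool patch features under a clinician gaze heatmap, then
    apply a symmetric contrastive loss (assumed, not the paper's method)."""
    # patch_feats: [B, P, D] image patch embeddings
    # text_feats:  [B, D]    report/sentence embeddings
    # gaze_maps:   [B, P]    gaze attention over patches (non-negative)
    weights = gaze_maps / gaze_maps.sum(dim=1, keepdim=True)
    img = F.normalize((weights.unsqueeze(-1) * patch_feats).sum(dim=1), dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    logits = img @ txt.t() / tau
    targets = torch.arange(img.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Example: batch of 4, 49 patches, 256-dim features.
loss = gaze_guided_alignment_loss(
    torch.randn(4, 49, 256), torch.randn(4, 256), torch.rand(4, 49))
```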
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- EyeGPT: Ophthalmic Assistant with Large Language Models [6.678252895718266]
Large language models (LLMs) trained on general world knowledge may not possess the capability to tackle medical tasks at an expert level.
Here, we introduce EyeGPT, a specialized LLM designed for ophthalmology, using three optimization strategies: role-playing, fine-tuning, and retrieval-augmented generation.
By assessing the performance of different EyeGPT variants, we identify the most effective one, which exhibits comparable levels of understandability, trustworthiness, and empathy to human ophthalmologists.
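Of the three strategies, retrieval-augmented generation is the most mechanical to sketch. The self-contained toy below shows the general pattern; the stand-in embedding function, toy corpus, and prompt format are purely illustrative and unrelated to EyeGPT's actual pipeline.

```python
import numpy as np

def embed(text, dim=64):
    # Stand-in bag-of-words embedding, deterministic within a single run.
    # A real system would use a trained sentence encoder.
    vec = np.zeros(dim)
    for word in text.lower().split():
        rng = np.random.default_rng(abs(hash(word)) % (2**32))
        vec += rng.standard_normal(dim)
    return vec / (np.linalg.norm(vec) + 1e-8)

# Toy snippets standing in for an ophthalmology knowledge corpus.
corpus = [
    "Glaucoma is associated with elevated intraocular pressure.",
    "Diabetic retinopathy presents with microaneurysms and hemorrhages.",
    "Cataract is a clouding of the crystalline lens.",
]
corpus_vecs = np.stack([embed(doc) for doc in corpus])

def retrieve(query, k=1):
    # Rank corpus entries by similarity to the query embedding.
    scores = corpus_vecs @ embed(query)
    return [corpus[i] for i in np.argsort(scores)[::-1][:k]]

def build_prompt(question):
    # Prepend retrieved context so the LLM can ground its answer.
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

print(build_prompt("What signs suggest diabetic retinopathy?"))
```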
arXiv Detail & Related papers (2024-02-29T09:35:41Z)
- VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [27.92420837559191]
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels of expertise in jointly diagnosing 12 common ophthalmic diseases.
arXiv Detail & Related papers (2023-10-08T03:40:14Z)
- DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images [51.27125547308154]
We organized the "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022).
The challenge consists of three tasks: segmentation of DR lesions, image quality assessment, and DR grading.
This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
arXiv Detail & Related papers (2023-04-05T12:04:55Z)
- A Deep Learning Approach for the Segmentation of Electroencephalography Data in Eye Tracking Applications [56.458448869572294]
We introduce DETRtime, a novel framework for time-series segmentation of EEG data.
Our end-to-end deep learning-based framework brings advances in Computer Vision to the forefront.
Our model generalizes well in the task of EEG sleep stage segmentation.
arXiv Detail & Related papers (2022-06-17T10:17:24Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
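One plausible way to inject relation triples into visual features is cross-attention from image patches to triple embeddings. The PyTorch sketch below illustrates that generic pattern; it is an assumption for illustration, not the paper's exact CGT architecture.

```python
import torch
import torch.nn as nn

class TripleInjection(nn.Module):
    """Illustrative sketch: enrich visual features with knowledge-triple
    embeddings via cross-attention (not the actual CGT design)."""

    def __init__(self, dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_feats, triple_embeds):
        # visual_feats:  [B, P, dim] patch features from the image encoder
        # triple_embeds: [B, T, dim] embeddings of (subject, relation, object)
        #                clinical triples, e.g. (optic disc, shows, edema)
        attended, _ = self.attn(visual_feats, triple_embeds, triple_embeds)
        # Residual connection: visual features plus injected prior knowledge.
        return self.norm(visual_feats + attended)

# Example: 49 patch tokens, 5 knowledge triples, hidden size 512.
vis, triples = torch.randn(2, 49, 512), torch.randn(2, 5, 512)
out = TripleInjection()(vis, triples)  # [2, 49, 512]
```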
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- A Benchmark for Studying Diabetic Retinopathy: Segmentation, Grading, and Transferability [76.64661091980531]
People with diabetes are at risk of developing diabetic retinopathy (DR).
Computer-aided DR diagnosis is a promising tool for early detection of DR and severity grading.
The proposed dataset contains 1,842 images with pixel-level DR-related lesion annotations and 1,000 images with image-level labels graded by six board-certified ophthalmologists.
arXiv Detail & Related papers (2020-08-22T07:48:04Z)