EyeGPT: Ophthalmic Assistant with Large Language Models
- URL: http://arxiv.org/abs/2403.00840v1
- Date: Thu, 29 Feb 2024 09:35:41 GMT
- Title: EyeGPT: Ophthalmic Assistant with Large Language Models
- Authors: Xiaolan Chen, Ziwei Zhao, Weiyi Zhang, Pusheng Xu, Le Gao, Mingpu Xu,
Yue Wu, Yinwen Li, Danli Shi, Mingguang He
- Abstract summary: Large language models (LLMs) trained on general world knowledge may not be able to tackle medical tasks at an expert level.
Here, we introduce EyeGPT, a specialized LLM designed specifically for ophthalmology, using three optimization strategies: role-playing, fine-tuning, and retrieval-augmented generation.
By assessing the performance of different EyeGPT variants, we identify the most effective one, which exhibits comparable levels of understandability, trustworthiness, and empathy to human ophthalmologists.
- Score: 6.678252895718266
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Artificial intelligence (AI) has gained significant attention in healthcare
consultation due to its potential to improve clinical workflow and enhance
medical communication. However, owing to the complex nature of medical
information, large language models (LLMs) trained on general world knowledge
may not be able to tackle medical tasks at an expert level. Here, we introduce
EyeGPT, a specialized LLM designed specifically for ophthalmology, using three
optimization strategies: role-playing, fine-tuning, and retrieval-augmented
generation. In particular, we propose a
comprehensive evaluation framework that encompasses a diverse dataset, covering
various subspecialties of ophthalmology, different users, and diverse inquiry
intents. Moreover, we considered multiple evaluation metrics, including
accuracy, understandability, trustworthiness, empathy, and the proportion of
hallucinations. By assessing the performance of different EyeGPT variants, we
identify the most effective one, which exhibits comparable levels of
understandability, trustworthiness, and empathy to human ophthalmologists (all
P > 0.05). Overall, our study provides valuable insights for future research,
facilitating comprehensive comparisons and evaluations of different strategies
for developing specialized LLMs in ophthalmology. The potential benefits
include enhancing the patient experience in eye care and optimizing
ophthalmologists' services.
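Since the abstract describes the three optimization strategies only at a high level, the following is a minimal, hypothetical sketch of how role-playing and retrieval-augmented generation might be composed in front of a fine-tuned model. Every name in it (KNOWLEDGE_BASE, retrieve, build_prompt, and the sample snippets) is an illustrative assumption, not EyeGPT's actual implementation.

```python
# Minimal, hypothetical sketch of the role-playing + retrieval-augmented
# generation pattern described in the abstract. KNOWLEDGE_BASE, retrieve,
# and build_prompt are illustrative assumptions, not EyeGPT's actual code.
import re

KNOWLEDGE_BASE = [
    "Glaucoma is a group of eye conditions that damage the optic nerve, "
    "often associated with elevated intraocular pressure.",
    "Cataract is a clouding of the eye's lens that blurs vision and is "
    "treated surgically by lens replacement.",
    "Diabetic retinopathy is a complication of diabetes that damages the "
    "blood vessels of the retina.",
]

def _tokens(text: str) -> set[str]:
    """Lowercase and split text into word tokens, dropping punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank knowledge snippets by naive keyword overlap with the question."""
    q = _tokens(question)
    ranked = sorted(KNOWLEDGE_BASE, key=lambda d: len(q & _tokens(d)),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Prepend a role-playing instruction and retrieved context to the query."""
    context = "\n".join(f"- {doc}" for doc in retrieve(question))
    return (
        "You are an experienced ophthalmologist. Answer the patient's "
        "question clearly and empathetically, using only the context below.\n"
        f"Context:\n{context}\n"
        f"Patient question: {question}\n"
        "Answer:"
    )

if __name__ == "__main__":
    # The assembled prompt would then be sent to the fine-tuned model.
    print(build_prompt("What causes glaucoma?"))
```

In a real pipeline, the keyword-overlap ranking would typically be replaced by embedding-based retrieval, and the assembled prompt would be passed to the fine-tuned ophthalmology model; the sketch only illustrates how the strategies compose.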
Related papers
- EyecareGPT: Boosting Comprehensive Ophthalmology Understanding with Tailored Dataset, Benchmark and Model [51.66031028717933]
Medical Large Vision-Language Models (Med-LVLMs) demonstrate significant potential in healthcare.
Currently, intelligent ophthalmic diagnosis faces three major challenges: (i) Data; (ii) Benchmark; and (iii) Model.
We propose the Eyecare Kit, which tackles the aforementioned three key challenges with a tailored dataset, benchmark, and model.
arXiv Detail & Related papers (2025-04-18T12:09:15Z) - MIL vs. Aggregation: Evaluating Patient-Level Survival Prediction Strategies Using Graph-Based Learning [52.231128973251124]
We compare various strategies for predicting survival at the WSI and patient level.
The former treats each WSI as an independent sample, mimicking the strategy adopted in other works.
The latter comprises methods to either aggregate the predictions of the several WSIs or automatically identify the most relevant slide.
arXiv Detail & Related papers (2025-03-29T11:14:02Z) - Conversation AI Dialog for Medicare powered by Finetuning and Retrieval Augmented Generation [0.0]
Large language models (LLMs) have shown impressive capabilities in natural language processing tasks, including dialogue generation.
This research aims to conduct a novel comparative analysis of two prominent techniques, fine-tuning with LoRA and the Retrieval-Augmented Generation framework.
arXiv Detail & Related papers (2025-02-04T11:50:40Z) - A Survey of Medical Vision-and-Language Applications and Their Techniques [48.268198631277315]
Medical vision-and-language models (MVLMs) have attracted substantial interest due to their capability to offer a natural language interface for interpreting complex medical data.
Here, we provide a comprehensive overview of MVLMs and the various medical tasks to which they have been applied.
We also examine the datasets used for these tasks and compare the performance of different models based on standardized evaluation metrics.
arXiv Detail & Related papers (2024-11-19T03:27:05Z) - Visual Question Answering in Ophthalmology: A Progressive and Practical Perspective [3.362457692154382]
Visual Question Answering (VQA) combines computer vision and natural language processing to comprehend and respond to queries about medical images.
This review article explores the recent advancements and future prospects of VQA in ophthalmology from both theoretical and practical perspectives.
arXiv Detail & Related papers (2024-10-22T03:28:41Z) - VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge [26.93106207758859]
We introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge.
VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs.
Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro.
arXiv Detail & Related papers (2024-08-05T23:31:07Z) - A Role-specific Guided Large Language Model for Ophthalmic Consultation Based on Stylistic Differentiation [2.0671213754662343]
We propose EyeDoctor, a large language model for ophthalmic medical consultation.
Experimental results show EyeDoctor achieves higher question-answering precision in ophthalmology consultations.
arXiv Detail & Related papers (2024-07-26T03:23:31Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
The Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - When Eye-Tracking Meets Machine Learning: A Systematic Review on Applications in Medical Image Analysis [2.9122893700072554]
Eye tracking, a technology that monitors and records the movement of the eyes, provides valuable insights into human visual attention patterns.
Eye-gaze tracking data, with intricate human visual attention patterns embedded, provides a bridge to integrating artificial intelligence (AI) development and human cognition.
This systematic review provides an in-depth investigation of eye-gaze tracking applications and methodologies for enhancing ML/DL algorithms in medical image analysis.
arXiv Detail & Related papers (2024-03-12T17:17:20Z) - Evaluation of General Large Language Models in Contextually Assessing Semantic Concepts Extracted from Adult Critical Care Electronic Health Record Notes [17.648021186810663]
The purpose of this study was to evaluate the performance of Large Language Models (LLMs) in understanding and processing real-world clinical notes.
The GPT family models have demonstrated considerable efficiency, evidenced by their cost-effectiveness and time-saving capabilities.
arXiv Detail & Related papers (2024-01-24T16:52:37Z) - A Comprehensive Evaluation of GPT-4V on Knowledge-Intensive Visual Question Answering [53.70661720114377]
Multimodal large models (MLMs) have significantly advanced the field of visual understanding, offering remarkable capabilities in the realm of visual question answering (VQA).
Yet, the true challenge lies in the domain of knowledge-intensive VQA tasks, which necessitate deep comprehension of the visual information in conjunction with a vast repository of learned knowledge.
To uncover such capabilities, we provide an in-depth evaluation from three perspectives: 1) Commonsense Knowledge, which assesses how well models can understand visual cues and connect to general knowledge; 2) Fine-grained World Knowledge, which tests the model's skill in reasoning out specific knowledge from images.
arXiv Detail & Related papers (2023-11-13T18:22:32Z) - A Systematic Evaluation of GPT-4V's Multimodal Capability for Medical Image Analysis [87.25494411021066]
GPT-4V's multimodal capability for medical image analysis is evaluated. GPT-4V excels in understanding medical images and generates high-quality radiology reports, but its performance on medical visual grounding needs substantial improvement.
arXiv Detail & Related papers (2023-10-31T11:39:09Z) - Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z) - Align, Reason and Learn: Enhancing Medical Vision-and-Language
Pre-training with Knowledge [68.90835997085557]
We propose a systematic and effective approach to enhance structured medical knowledge from three perspectives.
First, we align the representations of the vision encoder and the language encoder through knowledge.
Second, we inject knowledge into the multi-modal fusion model to enable the model to perform reasoning using knowledge as the supplementation of the input image and text.
Third, we guide the model to put emphasis on the most critical information in images and texts by designing knowledge-induced pretext tasks.
arXiv Detail & Related papers (2022-09-15T08:00:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.