OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue
- URL: http://arxiv.org/abs/2306.12174v2
- Date: Thu, 22 Jun 2023 01:31:10 GMT
- Title: OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant
based on Instructions and Dialogue
- Authors: Weihao Gao, Zhuo Deng, Zhiyuan Niu, Fuju Rong, Chucheng Chen, Zheng
Gong, Wenze Zhang, Daimin Xiao, Fang Li, Zhenjie Cao, Zhaoyi Ma, Wenbin Wei,
Lan Ma
- Abstract summary: We introduce visual ability into the large language model to complete the ophthalmic large language and vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
- Score: 7.140551103766788
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large multimodal language models (LMMs) have achieved significant success in
general domains. However, due to the significant differences between medical
images and text and general web content, the performance of LMMs in medical
scenarios is limited. In ophthalmology, clinical diagnosis relies on multiple
modalities of medical images, but unfortunately, multimodal ophthalmic large
language models have not been explored to date. In this paper, we study and
construct an ophthalmic large multimodal model. Firstly, we use fundus images
as an entry point to build a disease assessment and diagnosis pipeline to
achieve common ophthalmic disease diagnosis and lesion segmentation. Then, we
establish a new ophthalmic multimodal instruction-following and dialogue
fine-tuning dataset based on disease-related knowledge data and publicly
available real-world medical dialogue. We introduce visual ability into the
large language model to complete the ophthalmic large language and vision
assistant (OphGLM). Our experimental results demonstrate that the OphGLM model
performs exceptionally well, and it has the potential to revolutionize clinical
applications in ophthalmology. The dataset, code, and models will be made
publicly available at https://github.com/ML-AILab/OphGLM.
Related papers
- EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging [13.88319807760491]
We present EyeFound, a multimodal foundation model for ophthalmic images.
It learns generalizable representations from unlabeled multimodal retinal images.
It is trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities.
arXiv Detail & Related papers (2024-05-18T17:03:39Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning [9.913879680322042]
The lack of extensive and high-quality image-text data in medicine has greatly hindered the development of large-scale medical vision-language models.
We present a diagnosis-guided bootstrapping strategy that exploits both image and label information to construct vision-language datasets.
arXiv Detail & Related papers (2024-04-23T15:27:19Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- On Large Visual Language Models for Medical Imaging Analysis: An Empirical Study [13.972931873011914]
Large language models (LLMs) have taken the spotlight in natural language processing.
Visual language models (VLMs), such as LLaVA, Flamingo, or CLIP, have demonstrated impressive performance on various visio-linguistic tasks.
arXiv Detail & Related papers (2024-02-21T23:01:38Z)
- Ophtha-LLaMA2: A Large Language Model for Ophthalmology [31.39653268440651]
Large language models (LLMs) have achieved tremendous success in the field of Natural Language Processing (NLP).
In this study, we build an LLM termed the "Ophtha-LLaMA2" specifically tailored for ophthalmic disease diagnosis.
Inference test results show that even with a smaller fine-tuning dataset, Ophtha-LLaMA2 performs significantly better in ophthalmic diagnosis.
arXiv Detail & Related papers (2023-12-08T08:43:46Z)
- VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for Generalist Ophthalmic Artificial Intelligence [27.92420837559191]
We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases.
arXiv Detail & Related papers (2023-10-08T03:40:14Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [85.19963303642427]
We propose a cost-efficient approach for training a vision-language conversational assistant that can answer open-ended research questions of biomedical images.
The model first learns to align biomedical vocabulary using the figure-caption pairs as is, then learns to master open-ended conversational semantics.
This enables us to train a Large Language and Vision Assistant for BioMedicine in less than 15 hours (with eight A100s).
arXiv Detail & Related papers (2023-06-01T16:50:07Z)
- ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [53.73049253535025]
Large language models (LLMs) have recently demonstrated their potential in clinical applications.
This paper presents a method for integrating LLMs into medical-image CAD networks.
The goal is to merge the strengths of LLMs' medical domain knowledge and logical reasoning with the vision understanding capability of existing medical-image CAD models.
arXiv Detail & Related papers (2023-02-14T18:54:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.