VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- URL: http://arxiv.org/abs/2310.04992v1
- Date: Sun, 8 Oct 2023 03:40:14 GMT
- Title: VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun,
Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang
Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai
Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu,
Aiguo Lv, Hui Miao, Li Guo, Shujun Zhang, Cheng Pei, Xiaojuan Fan, Jianqin
Lei, Ting Wei, Junguo Duan, Chun Liu, Xiaobo Xia, Siqi Xiong, Junhong Li,
Benny Lo, Yih Chung Tham, Tien Yin Wong, Ningli Wang, and Wu Yuan
- Abstract summary: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists at basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases.
- Score: 27.92420837559191
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million
ophthalmic images from 560,457 individuals, covering a broad range of
ophthalmic diseases, modalities, imaging devices, and demography. After
pre-training, VisionFM provides a foundation to foster multiple ophthalmic
artificial intelligence (AI) applications, such as disease screening and
diagnosis, disease prognosis, subclassification of disease phenotype, and
systemic biomarker and disease prediction, with each application enhanced with
expert-level intelligence and accuracy. The generalist intelligence of VisionFM
outperformed ophthalmologists at basic and intermediate levels in jointly
diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale
ophthalmic disease diagnosis benchmark database, as well as a new large-scale
segmentation and detection benchmark database, VisionFM outperformed strong
baseline deep neural networks. The ophthalmic image representations learned by
VisionFM exhibited noteworthy explainability, and demonstrated strong
generalizability to new ophthalmic modalities, disease spectra, and imaging
devices. As a foundation model, VisionFM has a large capacity to learn from
diverse ophthalmic imaging data and disparate datasets. To be commensurate with
this capacity, in addition to the real data used for pre-training, we also
generated and leveraged synthetic ophthalmic imaging data. Experimental results
revealed that synthetic data that passed visual Turing tests can also enhance
the representation learning capability of VisionFM, leading to substantial
performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI
applications developed, validated, and demonstrated in this work, substantial
further applications can be achieved in an efficient and cost-effective manner
using VisionFM as the foundation.
Related papers
- Enhancing Retinal Disease Classification from OCTA Images via Active Learning Techniques [0.8035416719640156]
Eye diseases are common in older Americans and can lead to decreased vision and blindness.
Recent advancements in imaging technologies allow clinicians to capture high-quality images of the retinal blood vessels via Optical Coherence Tomography Angiography (OCTA).
OCTA provides detailed vascular imaging, in contrast to the solely structural information obtained by conventional OCT imaging.
arXiv Detail & Related papers (2024-07-21T23:24:49Z)
- M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation [1.8789068567093286]
Multi-Modal Medical Transformer (M3T) is a novel deep learning architecture that integrates visual representations with diagnostic keywords.
Experimental studies on the DeepEyeNet dataset validate the success of M3T in meeting ophthalmologists' standards.
arXiv Detail & Related papers (2024-06-19T00:46:48Z)
- EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging [13.88319807760491]
We present EyeFound, a multimodal foundation model for ophthalmic images.
It learns generalizable representations from unlabeled multimodal retinal images.
It is trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities.
arXiv Detail & Related papers (2024-05-18T17:03:39Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's multimodal model, evaluated here for medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision [17.583536041845402]
We present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding.
We compiled 37 open-access, mostly categorical fundus imaging datasets from various sources.
We integrate the expert's domain knowledge in the form of descriptive textual prompts during both pre-training and zero-shot inference.
arXiv Detail & Related papers (2023-08-15T17:39:52Z)
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into a large language model to build the ophthalmic large language-and-vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images [51.27125547308154]
We organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022).
The challenge consists of three tasks: segmentation of DR lesions, image quality assessment, and DR grading.
This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
arXiv Detail & Related papers (2023-04-05T12:04:55Z)
- An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.