VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- URL: http://arxiv.org/abs/2310.04992v1
- Date: Sun, 8 Oct 2023 03:40:14 GMT
- Title: VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun,
Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang
Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai
Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu,
Aiguo Lv, Hui Miao, Li Guo, Shujun Zhang, Cheng Pei, Xiaojuan Fan, Jianqin
Lei, Ting Wei, Junguo Duan, Chun Liu, Xiaobo Xia, Siqi Xiong, Junhong Li,
Benny Lo, Yih Chung Tham, Tien Yin Wong, Ningli Wang, and Wu Yuan
- Abstract summary: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists with basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases.
- Score: 27.92420837559191
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million
ophthalmic images from 560,457 individuals, covering a broad range of
ophthalmic diseases, modalities, imaging devices, and demography. After
pre-training, VisionFM provides a foundation to foster multiple ophthalmic
artificial intelligence (AI) applications, such as disease screening and
diagnosis, disease prognosis, subclassification of disease phenotype, and
systemic biomarker and disease prediction, with each application enhanced with
expert-level intelligence and accuracy. The generalist intelligence of VisionFM
outperformed ophthalmologists with basic and intermediate levels in jointly
diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale
ophthalmic disease diagnosis benchmark database, as well as a new large-scale
segmentation and detection benchmark database, VisionFM outperformed strong
baseline deep neural networks. The ophthalmic image representations learned by
VisionFM exhibited noteworthy explainability, and demonstrated strong
generalizability to new ophthalmic modalities, disease spectrum, and imaging
devices. As a foundation model, VisionFM has a large capacity to learn from
diverse ophthalmic imaging data and disparate datasets. To be commensurate with
this capacity, in addition to the real data used for pre-training, we also
generated and leveraged synthetic ophthalmic imaging data. Experimental results
revealed that synthetic data that passed visual Turing tests, can also enhance
the representation learning capability of VisionFM, leading to substantial
performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI
applications developed, validated, and demonstrated in this work, substantial
further applications can be achieved in an efficient and cost-effective manner
using VisionFM as the foundation.
Related papers
- EyeDiff: text-to-image diffusion model improves rare eye disease diagnosis [7.884451100342276]
EyeDiff is a text-to-image model designed to generate multimodal ophthalmic images from natural language prompts.
EyeDiff is trained on eight large-scale datasets and is adapted to ten multi-country external datasets.
arXiv Detail & Related papers (2024-11-15T07:30:53Z) - LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models [38.78576472811659]
Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans.
We benchmarked 13 state-of-the-art LVLM representatives from closed-source, open-source, and medical domains.
The results demonstrate a significant performance drop for LVLMs in ophthalmology compared to other domains.
arXiv Detail & Related papers (2024-10-02T14:57:58Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challanging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - EyeCLIP: A visual-language foundation model for multi-modal ophthalmic image analysis [20.318178211934985]
We propose EyeCLIP, a visual-language foundation model developed using over 2.77 million ophthalmology images with partial text data.
EyeCLIP can be transferred to a wide range of downstream tasks involving ocular and systemic diseases.
arXiv Detail & Related papers (2024-09-10T17:00:19Z) - VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge [26.93106207758859]
We introduce VisionUnite, a novel vision-language foundation model for ophthalmology enhanced with clinical knowledge.
VisionUnite has been pretrained on an extensive dataset comprising 1.24 million image-text pairs.
Our experiments indicate that VisionUnite outperforms existing generative foundation models such as GPT-4V and Gemini Pro.
arXiv Detail & Related papers (2024-08-05T23:31:07Z) - EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging [13.88319807760491]
We present EyeFound, a multimodal foundation model for ophthalmic images.
It learns generalizable representations from unlabeled multimodal retinal images.
It is trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities.
arXiv Detail & Related papers (2024-05-18T17:03:39Z) - LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical
Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z) - Customizing General-Purpose Foundation Models for Medical Report
Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z) - Artificial General Intelligence for Medical Imaging Analysis [92.3940918983821]
Large-scale Artificial General Intelligence (AGI) models have achieved unprecedented success in a variety of general domain tasks.
These models face notable challenges arising from the medical field's inherent complexities and unique characteristics.
This review aims to offer insights into the future implications of AGI in medical imaging, healthcare, and beyond.
arXiv Detail & Related papers (2023-06-08T18:04:13Z) - DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical
Coherence Tomography Angiography Images [51.27125547308154]
We organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022)
The challenge consists of three tasks: segmentation of DR lesions, image quality assessment and DR grading.
This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
arXiv Detail & Related papers (2023-04-05T12:04:55Z) - An Interpretable Multiple-Instance Approach for the Detection of
referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.