VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- URL: http://arxiv.org/abs/2310.04992v1
- Date: Sun, 8 Oct 2023 03:40:14 GMT
- Title: VisionFM: a Multi-Modal Multi-Task Vision Foundation Model for
Generalist Ophthalmic Artificial Intelligence
- Authors: Jianing Qiu, Jian Wu, Hao Wei, Peilun Shi, Minqing Zhang, Yunyun Sun,
Lin Li, Hanruo Liu, Hongyi Liu, Simeng Hou, Yuyang Zhao, Xuehui Shi, Junfang
Xian, Xiaoxia Qu, Sirui Zhu, Lijie Pan, Xiaoniao Chen, Xiaojia Zhang, Shuai
Jiang, Kebing Wang, Chenlong Yang, Mingqiang Chen, Sujie Fan, Jianhua Hu,
Aiguo Lv, Hui Miao, Li Guo, Shujun Zhang, Cheng Pei, Xiaojuan Fan, Jianqin
Lei, Ting Wei, Junguo Duan, Chun Liu, Xiaobo Xia, Siqi Xiong, Junhong Li,
Benny Lo, Yih Chung Tham, Tien Yin Wong, Ningli Wang, and Wu Yuan
- Abstract summary: We present VisionFM, a foundation model pre-trained with 3.4 million ophthalmic images from 560,457 individuals.
After pre-training, VisionFM provides a foundation to foster multiple ophthalmic artificial intelligence (AI) applications.
The generalist intelligence of VisionFM outperformed ophthalmologists at basic and intermediate levels in jointly diagnosing 12 common ophthalmic diseases.
- Score: 27.92420837559191
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: We present VisionFM, a foundation model pre-trained with 3.4 million
ophthalmic images from 560,457 individuals, covering a broad range of
ophthalmic diseases, modalities, imaging devices, and demography. After
pre-training, VisionFM provides a foundation to foster multiple ophthalmic
artificial intelligence (AI) applications, such as disease screening and
diagnosis, disease prognosis, subclassification of disease phenotype, and
systemic biomarker and disease prediction, with each application enhanced with
expert-level intelligence and accuracy. The generalist intelligence of VisionFM
outperformed ophthalmologists at basic and intermediate levels in jointly
diagnosing 12 common ophthalmic diseases. Evaluated on a new large-scale
ophthalmic disease diagnosis benchmark database, as well as a new large-scale
segmentation and detection benchmark database, VisionFM outperformed strong
baseline deep neural networks. The ophthalmic image representations learned by
VisionFM exhibited noteworthy explainability, and demonstrated strong
generalizability to new ophthalmic modalities, disease spectra, and imaging
devices. As a foundation model, VisionFM has a large capacity to learn from
diverse ophthalmic imaging data and disparate datasets. To be commensurate with
this capacity, in addition to the real data used for pre-training, we also
generated and leveraged synthetic ophthalmic imaging data. Experimental results
revealed that synthetic data that passed visual Turing tests can also enhance
the representation learning capability of VisionFM, leading to substantial
performance gains on downstream ophthalmic AI tasks. Beyond the ophthalmic AI
applications developed, validated, and demonstrated in this work, substantial
further applications can be achieved in an efficient and cost-effective manner
using VisionFM as the foundation.
Related papers
- Enhancing Retinal Disease Classification from OCTA Images via Active Learning Techniques [0.8035416719640156]
Eye diseases are common in older Americans and can lead to decreased vision and blindness.
Recent advancements in imaging technologies allow clinicians to capture high-quality images of the retinal blood vessels via Optical Coherence Tomography Angiography (OCTA).
OCTA provides detailed vascular imaging, in contrast to the solely structural information obtained by conventional OCT imaging.
arXiv Detail & Related papers (2024-07-21T23:24:49Z)
- M3T: Multi-Modal Medical Transformer to bridge Clinical Context with Visual Insights for Retinal Image Medical Description Generation [1.8789068567093286]
Multi-Modal Medical Transformer (M3T) is a novel deep learning architecture that integrates visual representations with diagnostic keywords.
Experimental studies on the DeepEyeNet dataset validate the success of M3T in meeting ophthalmologists' standards.
arXiv Detail & Related papers (2024-06-19T00:46:48Z)
- EyeFound: A Multimodal Generalist Foundation Model for Ophthalmic Imaging [13.88319807760491]
We present EyeFound, a multimodal foundation model for ophthalmic images.
It learns generalizable representations from unlabeled multimodal retinal images.
It is trained on 2.78 million images from 227 hospitals across 11 ophthalmic modalities.
arXiv Detail & Related papers (2024-05-18T17:03:39Z)
- Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
arXiv Detail & Related papers (2024-03-19T03:59:14Z)
- Can GPT-4V(ision) Serve Medical Applications? Case Studies on GPT-4V for Multimodal Medical Diagnosis [59.35504779947686]
GPT-4V is OpenAI's multimodal model, evaluated here for medical diagnosis.
Our evaluation encompasses 17 human body systems.
GPT-4V demonstrates proficiency in distinguishing between medical image modalities and anatomy.
It faces significant challenges in disease diagnosis and generating comprehensive reports.
arXiv Detail & Related papers (2023-10-15T18:32:27Z)
- A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision [17.583536041845402]
We present FLAIR, a pre-trained vision-language model for universal retinal fundus image understanding.
We compiled 37 open-access, mostly categorical fundus imaging datasets from various sources.
We integrate the expert's domain knowledge in the form of descriptive textual prompts during both pre-training and zero-shot inference.
arXiv Detail & Related papers (2023-08-15T17:39:52Z)
- OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [7.140551103766788]
We introduce visual capability into a large language model to build the ophthalmic large language-and-vision assistant (OphGLM).
Our experimental results demonstrate that the OphGLM model performs exceptionally well, and it has the potential to revolutionize clinical applications in ophthalmology.
arXiv Detail & Related papers (2023-06-21T11:09:48Z)
- LVM-Med: Learning Large-Scale Self-Supervised Vision Models for Medical Imaging via Second-order Graph Matching [59.01894976615714]
We introduce LVM-Med, the first family of deep networks trained on large-scale medical datasets.
We have collected approximately 1.3 million medical images from 55 publicly available datasets.
LVM-Med empirically outperforms a number of state-of-the-art supervised, self-supervised, and foundation models.
arXiv Detail & Related papers (2023-06-20T22:21:34Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs) in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- DRAC: Diabetic Retinopathy Analysis Challenge with Ultra-Wide Optical Coherence Tomography Angiography Images [51.27125547308154]
We organized a challenge named "DRAC - Diabetic Retinopathy Analysis Challenge" in conjunction with the 25th International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2022).
The challenge consists of three tasks: segmentation of DR lesions, image quality assessment, and DR grading.
This paper presents a summary and analysis of the top-performing solutions and results for each task of the challenge.
arXiv Detail & Related papers (2023-04-05T12:04:55Z)
- An Interpretable Multiple-Instance Approach for the Detection of referable Diabetic Retinopathy from Fundus Images [72.94446225783697]
We propose a machine learning system for the detection of referable Diabetic Retinopathy in fundus images.
By extracting local information from image patches and combining it efficiently through an attention mechanism, our system is able to achieve high classification accuracy.
We evaluate our approach on publicly available retinal image datasets, in which it exhibits near state-of-the-art performance.
arXiv Detail & Related papers (2021-03-02T13:14:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.