Towards Generalist Foundation Model for Radiology by Leveraging
Web-scale 2D&3D Medical Data
- URL: http://arxiv.org/abs/2308.02463v5
- Date: Thu, 16 Nov 2023 12:38:46 GMT
- Title: Towards Generalist Foundation Model for Radiology by Leveraging
Web-scale 2D&3D Medical Data
- Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang and Weidi Xie
- Abstract summary: This study aims to initiate the development of a Radiology Foundation Model, termed RadFM.
To the best of our knowledge, this is the first large-scale, high-quality medical visual-language dataset with both 2D and 3D scans.
We propose a new evaluation benchmark, RadBench, that comprises five tasks: modality recognition, disease diagnosis, visual question answering, report generation, and rationale diagnosis.
- Score: 66.9359934608229
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this study, we aim to initiate the development of a Radiology
Foundation Model, termed RadFM. We consider the construction of foundation
models from three perspectives, namely, dataset construction, model design,
and thorough evaluation. Our contributions can be summarized as follows: (i)
we construct a large-scale Medical Multi-modal Dataset, MedMD, which consists
of 16M 2D and 3D medical scans with high-quality text descriptions or reports
across various data formats, modalities, and tasks, covering over 5,000
distinct diseases. To the best of our knowledge, this is the first
large-scale, high-quality medical visual-language dataset with both 2D and 3D
scans; (ii) we propose an architecture that enables visually conditioned
generative pre-training, i.e., integrating text input with 2D or 3D medical
scans and generating responses for diverse radiologic tasks. The model was
initially pre-trained on MedMD and subsequently fine-tuned on a
domain-specific dataset, RadMD, a cleaned, radiology-only subset of MedMD
containing 3M radiologic visual-language pairs; (iii) we propose a new
evaluation benchmark, RadBench, which comprises five tasks: modality
recognition, disease diagnosis, visual question answering, report generation,
and rationale diagnosis, aiming to comprehensively assess the capability of
foundation models in handling practical clinical problems. We conduct both
automatic and human evaluations on RadBench; in both cases, RadFM outperforms
publicly accessible multi-modal foundation models, including OpenFlamingo,
MedFlamingo, MedVInT, and GPT-4V. Additionally, we adapt RadFM to various
public benchmarks, surpassing existing state-of-the-art results on diverse
datasets. All code, data, and model checkpoints will be made publicly
available to promote further research and development in the field.
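
Contribution (ii) describes visually conditioned generative pre-training: scan embeddings are mixed into the text stream of a causal language model, which is then trained with next-token prediction. The following is a minimal PyTorch sketch of that general pattern; all names (Scan3DEncoder, VisuallyConditionedLM) and sizes are illustrative assumptions, not RadFM's actual implementation.

```python
# Illustrative sketch of visually conditioned generative pre-training
# (hypothetical names and sizes; not RadFM's actual code).
import torch
import torch.nn as nn

class Scan3DEncoder(nn.Module):
    """Encodes a 3D scan (D x H x W) into a short sequence of visual tokens.
    2D images can be handled as single-slice volumes (D = 1)."""
    def __init__(self, embed_dim: int = 512, num_tokens: int = 32):
        super().__init__()
        self.conv = nn.Conv3d(1, embed_dim, kernel_size=8, stride=8)  # patchify
        self.pool = nn.AdaptiveAvgPool3d((num_tokens, 1, 1))          # fixed token count

    def forward(self, scan: torch.Tensor) -> torch.Tensor:
        # scan: (B, 1, D, H, W) -> (B, num_tokens, embed_dim)
        feats = self.pool(self.conv(scan))
        return feats.flatten(2).transpose(1, 2)

class VisuallyConditionedLM(nn.Module):
    """Prepends visual tokens to text embeddings and trains a causal
    transformer with next-token prediction on the text positions."""
    def __init__(self, vocab_size: int = 32000, embed_dim: int = 512):
        super().__init__()
        self.visual = Scan3DEncoder(embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        layer = nn.TransformerEncoderLayer(embed_dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerEncoder(layer, num_layers=4)
        self.lm_head = nn.Linear(embed_dim, vocab_size)

    def forward(self, scan: torch.Tensor, tokens: torch.Tensor) -> torch.Tensor:
        vis = self.visual(scan)                        # (B, V, E)
        txt = self.embed(tokens)                       # (B, T, E)
        seq = torch.cat([vis, txt], dim=1)             # visual tokens condition the text
        mask = nn.Transformer.generate_square_subsequent_mask(seq.size(1))
        hidden = self.decoder(seq, mask=mask)
        return self.lm_head(hidden[:, vis.size(1):])   # logits for text positions only

# Toy usage: next-token loss on the report, conditioned on the scan.
model = VisuallyConditionedLM()
scan = torch.randn(2, 1, 32, 64, 64)                   # toy 3D volumes
tokens = torch.randint(0, 32000, (2, 16))              # toy report token ids
logits = model(scan, tokens)
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
```

The point of treating 2D images as single-slice volumes is that one encoder can serve both data types, which mirrors the 2D-and-3D unification the abstract emphasizes.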
Related papers
- 3D-CT-GPT: Generating 3D Radiology Reports through Integration of Large Vision-Language Models [51.855377054763345]
This paper introduces 3D-CT-GPT, a Visual Question Answering (VQA)-based medical visual language model for generating radiology reports from 3D CT scans.
Experiments on both public and private datasets demonstrate that 3D-CT-GPT significantly outperforms existing methods in terms of report accuracy and quality.
arXiv Detail & Related papers (2024-09-28T12:31:07Z)
- Expert-level vision-language foundation model for real-world radiology and comprehensive evaluation [27.05259342502574]
We present RadFound, a vision-language foundation model tailored for radiology.
It is trained on the most extensive dataset of over 8.1 million images and 250,000 image-text pairs.
To establish expert-level multimodal perception and generation capabilities, RadFound introduces an enhanced vision encoder.
arXiv Detail & Related papers (2024-09-24T15:31:49Z)
- Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z)
- RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis [56.57177181778517]
RadGenome-Chest CT is a large-scale, region-guided 3D chest CT interpretation dataset based on CT-RATE.
We leverage the latest universal segmentation models and large language models to extend the original dataset.
arXiv Detail & Related papers (2024-04-25T17:11:37Z)
- M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models [49.5030774873328]
Previous research has primarily focused on 2D medical images, leaving 3D images under-explored, despite their richer spatial information.
We present a large-scale 3D multi-modal medical dataset, M3D-Data, comprising 120K image-text pairs and 662K instruction-response pairs.
We also introduce a new 3D multi-modal medical benchmark, M3D-Bench, which facilitates automatic evaluation across eight tasks.
arXiv Detail & Related papers (2024-03-31T06:55:12Z)
- HyperFusion: A Hypernetwork Approach to Multimodal Integration of Tabular and Medical Imaging Data for Predictive Modeling [4.44283662576491]
We present a novel framework based on hypernetworks that fuses clinical imaging and tabular data by conditioning the image processing on the EHR's values and measurements (a minimal sketch of this conditioning idea appears after this list).
We show that our framework outperforms both single-modality models and state-of-the-art MRI-tabular data fusion methods.
arXiv Detail & Related papers (2024-03-20T05:50:04Z)
- ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations.
The clinical dataset utilized in this study encompasses a remarkable total of 332,673 observations.
ChatRadio-Valuer consistently outperforms state-of-the-art models, including ChatGPT (GPT-3.5-Turbo) and GPT-4.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
- Specialty-Oriented Generalist Medical AI for Chest CT Screening [14.31187762890342]
We propose the first-of-its-kind medical multimodal-multitask foundation model (M3FM) with application in lung cancer screening and related tasks.
M3FM consistently outperforms the state-of-the-art single-modal task-specific models.
As a specialty-oriented generalist medical AI model, M3FM paves the way for similar breakthroughs in other areas of medicine.
arXiv Detail & Related papers (2023-04-03T20:19:56Z)
- medigan: A Python Library of Pretrained Generative Models for Enriched Data Access in Medical Imaging [3.8568465270960264]
medigan is a one-stop shop for pretrained generative models, implemented as an open-source, framework-agnostic Python library.
It allows researchers and developers to create, augment, and domain-adapt their training data in just a few lines of code (see the quickstart sketch after this list).
The library's scalability and design are demonstrated by its growing number of integrated, readily usable pretrained generative models.
arXiv Detail & Related papers (2022-09-28T23:45:33Z)
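
The HyperFusion entry above conditions image processing on tabular EHR values through a hypernetwork. Below is a minimal sketch of that general idea, assuming a pre-extracted image embedding and a flat tabular vector; HyperLinear and all dimensions are hypothetical, not the paper's actual architecture.

```python
# Minimal hypernetwork-style fusion sketch (illustrative assumptions;
# not the HyperFusion paper's implementation).
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    """A linear layer whose weights are generated from a tabular vector."""
    def __init__(self, tab_dim: int, in_dim: int, out_dim: int):
        super().__init__()
        self.in_dim, self.out_dim = in_dim, out_dim
        # Hypernetwork: tabular features -> weights and bias of the layer.
        self.hyper = nn.Linear(tab_dim, in_dim * out_dim + out_dim)

    def forward(self, img_feat: torch.Tensor, tab: torch.Tensor) -> torch.Tensor:
        params = self.hyper(tab)                           # (B, in*out + out)
        w = params[:, : self.in_dim * self.out_dim]
        b = params[:, self.in_dim * self.out_dim :]
        w = w.view(-1, self.out_dim, self.in_dim)          # per-sample weights
        # Per-sample linear map: image processing conditioned on the EHR.
        return torch.bmm(w, img_feat.unsqueeze(-1)).squeeze(-1) + b

# Toy usage: fuse a 128-d image embedding with a 10-d tabular record.
fusion = HyperLinear(tab_dim=10, in_dim=128, out_dim=64)
img_feat = torch.randn(4, 128)   # e.g. output of an MRI encoder
tab = torch.randn(4, 10)         # e.g. age, lab values, measurements
fused = fusion(img_feat, tab)    # (4, 64)
```

The design choice worth noting is that, unlike simple concatenation, the tabular data here determines *how* the image features are transformed, not just what is appended to them.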
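The medigan entry advertises data generation "in just a few lines of code". The snippet below follows the library's documented quickstart pattern; the model ID shown is one of medigan's model-zoo identifiers and should be verified against the current documentation before use.

```python
# Quickstart pattern from medigan's documentation (check the model ID
# against the library's current model zoo before relying on it).
from medigan import Generators

generators = Generators()
# Generate 8 synthetic mammography ROI samples with a pretrained DCGAN.
generators.generate(model_id="00001_DCGAN_MMG_CALC_ROI", num_samples=8)
```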
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.