Multi-modal Pre-training for Medical Vision-language Understanding and
Generation: An Empirical Study with A New Benchmark
- URL: http://arxiv.org/abs/2306.06494v2
- Date: Thu, 24 Aug 2023 07:52:59 GMT
- Title: Multi-modal Pre-training for Medical Vision-language Understanding and
Generation: An Empirical Study with A New Benchmark
- Authors: Li Xu, Bo Liu, Ameer Hamza Khan, Lu Fan, Xiao-Ming Wu
- Abstract summary: We propose RadioGraphy Captions (RGC), a high-quality, multi-modality radiographic dataset containing 18,434 image-caption pairs.
RGC can be used as a pre-training dataset or as a new benchmark for medical report generation and medical image-text retrieval.
- Score: 12.565598914787834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the availability of large-scale, comprehensive, and general-purpose
vision-language (VL) datasets such as MSCOCO, vision-language pre-training
(VLP) has become an active area of research and has proven effective for
various VL tasks such as visual question answering. However, studies on VLP in
the medical domain have so far been scarce. To provide a comprehensive
perspective on VLP for medical VL tasks, we conduct a thorough experimental
analysis of the key factors that may affect the performance of VLP with a
unified vision-language Transformer. To enable sound and quick pre-training
decisions, we propose RadioGraphy Captions (RGC), a high-quality,
multi-modality radiographic dataset containing 18,434 image-caption pairs
collected from MedPix, an open-access online database. RGC can serve as a
pre-training dataset or as a new benchmark for medical report generation and
medical image-text retrieval. By utilizing RGC and other available datasets
for pre-training, we derive several key insights to guide future medical VLP
research and establish strong new baselines for various medical VL tasks.
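To illustrate the medical image-text retrieval task that RGC is proposed to benchmark, the sketch below scores a radiograph against candidate captions with an off-the-shelf CLIP model from Hugging Face. This is a stand-in under stated assumptions, not the paper's unified vision-language Transformer or its pre-trained weights; the file path and captions are hypothetical placeholders for RGC image-caption pairs.

```python
# Minimal retrieval sketch: rank candidate captions for one image.
# Assumes a generic CLIP checkpoint, NOT the paper's model;
# "chest_xray.png" and the captions below are hypothetical.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("chest_xray.png")  # placeholder radiograph
captions = [
    "Frontal chest radiograph showing clear lung fields.",
    "Abdominal CT with contrast enhancement.",
    "MRI of the lumbar spine, sagittal view.",
]

# Encode the image and all captions in one batch.
inputs = processor(text=captions, images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds image-to-caption similarity scores.
probs = outputs.logits_per_image.softmax(dim=-1)
best = probs.argmax(dim=-1).item()
print(f"Best-matching caption: {captions[best]} (p={probs[0, best]:.3f})")
```

In a retrieval benchmark such as the one RGC enables, the same scoring would be run over the full caption pool and evaluated with ranking metrics (e.g., Recall@K) rather than a single softmax over three candidates.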