MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in
Radiology
- URL: http://arxiv.org/abs/2301.02228v3
- Date: Mon, 3 Apr 2023 09:57:51 GMT
- Title: MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in
Radiology
- Authors: Chaoyi Wu, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, Weidi Xie
- Abstract summary: We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from radiological daily practice.
First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medically relevant information.
Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge of the medical field.
Third, we propose a Transformer-based fusion model for spatially aligning the entity descriptions with visual signals at the image-patch level, enabling grounded medical diagnosis.
- Score: 40.52487429030841
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we consider enhancing medical visual-language pre-training
(VLP) with domain-specific knowledge, by exploiting the paired image-text
reports from radiological daily practice. In particular, we make the
following contributions: First, unlike existing works that directly process
the raw reports, we adopt a novel triplet extraction module to extract the
medically relevant information, avoiding unnecessary complexity from language
grammar and enhancing the supervision signals; Second, we propose a novel
triplet encoding module with entity translation by querying a knowledge base,
to exploit the rich domain knowledge of the medical field and implicitly
build relationships between medical entities in the language embedding space;
Third, we propose a Transformer-based fusion model for spatially aligning the
entity descriptions with visual signals at the image-patch level, enabling
grounded medical diagnosis; Fourth, we conduct thorough experiments to
validate the effectiveness of our architecture, and evaluate it on numerous
public benchmarks, e.g., ChestX-ray14, RSNA Pneumonia, SIIM-ACR Pneumothorax,
COVIDx CXR-2, COVID Rural, and EdemaSeverity. In both zero-shot and fine-tuning
settings, our model demonstrates strong performance on disease classification
and grounding compared with prior methods.
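The three-stage pipeline described in the abstract lends itself to a compact illustration. The following is a minimal sketch, not the authors' released implementation: the entity and position vocabularies, the knowledge-base entries, and all names (extract_triplets, translate_entity, FusionModule) are invented here, and random tensors stand in for real image and text encoders.

```python
import torch
import torch.nn as nn

ENTITIES = ["pneumonia", "pneumothorax", "edema"]          # toy subset
POSITIONS = ["left lung", "right lung", "cardiac region"]  # toy subset

# (1) Triplet extraction: reduce each report sentence to (entity, position,
# exist), discarding grammatical structure that carries no supervision signal.
def extract_triplets(sentences):
    triplets = []
    for s in sentences:
        s_low = s.lower()
        for ent in ENTITIES:
            if ent in s_low:
                pos = next((p for p in POSITIONS if p in s_low), "unspecified")
                exist = not any(n in s_low for n in ("no ", "without", "free of"))
                triplets.append((ent, pos, exist))
    return triplets

# (2) Entity translation: swap each entity name for a textbook-style
# description from a knowledge base, so that related entities end up close
# together in the language embedding space.
KNOWLEDGE_BASE = {  # hypothetical entries
    "pneumonia": "infection that inflames the air sacs of one or both lungs",
    "pneumothorax": "collapsed lung caused by air leaking into the pleural space",
    "edema": "excess fluid accumulating in the lung tissue",
}

def translate_entity(entity):
    return KNOWLEDGE_BASE.get(entity, entity)

# (3) Transformer-based fusion: encoded entity descriptions act as queries
# that cross-attend over image-patch features; the attention map doubles as
# a patch-level grounding signal, and a small head predicts existence.
class FusionModule(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(dim, 2)  # exist / not-exist per entity

    def forward(self, entity_emb, patch_feats):
        # entity_emb: (B, n_entities, dim); patch_feats: (B, n_patches, dim)
        fused, attn = self.cross_attn(entity_emb, patch_feats, patch_feats)
        return self.classifier(fused), attn

# Toy forward pass; random tensors stand in for real text/image encoders.
triplets = extract_triplets(["No pneumothorax.", "Edema in the left lung."])
print(triplets)  # [('pneumothorax', 'unspecified', False), ('edema', 'left lung', True)]
fusion = FusionModule()
entity_emb = torch.randn(1, len(ENTITIES), 256)  # text-encoder output
patch_feats = torch.randn(1, 14 * 14, 256)       # ViT/CNN patch features
logits, grounding = fusion(entity_emb, patch_feats)
print(logits.shape, grounding.shape)             # (1, 3, 2) (1, 3, 196)
```

The cross-attention weights are what make patch-level grounding possible in this reading: each entity query distributes its attention over image patches, so a disease prediction comes with a spatial map at no extra cost.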
Related papers
- RadBARTsum: Domain Specific Adaption of Denoising Sequence-to-Sequence Models for Abstractive Radiology Report Summarization [1.8450534779202723]
This study proposes RadBARTsum, a domain-specific adaptation of the BART model for abstractive radiology report summarization.
The approach involves two main steps: 1) re-training the BART model on a large corpus of radiology reports using a novel entity masking strategy to improve biomedical domain knowledge learning, and 2) fine-tuning the model for the summarization task, using the Findings and Background sections to predict the Impression section (a hedged sketch of such an entity-masking step appears after this list).
arXiv Detail & Related papers (2024-06-05T08:43:11Z)
- Grounded Knowledge-Enhanced Medical VLP for Chest X-Ray [12.239249676716247]
Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text.
We propose a grounded knowledge-enhanced medical vision-language pre-training framework for chest X-ray.
Our results show the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
arXiv Detail & Related papers (2024-04-23T05:16:24Z)
- MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
Medical Report Grounding (MedRG) is an end-to-end solution that utilizes a multi-modal Large Language Model to predict key phrases.
The experimental results validate the effectiveness of MedRG, surpassing existing state-of-the-art medical phrase grounding methods.
arXiv Detail & Related papers (2024-04-10T07:41:35Z)
- DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports.
We propose DeViDe, a transformer-based method that leverages radiographic descriptions from the open web.
DeViDe incorporates three key features for knowledge-augmented vision language alignment: First, a large-language model-based augmentation is employed to homogenise medical knowledge from diverse sources.
In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets.
arXiv Detail & Related papers (2024-04-04T17:40:06Z)
- Customizing General-Purpose Foundation Models for Medical Report Generation [64.31265734687182]
The scarcity of labelled medical image-report pairs presents great challenges in the development of deep and large-scale neural networks.
We propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing.
arXiv Detail & Related papers (2023-06-09T03:02:36Z)
- Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [70.04389979779195]
This paper explores training medical vision-language models (VLMs) where the visual and language inputs are embedded into a common space.
We explore several candidate methods to improve low-data performance, including adapting generic pre-trained models to novel image and text domains.
Using text-to-image retrieval as a benchmark, we evaluate the performance of these methods with variable-sized training datasets of paired chest X-rays and radiological reports (a minimal retrieval-evaluation sketch appears after this list).
arXiv Detail & Related papers (2023-03-30T18:20:00Z)
- Medical Image Captioning via Generative Pretrained Transformers [57.308920993032274]
We combine two models, Show-Attend-Tell and GPT-3, to generate comprehensive and descriptive radiology records.
The proposed model is tested on two medical datasets, Open-I and MIMIC-CXR, and on the general-purpose MS-COCO.
arXiv Detail & Related papers (2022-09-28T10:27:10Z)
- Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [116.87918100031153]
We propose a Cross-modal clinical Graph Transformer (CGT) for ophthalmic report generation (ORG).
CGT injects clinical relation triples into the visual features as prior knowledge to drive the decoding procedure.
Experiments on the large-scale FFA-IR benchmark demonstrate that the proposed CGT is able to outperform previous benchmark methods.
arXiv Detail & Related papers (2022-06-04T13:16:30Z)
- Radiology Report Generation with a Learned Knowledge Base and Multi-modal Alignment [27.111857943935725]
We present an automatic, multi-modal approach for report generation from chest X-rays.
Our approach features two distinct modules: (i) Learned knowledge base and (ii) Multi-modal alignment.
With the aid of both modules, our approach clearly outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-12-30T10:43:56Z)
- Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
arXiv Detail & Related papers (2020-06-06T01:00:15Z)
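As referenced in the RadBARTsum entry above, a denoising pre-training step can be built around masking medical entities. The sketch below is a hypothetical illustration of that idea, not the paper's implementation; the entity list and the mask_medical_entities name are invented here.

```python
import random

MEDICAL_ENTITIES = ["effusion", "consolidation", "cardiomegaly"]  # toy list

def mask_medical_entities(report, mask_token="<mask>", rate=0.5):
    """Corrupt a report by masking a random subset of known medical entities,
    yielding (corrupted_input, original_target) pairs for denoising training."""
    corrupted = report
    for ent in MEDICAL_ENTITIES:
        if ent in corrupted and random.random() < rate:
            corrupted = corrupted.replace(ent, mask_token)
    return corrupted, report

src, tgt = mask_medical_entities(
    "findings: mild cardiomegaly with a small left pleural effusion."
)
print(src)  # e.g. "findings: mild <mask> with a small left pleural <mask>."
```

Masking whole clinical entities rather than random subwords forces the model to reconstruct domain terms from context, which is the stated goal of the entry's retraining step.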
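The low-data VLM entry above uses text-to-image retrieval as its benchmark. A minimal sketch of such an evaluation over a shared embedding space follows; the random embeddings stand in for real encoder outputs, and recall_at_k is a generic metric written for illustration rather than anything taken from the paper.

```python
import torch

def recall_at_k(text_emb, image_emb, k=5):
    """Fraction of text queries whose paired image (same index) ranks in top-k
    by cosine similarity in the shared embedding space."""
    text_emb = torch.nn.functional.normalize(text_emb, dim=-1)
    image_emb = torch.nn.functional.normalize(image_emb, dim=-1)
    sims = text_emb @ image_emb.T              # (N, N) cosine similarities
    topk = sims.topk(k, dim=-1).indices        # (N, k) ranked image indices
    targets = torch.arange(len(text_emb)).unsqueeze(-1)
    return (topk == targets).any(dim=-1).float().mean().item()

# Toy check: 128 paired report/image embeddings of dimension 256.
reports = torch.randn(128, 256)
images = torch.randn(128, 256)
print(f"recall@5: {recall_at_k(reports, images):.3f}")
```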