Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray
- URL: http://arxiv.org/abs/2404.14750v2
- Date: Mon, 17 Feb 2025 02:49:16 GMT
- Title: Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray
- Authors: Qiao Deng, Zhongzhen Huang, Yunqi Wang, Zhichuan Wang, Zhao Wang, Xiaofan Zhang, Qi Dou, Yeung Yu Hui, Edward S. Hui
- Abstract summary: Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text.
We propose a grounded knowledge-enhanced medical vision-language pre-training framework for chest X-ray.
Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
- Score: 12.239249676716247
- Abstract: Medical foundation models have the potential to revolutionize healthcare by providing robust and generalized representations of medical data. Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text. Current algorithms that exploit global and local alignment between medical image and text could, however, be marred by redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by a transformer-based grounded knowledge-enhanced module that performs fine-grained alignment between textual features of medical knowledge and the corresponding anatomical region-level visual features. The performance of GK-MVLP was competitive with or exceeded the state of the art on downstream image understanding tasks (chest X-ray disease classification, disease localization), a generative task (report generation), and a vision-language understanding task (medical visual question-answering). Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
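To make the mechanism concrete, here is a minimal, hypothetical sketch of the kind of grounded knowledge-enhanced module the abstract describes: textual knowledge embeddings attend to anatomical region-level visual features via cross-attention. All class names, dimensions, and layer choices are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GroundedKnowledgeModule(nn.Module):
    """Hypothetical sketch: ground knowledge-text features onto
    anatomical region-level visual features via cross-attention.
    Names and dimensions are illustrative, not from the paper."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, knowledge_feats, region_feats):
        # knowledge_feats: (B, K, dim) -- one embedding per knowledge phrase
        # region_feats:    (B, R, dim) -- one embedding per anatomical region
        grounded, _ = self.cross_attn(knowledge_feats, region_feats, region_feats)
        x = self.norm1(knowledge_feats + grounded)
        return self.norm2(x + self.ffn(x))

# Toy usage: 4 knowledge phrases grounded onto 16 candidate regions.
module = GroundedKnowledgeModule()
out = module(torch.randn(2, 4, 256), torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 4, 256])
```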
Related papers
- Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z) - MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
Medical Report Grounding (MedRG) is an end-to-end solution that utilizes a multi-modal large language model to predict key phrases.
The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
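As a rough illustration of how eye-gaze could steer alignment, the sketch below pools image patches with normalized gaze weights before a standard symmetric contrastive loss; the function and tensor names are assumptions, not EGMA's actual code.

```python
import torch
import torch.nn.functional as F

def gaze_guided_clip_loss(img_patches, text_tokens, gaze_weights, temperature=0.07):
    """Minimal sketch (not the paper's code): pool patch features with
    normalized eye-gaze weights, mean-pool text tokens, then apply a
    symmetric InfoNCE loss over the batch.
    img_patches: (B, P, D), text_tokens: (B, T, D), gaze_weights: (B, P)."""
    w = gaze_weights / gaze_weights.sum(dim=1, keepdim=True).clamp_min(1e-8)
    img = F.normalize((w.unsqueeze(-1) * img_patches).sum(dim=1), dim=-1)  # (B, D)
    txt = F.normalize(text_tokens.mean(dim=1), dim=-1)                     # (B, D)
    logits = img @ txt.t() / temperature                                   # (B, B)
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = gaze_guided_clip_loss(torch.randn(8, 196, 512),
                             torch.randn(8, 32, 512),
                             torch.rand(8, 196))
```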
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
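The category-level objective can be illustrated with a supervised-contrastive-style loss in which samples sharing a knowledge-derived disease category are treated as positives; this is a generic sketch, not MLIP's implementation.

```python
import torch
import torch.nn.functional as F

def category_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative sketch of category-level contrastive learning:
    samples sharing a disease category are pulled together.
    embeddings: (N, D), labels: (N,) integer category ids."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                                   # (N, N)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))                 # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood of positives per anchor (skip anchors w/o positives)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

loss = category_contrastive_loss(torch.randn(16, 128),
                                 torch.randint(0, 4, (16,)))
```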
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment [35.56193044201645]
Medical phrase grounding aims to locate the most relevant region in a medical image, given a phrase query describing certain medical findings.
In this paper, we propose MedRPG, an end-to-end approach for MPG.
To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo).
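At its simplest, phrase grounding can be viewed as scoring candidate regions against a phrase embedding and returning the best-matching box; the toy sketch below illustrates that view only and is not MedRPG's TaCo mechanism.

```python
import torch
import torch.nn.functional as F

def ground_phrase(region_feats, region_boxes, phrase_feat):
    """Toy sketch of phrase grounding (not MedRPG itself): score each
    candidate region against the phrase embedding and return the
    best-matching box. region_feats: (R, D), region_boxes: (R, 4),
    phrase_feat: (D,)."""
    scores = F.normalize(region_feats, dim=-1) @ F.normalize(phrase_feat, dim=-1)
    best = scores.argmax()
    return region_boxes[best], scores[best]

box, score = ground_phrase(torch.randn(10, 256),
                           torch.rand(10, 4),
                           torch.randn(256))
```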
arXiv Detail & Related papers (2023-03-14T03:57:16Z) - MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology [40.52487429030841]
We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice.
First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract medical-related information.
Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field.
Third, we propose to use a Transformer-based fusion model for spatially aligning the entity descriptions with visual signals at the image patch level, enabling medical diagnosis.
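A rule-based toy stand-in can convey what an (entity, position, existence) triplet looks like; MedKLIP's actual extraction module is learned and far more sophisticated, and the keyword vocabularies below are invented for illustration.

```python
import re

# Illustrative stand-in for a triplet extraction step: map each report
# sentence to (entity, position, exist) triplets with simple keyword
# rules. The vocabularies here are toy examples, not MedKLIP's.
ENTITIES = ["opacity", "effusion", "pneumothorax", "cardiomegaly"]
POSITIONS = ["left", "right", "bilateral", "upper", "lower"]
NEGATIONS = re.compile(r"\bno\b|\bwithout\b|\bfree of\b")

def extract_triplets(report: str):
    triplets = []
    for sentence in report.lower().split("."):
        exist = NEGATIONS.search(sentence) is None
        position = next((p for p in POSITIONS if p in sentence), "unspecified")
        for entity in ENTITIES:
            if entity in sentence:
                triplets.append((entity, position, exist))
    return triplets

print(extract_triplets("There is a left lower lobe opacity. No pleural effusion."))
# [('opacity', 'left', True), ('effusion', 'unspecified', False)]
```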
arXiv Detail & Related papers (2023-01-05T18:55:09Z) - Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding [16.5295886999348]
In clinical radiology reports, doctors capture important information about the patient's health status.
They convey observations about a patient's inner structures drawn from raw medical imaging data.
This explicit grasp on both the patient's anatomy and their appearance is missing in current medical image-processing systems.
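The core projection idea can be sketched in a few lines: collapse a 3D CT label volume along one axis to obtain a 2D mask in a simulated X-ray plane. The geometry below is deliberately simplified relative to proper CT-to-radiograph projection.

```python
import numpy as np

# Minimal sketch: project a 3D CT annotation volume along the
# anterior-posterior axis to get a 2D mask in the (simulated)
# X-ray plane. Axis choice and geometry are toy simplifications.
ct_labels = np.zeros((64, 128, 128), dtype=np.uint8)  # (depth, H, W)
ct_labels[20:30, 40:60, 50:70] = 1                    # a toy 3D organ mask

xray_mask = ct_labels.max(axis=0)                     # (H, W) 2D projection
print(xray_mask.shape, xray_mask.sum())               # (128, 128) 400
```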
arXiv Detail & Related papers (2022-10-07T09:21:48Z) - Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [19.69560434388278]
The goal of medical report generation is to accurately capture and describe the image findings.
Previous works pretrain their visual encoding neural networks with large datasets in different domains.
We propose a framework that uses a contrastive learning approach to pretrain the visual encoder and requires no additional meta information.
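Pretraining that requires no additional meta information is consistent with self-supervised, view-based contrastive learning; below is a generic SimCLR-style loss sketch, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def simclr_style_loss(view1, view2, temperature=0.5):
    """Sketch of self-supervised contrastive pretraining needing no
    labels or meta information: two augmented views of each image are
    positives; everything else in the batch is a negative.
    view1, view2: (B, D) encoder outputs for the two views."""
    z = F.normalize(torch.cat([view1, view2], dim=0), dim=-1)  # (2B, D)
    sim = z @ z.t() / temperature
    n = z.size(0)
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool, device=z.device),
                          float('-inf'))                        # drop self-pairs
    b = view1.size(0)
    # each view's positive sits b rows away in the concatenated batch
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(sim, targets)

loss = simclr_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```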
arXiv Detail & Related papers (2022-09-04T12:07:19Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
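A conditional generator that maps a source X-ray plus a target disease label to a synthetic target-domain image could look roughly like the placeholder below; the architecture is a toy stand-in, not the paper's model.

```python
import torch
import torch.nn as nn

# Conceptual sketch only: a conditional generator turns a source-domain
# X-ray plus a target disease label into a synthetic target-domain
# image, which then augments the detection training pool.
class ConditionalGenerator(nn.Module):
    def __init__(self, num_classes=14, ch=16):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, 64 * 64)
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Tanh())

    def forward(self, xray, target_label):
        # xray: (B, 1, 64, 64); the label becomes a second input channel
        cond = self.label_embed(target_label).view(-1, 1, 64, 64)
        return self.net(torch.cat([xray, cond], dim=1))

gen = ConditionalGenerator()
fake = gen(torch.randn(4, 1, 64, 64), torch.randint(0, 14, (4,)))
print(fake.shape)  # torch.Size([4, 1, 64, 64])
```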
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
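One plausible reading of integrating internal visual feature fusion with external medical linguistic information is visual features attending to knowledge embeddings before a transformer report decoder; the sketch below is entirely speculative, not ASGK's published architecture.

```python
import torch
import torch.nn as nn

class AuxiliarySignalFusion(nn.Module):
    """Loose sketch of the idea: fuse internal visual features with
    external medical-language embeddings into memory for a report
    decoder. Entirely illustrative."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        layer = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, visual_feats, knowledge_embeds, report_tokens):
        # visual features attend to external knowledge; the fused
        # memory then conditions a standard transformer report decoder
        fused, _ = self.attn(visual_feats, knowledge_embeds, knowledge_embeds)
        memory = visual_feats + fused
        return self.decoder(report_tokens, memory)

m = AuxiliarySignalFusion()
out = m(torch.randn(2, 49, 256), torch.randn(2, 10, 256), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 20, 256])
```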
arXiv Detail & Related papers (2020-06-06T01:00:15Z)