Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray
- URL: http://arxiv.org/abs/2404.14750v2
- Date: Mon, 17 Feb 2025 02:49:16 GMT
- Title: Grounded Knowledge-Enhanced Medical Vision-Language Pre-training for Chest X-Ray
- Authors: Qiao Deng, Zhongzhen Huang, Yunqi Wang, Zhichuan Wang, Zhao Wang, Xiaofan Zhang, Qi Dou, Yeung Yu Hui, Edward S. Hui
- Abstract summary: Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text.
We propose a grounded knowledge-enhanced medical vision-language pre-training framework for chest X-ray.
Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
- Score: 12.239249676716247
- Abstract: Medical foundation models have the potential to revolutionize healthcare by providing robust and generalized representations of medical data. Medical vision-language pre-training has emerged as a promising approach for learning domain-general representations of medical images and text. Current algorithms that exploit global and local alignment between medical image and text could, however, be marred by redundant information in medical data. To address this issue, we propose a grounded knowledge-enhanced medical vision-language pre-training (GK-MVLP) framework for chest X-ray. In this framework, medical knowledge is grounded to the appropriate anatomical regions by a transformer-based grounded knowledge-enhanced module that performs fine-grained alignment between textual features of medical knowledge and the corresponding anatomical region-level visual features. The performance of GK-MVLP was competitive with or exceeded the state of the art on downstream image understanding tasks (chest X-ray disease classification, disease localization), a generative task (report generation), and a vision-language understanding task (medical visual question-answering). Our results demonstrate the advantage of incorporating a grounding mechanism to remove biases and improve the alignment between chest X-ray images and radiology reports.
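To make the mechanism concrete, here is a minimal, hypothetical sketch of the kind of grounded knowledge-enhanced module the abstract describes: textual knowledge embeddings attend to anatomical region-level visual features via cross-attention. All class names, dimensions, and layer choices are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class GroundedKnowledgeModule(nn.Module):
    """Hypothetical sketch: ground knowledge-text features onto
    anatomical region-level visual features via cross-attention.
    Names and dimensions are illustrative, not from the paper."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, knowledge_feats, region_feats):
        # knowledge_feats: (B, K, dim) -- one embedding per knowledge phrase
        # region_feats:    (B, R, dim) -- one embedding per anatomical region
        grounded, _ = self.cross_attn(knowledge_feats, region_feats, region_feats)
        x = self.norm1(knowledge_feats + grounded)
        return self.norm2(x + self.ffn(x))

# Toy usage: 4 knowledge phrases grounded onto 16 candidate regions.
module = GroundedKnowledgeModule()
out = module(torch.randn(2, 4, 256), torch.randn(2, 16, 256))
print(out.shape)  # torch.Size([2, 4, 256])
```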
Related papers
- Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-ray representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z) - MedRG: Medical Report Grounding with Multi-modal Large Language Model [42.04042642085121]
Medical Report Grounding (MedRG) is an end-to-end solution that utilizes a multi-modal large language model to predict key phrases.
The experimental results validate the effectiveness of MedRG, surpassing the performance of the existing state-of-the-art medical phrase grounding methods.
arXiv Detail & Related papers (2024-04-10T07:41:35Z) - Eye-gaze Guided Multi-modal Alignment for Medical Representation Learning [65.54680361074882]
Eye-gaze Guided Multi-modal Alignment (EGMA) framework harnesses eye-gaze data for better alignment of medical visual and textual features.
We conduct downstream tasks of image classification and image-text retrieval on four medical datasets.
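As a rough illustration of how eye-gaze could steer alignment, the sketch below pools image patches with normalized gaze weights before a standard symmetric contrastive loss; the function and tensor names are assumptions, not EGMA's actual code.

```python
import torch
import torch.nn.functional as F

def gaze_guided_clip_loss(img_patches, text_tokens, gaze_weights, temperature=0.07):
    """Minimal sketch (not the paper's code): pool patch features with
    normalized eye-gaze weights, mean-pool text tokens, then apply a
    symmetric InfoNCE loss over the batch.
    img_patches: (B, P, D), text_tokens: (B, T, D), gaze_weights: (B, P)."""
    w = gaze_weights / gaze_weights.sum(dim=1, keepdim=True).clamp_min(1e-8)
    img = F.normalize((w.unsqueeze(-1) * img_patches).sum(dim=1), dim=-1)  # (B, D)
    txt = F.normalize(text_tokens.mean(dim=1), dim=-1)                     # (B, D)
    logits = img @ txt.t() / temperature                                   # (B, B)
    targets = torch.arange(img.size(0), device=img.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

loss = gaze_guided_clip_loss(torch.randn(8, 196, 512),
                             torch.randn(8, 32, 512),
                             torch.rand(8, 196))
```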
arXiv Detail & Related papers (2024-03-19T03:59:14Z) - MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [48.97640824497327]
We propose a novel framework leveraging domain-specific medical knowledge as guiding signals to integrate language information into the visual domain through image-text contrastive learning.
Our model includes global contrastive learning with our designed divergence encoder, local token-knowledge-patch alignment contrastive learning, and knowledge-guided category-level contrastive learning with expert knowledge.
Notably, MLIP surpasses state-of-the-art methods even with limited annotated data, highlighting the potential of multimodal pre-training in advancing medical representation learning.
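The category-level objective can be illustrated with a supervised-contrastive-style loss in which samples sharing a knowledge-derived disease category are treated as positives; this is a generic sketch, not MLIP's implementation.

```python
import torch
import torch.nn.functional as F

def category_contrastive_loss(embeddings, labels, temperature=0.1):
    """Illustrative sketch of category-level contrastive learning:
    samples sharing a disease category are pulled together.
    embeddings: (N, D), labels: (N,) integer category ids."""
    z = F.normalize(embeddings, dim=-1)
    sim = z @ z.t() / temperature                                   # (N, N)
    n = z.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=z.device)
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    sim = sim.masked_fill(self_mask, float('-inf'))                 # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    # average log-likelihood of positives per anchor (skip anchors w/o positives)
    pos_counts = pos_mask.sum(dim=1)
    valid = pos_counts > 0
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1)[valid] / pos_counts[valid]
    return loss.mean()

loss = category_contrastive_loss(torch.randn(16, 128),
                                 torch.randint(0, 4, (16,)))
```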
arXiv Detail & Related papers (2024-02-03T05:48:50Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment [35.56193044201645]
Medical phrase grounding aims to locate the most relevant region in a medical image, given a phrase query describing certain medical findings.
In this paper, we propose MedRPG, an end-to-end approach for MPG.
To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo).
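At its simplest, phrase grounding can be viewed as scoring candidate regions against a phrase embedding and returning the best-matching box; the toy sketch below illustrates that view only and is not MedRPG's TaCo mechanism.

```python
import torch
import torch.nn.functional as F

def ground_phrase(region_feats, region_boxes, phrase_feat):
    """Toy sketch of phrase grounding (not MedRPG itself): score each
    candidate region against the phrase embedding and return the
    best-matching box. region_feats: (R, D), region_boxes: (R, 4),
    phrase_feat: (D,)."""
    scores = F.normalize(region_feats, dim=-1) @ F.normalize(phrase_feat, dim=-1)
    best = scores.argmax()
    return region_boxes[best], scores[best]

box, score = ground_phrase(torch.randn(10, 256),
                           torch.rand(10, 4),
                           torch.randn(256))
```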
arXiv Detail & Related papers (2023-03-14T03:57:16Z) - MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology [40.52487429030841]
We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from daily radiological practice.
First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract medical-related information.
Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in the medical field.
Third, we propose to use a Transformer-based fusion model for spatially aligning the entity descriptions with visual signals at the image patch level, enabling medical diagnosis.
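A rule-based toy stand-in can convey what an (entity, position, existence) triplet looks like; MedKLIP's actual extraction module is learned and far more sophisticated, and the keyword vocabularies below are invented for illustration.

```python
import re

# Illustrative stand-in for a triplet extraction step: map each report
# sentence to (entity, position, exist) triplets with simple keyword
# rules. The vocabularies here are toy examples, not MedKLIP's.
ENTITIES = ["opacity", "effusion", "pneumothorax", "cardiomegaly"]
POSITIONS = ["left", "right", "bilateral", "upper", "lower"]
NEGATIONS = re.compile(r"\bno\b|\bwithout\b|\bfree of\b")

def extract_triplets(report: str):
    triplets = []
    for sentence in report.lower().split("."):
        exist = NEGATIONS.search(sentence) is None
        position = next((p for p in POSITIONS if p in sentence), "unspecified")
        for entity in ENTITIES:
            if entity in sentence:
                triplets.append((entity, position, exist))
    return triplets

print(extract_triplets("There is a left lower lobe opacity. No pleural effusion."))
# [('opacity', 'left', True), ('effusion', 'unspecified', False)]
```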
arXiv Detail & Related papers (2023-01-05T18:55:09Z) - Detailed Annotations of Chest X-Rays via CT Projection for Report Understanding [16.5295886999348]
In clinical radiology reports, doctors capture important information about the patient's health status.
They convey observations about a patient's inner structures drawn from raw medical imaging data.
This explicit grasp on both the patient's anatomy and their appearance is missing in current medical image-processing systems.
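The core projection idea can be sketched in a few lines: collapse a 3D CT label volume along one axis to obtain a 2D mask in a simulated X-ray plane. The geometry below is deliberately simplified relative to proper CT-to-radiograph projection.

```python
import numpy as np

# Minimal sketch: project a 3D CT annotation volume along the
# anterior-posterior axis to get a 2D mask in the (simulated)
# X-ray plane. Axis choice and geometry are toy simplifications.
ct_labels = np.zeros((64, 128, 128), dtype=np.uint8)  # (depth, H, W)
ct_labels[20:30, 40:60, 50:70] = 1                    # a toy 3D organ mask

xray_mask = ct_labels.max(axis=0)                     # (H, W) 2D projection
print(xray_mask.shape, xray_mask.sum())               # (128, 128) 400
```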
arXiv Detail & Related papers (2022-10-07T09:21:48Z) - Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [19.69560434388278]
The goal of medical report generation is to accurately capture and describe the image findings.
Previous works pretrain their visual encoding neural networks with large datasets in different domains.
We propose a framework that uses a contrastive learning approach to pretrain the visual encoder and requires no additional meta information.
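Pretraining that requires no additional meta information is consistent with self-supervised, view-based contrastive learning; below is a generic SimCLR-style loss sketch, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def simclr_style_loss(view1, view2, temperature=0.5):
    """Sketch of self-supervised contrastive pretraining needing no
    labels or meta information: two augmented views of each image are
    positives; everything else in the batch is a negative.
    view1, view2: (B, D) encoder outputs for the two views."""
    z = F.normalize(torch.cat([view1, view2], dim=0), dim=-1)  # (2B, D)
    sim = z @ z.t() / temperature
    n = z.size(0)
    sim = sim.masked_fill(torch.eye(n, dtype=torch.bool, device=z.device),
                          float('-inf'))                        # drop self-pairs
    b = view1.size(0)
    # each view's positive sits b rows away in the concatenated batch
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(z.device)
    return F.cross_entropy(sim, targets)

loss = simclr_style_loss(torch.randn(8, 128), torch.randn(8, 128))
```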
arXiv Detail & Related papers (2022-09-04T12:07:19Z) - Generative Residual Attention Network for Disease Detection [51.60842580044539]
We present a novel approach for disease generation in X-rays using conditional generative adversarial learning.
We generate a corresponding radiology image in a target domain while preserving the identity of the patient.
We then use the generated X-ray image in the target domain to augment our training to improve the detection performance.
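A conditional generator that maps a source X-ray plus a target disease label to a synthetic target-domain image could look roughly like the placeholder below; the architecture is a toy stand-in, not the paper's model.

```python
import torch
import torch.nn as nn

# Conceptual sketch only: a conditional generator turns a source-domain
# X-ray plus a target disease label into a synthetic target-domain
# image, which then augments the detection training pool.
class ConditionalGenerator(nn.Module):
    def __init__(self, num_classes=14, ch=16):
        super().__init__()
        self.label_embed = nn.Embedding(num_classes, 64 * 64)
        self.net = nn.Sequential(
            nn.Conv2d(2, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 1, 3, padding=1), nn.Tanh())

    def forward(self, xray, target_label):
        # xray: (B, 1, 64, 64); the label becomes a second input channel
        cond = self.label_embed(target_label).view(-1, 1, 64, 64)
        return self.net(torch.cat([xray, cond], dim=1))

gen = ConditionalGenerator()
fake = gen(torch.randn(4, 1, 64, 64), torch.randint(0, 14, (4,)))
print(fake.shape)  # torch.Size([4, 1, 64, 64])
```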
arXiv Detail & Related papers (2021-10-25T14:15:57Z) - Auxiliary Signal-Guided Knowledge Encoder-Decoder for Medical Report Generation [107.3538598876467]
We propose an Auxiliary Signal-Guided Knowledge Encoder-Decoder (ASGK) to mimic radiologists' working patterns.
ASGK integrates internal visual feature fusion and external medical linguistic information to guide medical knowledge transfer and learning.
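One plausible reading of integrating internal visual feature fusion with external medical linguistic information is visual features attending to knowledge embeddings before a transformer report decoder; the sketch below is entirely speculative, not ASGK's published architecture.

```python
import torch
import torch.nn as nn

class AuxiliarySignalFusion(nn.Module):
    """Loose sketch of the idea: fuse internal visual features with
    external medical-language embeddings into memory for a report
    decoder. Entirely illustrative."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        layer = nn.TransformerDecoderLayer(dim, num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=2)

    def forward(self, visual_feats, knowledge_embeds, report_tokens):
        # visual features attend to external knowledge; the fused
        # memory then conditions a standard transformer report decoder
        fused, _ = self.attn(visual_feats, knowledge_embeds, knowledge_embeds)
        memory = visual_feats + fused
        return self.decoder(report_tokens, memory)

m = AuxiliarySignalFusion()
out = m(torch.randn(2, 49, 256), torch.randn(2, 10, 256), torch.randn(2, 20, 256))
print(out.shape)  # torch.Size([2, 20, 256])
```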
arXiv Detail & Related papers (2020-06-06T01:00:15Z)