Related papers: Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images

URL: http://arxiv.org/abs/2302.14042v3
Date: Wed, 14 Jun 2023 07:33:16 GMT
Title: Knowledge-enhanced Visual-Language Pre-training on Chest Radiology Images
Authors: Xiaoman Zhang, Chaoyi Wu, Ya Zhang, Yanfeng Wang, Weidi Xie
Abstract summary: We propose Knowledge-enhanced Auto Diagnosis (KAD) to guide vision-supervised pre-training using paired chest X-rays and radiology reports. We evaluate KAD on four external X-ray datasets and demonstrate that its zero-shot performance is superior to that of fully-language models.
Score: 40.52487429030841
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While multi-modal foundation models pre-trained on large-scale data have been successful in natural language understanding and vision recognition, their use in medical domains is still limited due to the fine-grained nature of medical tasks and the high demand for domain knowledge. To address this challenge, we propose a novel approach called Knowledge-enhanced Auto Diagnosis (KAD) which leverages existing medical domain knowledge to guide vision-language pre-training using paired chest X-rays and radiology reports. We evaluate KAD on {four} external X-ray datasets and demonstrate that its zero-shot performance is not only comparable to that of fully-supervised models, but also superior to the average of three expert radiologists for three (out of five) pathologies with statistical significance. Moreover, when few-shot annotation is available, KAD outperforms all existing approaches in fine-tuning settings, demonstrating its potential for application in different clinical scenarios.

Related papers

MG-3D: Multi-Grained Knowledge-Enhanced 3D Medical Vision-Language Pre-training [7.968487067774351]
3D medical image analysis is pivotal in numerous clinical applications. Large-scale vision-language pre-training remains underexplored in 3D medical image analysis. We propose MG-3D, pre-trained on large-scale data (47.1K)
arXiv Detail & Related papers (2024-12-08T09:45:59Z)
Abnormality-Driven Representation Learning for Radiology Imaging [0.8321462983924758]
We introduce lesion-enhanced contrastive learning (LeCL), a novel approach to obtain visual representations driven by abnormalities in 2D axial slices across different locations of the CT scans. We evaluate our approach across three clinical tasks: tumor lesion location, lung disease detection, and patient staging, benchmarking against four state-of-the-art foundation models.
arXiv Detail & Related papers (2024-11-25T13:53:26Z)
The Era of Foundation Models in Medical Imaging is Approaching : A Scoping Review of the Clinical Value of Large-Scale Generative AI Applications in Radiology [0.0]
Social problems stemming from the shortage of radiologists are intensifying, and artificial intelligence is being highlighted as a potential solution. Recently emerging large-scale generative AI has expanded from large language models (LLMs) to multi-modal models. This scoping review systematically organizes existing literature on the clinical value of large-scale generative AI applications.
arXiv Detail & Related papers (2024-09-03T00:48:50Z)
Visual Prompt Engineering for Vision Language Models in Radiology [0.17183214167143138]
Contrastive Language-Image Pretraining (CLIP) offers a promising solution by enabling zero-shot classification through multimodal large-scale pretraining.<n>While CLIP effectively captures global image content, radiology requires a more localized focus on specific pathology regions to enhance both interpretability and diagnostic accuracy.<n>We explore the potential of incorporating visual cues into zero-shot classification, embedding visual markers, such as arrows, bounding boxes, and circles, directly into radiological images to guide model attention.
arXiv Detail & Related papers (2024-08-28T13:53:27Z)
D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool. We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting. We observe statistically significant improvement in responses when evaluated for both open and close-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z)
DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports. We propose DeViDe, a transformer-based method that leverages radiographic descriptions from the open web. DeViDe incorporates three key features for knowledge-augmented vision language alignment: First, a large-language model-based augmentation is employed to homogenise medical knowledge from diverse sources. In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets.
arXiv Detail & Related papers (2024-04-04T17:40:06Z)
ChatRadio-Valuer: A Chat Large Language Model for Generalizable Radiology Report Generation Based on Multi-institution and Multi-system Data [115.0747462486285]
ChatRadio-Valuer is a tailored model for automatic radiology report generation that learns generalizable representations. The clinical dataset utilized in this study encompasses a remarkable total of textbf332,673 observations. ChatRadio-Valuer consistently outperforms state-of-the-art models, especially ChatGPT (GPT-3.5-Turbo) and GPT-4 et al.
arXiv Detail & Related papers (2023-10-08T17:23:17Z)
Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions. In X-RGen, we seek to mimic the behaviour of human radiologists, breaking them down into four principal phases. We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z)
MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training in Radiology [40.52487429030841]
We consider enhancing medical visual-language pre-training with domain-specific knowledge, by exploiting the paired image-text reports from the radiological daily practice. First, unlike existing works that directly process the raw reports, we adopt a novel triplet extraction module to extract the medical-related information. Second, we propose a novel triplet encoding module with entity translation by querying a knowledge base, to exploit the rich domain knowledge in medical field. Third, we propose to use a Transformer-based fusion model for spatially aligning the entity description with visual signals at the image patch level, enabling the ability for medical diagnosis
arXiv Detail & Related papers (2023-01-05T18:55:09Z)
Significantly improving zero-shot X-ray pathology classification via fine-tuning pre-trained image-text encoders [50.689585476660554]
We propose a new fine-tuning strategy that includes positive-pair loss relaxation and random sentence sampling. Our approach consistently improves overall zero-shot pathology classification across four chest X-ray datasets and three pre-trained models.
arXiv Detail & Related papers (2022-12-14T06:04:18Z)
Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [55.00308939833555]
The PPKED includes three modules: Posterior Knowledge Explorer (PoKE), Prior Knowledge Explorer (PrKE) and Multi-domain Knowledge Distiller (MKD) PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias. PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.
arXiv Detail & Related papers (2021-06-13T11:10:02Z)
Variational Knowledge Distillation for Disease Classification in Chest X-Rays [102.04931207504173]
We propose itvariational knowledge distillation (VKD), which is a new probabilistic inference framework for disease classification based on X-rays. We demonstrate the effectiveness of our method on three public benchmark datasets with paired X-ray images and EHRs.
arXiv Detail & Related papers (2021-03-19T14:13:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.