XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models
- URL: http://arxiv.org/abs/2306.07971v1
- Date: Tue, 13 Jun 2023 17:59:59 GMT
- Title: XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models
- Authors: Omkar Thawkar, Abdelrahman Shaker, Sahal Shaji Mullappilly, Hisham
Cholakkal, Rao Muhammad Anwer, Salman Khan, Jorma Laaksonen, Fahad Shahbaz
Khan
- Abstract summary: We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
- Score: 60.437091462613544
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: The latest breakthroughs in large vision-language models, such as Bard and
GPT-4, have showcased extraordinary abilities in performing a wide range of
tasks. Such models are trained on massive datasets comprising billions of
public image-text pairs with diverse tasks. However, their performance on
task-specific domains, such as radiology, is still under-investigated and
potentially limited due to a lack of sophistication in understanding biomedical
images. On the other hand, conversational medical models have exhibited
remarkable success but have mainly focused on text-based analysis. In this
paper, we introduce XrayGPT, a novel conversational medical vision-language
model that can analyze and answer open-ended questions about chest radiographs.
Specifically, we align a medical visual encoder (MedClip) with a fine-tuned
large language model (Vicuna) using a simple linear transformation. This
alignment enables our model to possess exceptional visual conversation
abilities, grounded in a deep understanding of radiographs and medical domain
knowledge. To enhance the performance of LLMs in the medical context, we
generate ~217k interactive and high-quality summaries from free-text radiology
reports. These summaries serve to enhance the performance of LLMs through the
fine-tuning process. Our approach opens up new avenues of research for
advancing the automated analysis of chest radiographs. Our open-source demos,
models, and instruction sets are available at:
https://github.com/mbzuai-oryx/XrayGPT.
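To make the alignment step concrete, here is a minimal sketch of the idea described above: a frozen medical visual encoder and an LLM are bridged by a single trainable linear projection that maps visual features into the LLM's token-embedding space. This is not the released XrayGPT code; the class names, feature dimensions (768 on the vision side, 4096 on the LLM side), and dummy modules are illustrative assumptions.

```python
import torch
import torch.nn as nn


class VisualLLMAligner(nn.Module):
    """Frozen vision encoder + frozen LLM bridged by one trainable linear layer."""

    def __init__(self, vision_encoder: nn.Module, llm: nn.Module,
                 vision_dim: int = 768, llm_dim: int = 4096):
        super().__init__()
        self.vision_encoder = vision_encoder  # stand-in for a MedClip-style ViT
        self.llm = llm                        # stand-in for a Vicuna-style decoder
        for p in self.vision_encoder.parameters():
            p.requires_grad = False           # keep both backbones frozen here
        for p in self.llm.parameters():
            p.requires_grad = False
        # The only trainable component in this sketch: a simple linear
        # transformation mapping visual features into the LLM embedding space.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, images: torch.Tensor, text_embeds: torch.Tensor):
        with torch.no_grad():
            vis_feats = self.vision_encoder(images)   # (B, N_patches, vision_dim)
        vis_tokens = self.proj(vis_feats)             # (B, N_patches, llm_dim)
        # Prepend the projected visual tokens to the text embeddings so the
        # frozen LLM can attend over both when answering questions.
        return self.llm(inputs_embeds=torch.cat([vis_tokens, text_embeds], dim=1))


# Toy stand-ins so the wiring can be exercised end to end.
class DummyViT(nn.Module):
    def forward(self, x):                              # x: (B, 3, 224, 224)
        return torch.randn(x.size(0), 49, 768)         # fake patch features


class DummyLLM(nn.Module):
    def forward(self, inputs_embeds):
        return inputs_embeds.mean()                    # fake scalar loss


model = VisualLLMAligner(DummyViT(), DummyLLM())
loss = model(torch.randn(2, 3, 224, 224), torch.randn(2, 16, 4096))
loss.backward()  # gradients reach only the projection layer
```

In this sketch only the projection layer receives gradients; per the abstract, Vicuna itself is fine-tuned separately on the ~217k generated report summaries before being aligned with the visual encoder.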
Related papers
- LLaVA-Ultra: Large Chinese Language and Vision Assistant for Ultrasound [7.941670191244354]
We propose a fine-grained adaptive VLM architecture for Chinese medical visual conversations through parameter-efficient tuning.
Specifically, we devise a fusion module with fine-grained vision encoders to enhance subtle medical visual semantics.
For training, we leverage a large-scale multimodal Chinese ultrasound dataset obtained from a hospital.
arXiv Detail & Related papers (2024-10-19T11:38:31Z) - ViKL: A Mammography Interpretation Framework via Multimodal Aggregation of Visual-knowledge-linguistic Features [54.37042005469384]
We announce MVKL, the first multimodal mammography dataset encompassing multi-view images, detailed manifestations and reports.
Based on this dataset, we focus on the challenging task of unsupervised pretraining.
We propose ViKL, a framework that synergizes Visual, Knowledge, and Linguistic features.
arXiv Detail & Related papers (2024-09-24T05:01:23Z) - D-Rax: Domain-specific Radiologic assistant leveraging multi-modal data and eXpert model predictions [8.50767187405446]
We propose D-Rax -- a domain-specific, conversational, radiologic assistance tool.
We enhance the conversational analysis of chest X-ray (CXR) images to support radiological reporting.
We observe statistically significant improvements in responses when evaluated on both open- and close-ended conversations.
arXiv Detail & Related papers (2024-07-02T18:43:10Z) - Self-supervised vision-language alignment of deep learning representations for bone X-rays analysis [53.809054774037214]
This paper proposes leveraging vision-language pretraining on bone X-rays paired with French reports.
It is the first study to integrate French reports to shape the embedding space devoted to bone X-Rays representations.
arXiv Detail & Related papers (2024-05-14T19:53:20Z) - Medical Vision-Language Pre-Training for Brain Abnormalities [96.1408455065347]
We show how to automatically collect medical image-text aligned data for pretraining from public resources such as PubMed.
In particular, we present a pipeline that streamlines the pre-training process by initially collecting a large brain image-text dataset.
We also investigate the unique challenge of mapping subfigures to subcaptions in the medical domain.
arXiv Detail & Related papers (2024-04-27T05:03:42Z) - DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [1.6567372257085946]
Vision-language pre-training for chest X-rays has made significant strides, primarily by utilizing paired radiographs and radiology reports.
We propose DeViDe, a transformer-based method that leverages radiographic descriptions from the open web.
DeViDe incorporates three key features for knowledge-augmented vision language alignment: First, a large-language model-based augmentation is employed to homogenise medical knowledge from diverse sources.
In zero-shot settings, DeViDe performs comparably to fully supervised models on external datasets and achieves state-of-the-art results on three large-scale datasets.
arXiv Detail & Related papers (2024-04-04T17:40:06Z) - Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns [7.6599164274971026]
We enhance Vision-Language Models (VLMs) with radiologists' attention by incorporating eye gaze data alongside textual prompts.
Heatmaps generated from eye gaze data are overlaid onto medical images to highlight areas of intense radiologist focus (sketched below).
Results demonstrate that including eye gaze information significantly enhances the accuracy of chest X-ray analysis.
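As a rough illustration of the overlay technique mentioned above (not the paper's implementation), the sketch below turns assumed (x, y, dwell-time) fixation points into a Gaussian-smoothed heatmap and alpha-blends it over a grayscale radiograph; the fixation format, smoothing sigma, and blending weight are assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def gaze_heatmap(fixations, shape, sigma=25.0):
    """fixations: iterable of (x, y, duration); shape: (H, W) of the image."""
    heat = np.zeros(shape, dtype=np.float32)
    for x, y, dur in fixations:
        xi, yi = int(round(x)), int(round(y))
        if 0 <= yi < shape[0] and 0 <= xi < shape[1]:
            heat[yi, xi] += dur                   # weight each fixation by dwell time
    heat = gaussian_filter(heat, sigma=sigma)     # spread fixations spatially
    peak = heat.max()
    return heat / peak if peak > 0 else heat


def overlay(gray_image, heat, alpha=0.4):
    """Alpha-blend a normalized heatmap (rendered in red) over a grayscale image."""
    base = np.stack([gray_image] * 3, axis=-1).astype(np.float32)
    color = np.zeros_like(base)
    color[..., 0] = heat * 255.0                  # heatmap shown in the red channel
    return ((1 - alpha) * base + alpha * color).clip(0, 255).astype(np.uint8)


# Toy usage with synthetic data.
img = (np.random.rand(512, 512) * 255).astype(np.uint8)
fixations = [(200, 300, 0.8), (210, 310, 1.2), (400, 120, 0.5)]
blended = overlay(img, gaze_heatmap(fixations, img.shape))
```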
arXiv Detail & Related papers (2024-04-03T00:09:05Z) - MedXChat: A Unified Multimodal Large Language Model Framework towards CXRs Understanding and Generation [28.497591315598402]
Multimodal Large Language Models (MLLMs) have shown success in various general image processing tasks.
This study investigates the potential of MLLMs in improving the understanding and generation of Chest X-Rays (CXRs).
arXiv Detail & Related papers (2023-12-04T06:40:12Z) - Act Like a Radiologist: Radiology Report Generation across Anatomical Regions [50.13206214694885]
X-RGen is a radiologist-minded report generation framework across six anatomical regions.
In X-RGen, we seek to mimic the behaviour of human radiologists, breaking it down into four principal phases.
We enhance the recognition capacity of the image encoder by analysing images and reports across various regions.
arXiv Detail & Related papers (2023-05-26T07:12:35Z) - LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation [51.08810811457617]
Vision-language alignment in LLMs is actively being researched to enable multimodal reasoning and visual input/output.
We develop a method for instruction-tuning an LLM only on text to gain vision-language capabilities for medical images.
Our model, LLM-CXR, trained with this approach, shows better image-text alignment in both CXR understanding and generation tasks.
arXiv Detail & Related papers (2023-05-19T07:44:39Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.