Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
- URL: http://arxiv.org/abs/2402.08966v3
- Date: Fri, 13 Dec 2024 04:59:10 GMT
- Title: Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays
- Authors: Yeongjae Cho, Taehee Kim, Heejun Shin, Sungzoon Cho, Dongmyung Shin,
- Abstract summary: Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images.
Previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance the model's performance.
Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task.
- Score: 6.351190845487287
- License:
- Abstract: Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images. This task is particularly important in reading chest X-ray images because radiologists often compare multiple images of the same patient taken at different times to track disease progression and changes in its severity in their clinical practice. However, previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance the model's performance using a pretrained vision-language model (VLM). Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task. The model is developed using a step-by-step approach, starting with being pretrained on natural images and texts, followed by being trained using longitudinal chest X-ray data. The longitudinal data consist of pairs of X-ray images, along with question-answer sets and radiologist's reports that describe the changes in lung abnormalities and diseases over time. Our experimental results show that the PLURAL model outperforms state-of-the-art methods not only in diff-VQA for longitudinal X-rays but also in conventional VQA for a single X-ray image. Through extensive experiments, we demonstrate the effectiveness of the proposed VLM architecture and pretraining method in improving the model's performance.
Related papers
- MUSCLE: Multi-task Self-supervised Continual Learning to Pre-train Deep
Models for X-ray Images of Multiple Body Parts [63.30352394004674]
Multi-task Self-super-vised Continual Learning (MUSCLE) is a novel self-supervised pre-training pipeline for medical imaging tasks.
MUSCLE aggregates X-rays collected from multiple body parts for representation learning, and adopts a well-designed continual learning procedure.
We evaluate MUSCLE using 9 real-world X-ray datasets with various tasks, including pneumonia classification, skeletal abnormality classification, lung segmentation, and tuberculosis (TB) detection.
arXiv Detail & Related papers (2023-10-03T12:19:19Z) - Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering [45.058569118999436]
Given a pair of main and reference images, this task attempts to answer several questions on both diseases.
We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images.
arXiv Detail & Related papers (2023-07-22T05:34:18Z) - XrayGPT: Chest Radiographs Summarization using Medical Vision-Language
Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z) - Artificial Intelligence for Automatic Detection and Classification
Disease on the X-Ray Images [0.0]
This work presents rapid detection of diseases in the lung using the efficient Deep learning pre-trained RepVGG algorithm.
We are applying Artificial Intelligence technology for automatic highlighted detection of affected areas of people's lungs.
arXiv Detail & Related papers (2022-11-14T03:51:12Z) - Improving Chest X-Ray Classification by RNN-based Patient Monitoring [0.34998703934432673]
We analyze how information about diagnosis can improve CNN-based image classification models.
We show that a model trained on additional patient history information outperforms a model trained without the information by a significant margin.
arXiv Detail & Related papers (2022-10-28T11:47:15Z) - BERTHop: An Effective Vision-and-Language Model for Chest X-ray Disease
Diagnosis [42.917164607812886]
Vision-and-language(V&L) models take image and text as input and learn to capture the associations between them.
BERTHop is a transformer-based model based on PixelHop++ and VisualBERT, for better capturing the associations between the two modalities.
arXiv Detail & Related papers (2021-08-10T21:51:25Z) - Cross-Modal Contrastive Learning for Abnormality Classification and
Localization in Chest X-rays with Radiomics using a Feedback Loop [63.81818077092879]
We propose an end-to-end semi-supervised cross-modal contrastive learning framework for medical images.
We first apply an image encoder to classify the chest X-rays and to generate the image features.
The radiomic features are then passed through another dedicated encoder to act as the positive sample for the image features generated from the same chest X-ray.
arXiv Detail & Related papers (2021-04-11T09:16:29Z) - Cross Chest Graph for Disease Diagnosis with Structural Relational
Reasoning [2.7148274921314615]
Locating lesions is important in the computer-aided diagnosis of X-ray images.
General weakly-supervised methods have failed to consider the characteristics of X-ray images.
We propose the Cross-chest Graph (CCG), which improves the performance of automatic lesion detection.
arXiv Detail & Related papers (2021-01-22T08:24:04Z) - Convolutional-LSTM for Multi-Image to Single Output Medical Prediction [55.41644538483948]
A common scenario in developing countries is to have the volume metadata lost due multiple reasons.
It is possible to get a multi-image to single diagnostic model which mimics human doctor diagnostic process.
arXiv Detail & Related papers (2020-10-20T04:30:09Z) - Learning Invariant Feature Representation to Improve Generalization
across Chest X-ray Datasets [55.06983249986729]
We show that a deep learning model performing well when tested on the same dataset as training data starts to perform poorly when it is tested on a dataset from a different source.
By employing an adversarial training strategy, we show that a network can be forced to learn a source-invariant representation.
arXiv Detail & Related papers (2020-08-04T07:41:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.