Expert Knowledge-Aware Image Difference Graph Representation Learning
for Difference-Aware Medical Visual Question Answering
- URL: http://arxiv.org/abs/2307.11986v1
- Date: Sat, 22 Jul 2023 05:34:18 GMT
- Title: Expert Knowledge-Aware Image Difference Graph Representation Learning
for Difference-Aware Medical Visual Question Answering
- Authors: Xinyue Hu, Lin Gu, Qiyuan An, Mengliang Zhang, Liangchen Liu, Kazuma
Kobayashi, Tatsuya Harada, Ronald M. Summers, Yingying Zhu
- Abstract summary: Given a pair of main and reference images, this task attempts to answer several questions on both diseases and, more importantly, the differences between them.
We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images.
- Score: 44.897116657726365
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To contribute to automating medical vision-language models, we propose a novel Chest X-ray Difference Visual Question Answering (VQA) task. Given a pair of main and reference images, this task attempts to answer several questions on both diseases and, more importantly, the differences between them. This is consistent with radiologists' diagnostic practice of comparing the current image with the reference before concluding the report. We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images. Compared to existing medical VQA datasets, our questions are tailored to the Assessment-Diagnosis-Intervention-Evaluation treatment procedure used by clinical professionals. Meanwhile, we also propose a novel expert knowledge-aware graph representation learning model to address this task. The proposed baseline model leverages expert knowledge, such as anatomical structure priors and semantic and spatial knowledge, to construct a multi-relationship graph representing the differences between the two images for the image-difference VQA task. The dataset and code can be found at https://github.com/Holipori/MIMIC-Diff-VQA. We believe this work will further push forward medical vision-language models.
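The baseline's graph construction lends itself to a small illustration. The sketch below is a hypothetical reading of the abstract, not the authors' released code: the `RegionNode` class, `build_multi_relationship_graph`, the IoU threshold of 0.1, the region names, and the toy knowledge base are all invented for this example. It assumes the three knowledge sources map to three edge types: anatomical (the same named structure across the main/reference pair), spatial (overlapping regions within one image), and semantic (findings linked in an expert knowledge base).

```python
# Hypothetical sketch of a multi-relationship graph for image-difference VQA.
# All names, coordinates, and the disease vocabulary are invented.
from dataclasses import dataclass, field
from itertools import combinations

@dataclass
class RegionNode:
    name: str                                     # anatomical region, e.g. "left lung"
    bbox: tuple                                   # (x1, y1, x2, y2) in image coordinates
    findings: set = field(default_factory=set)    # findings detected in this region

def iou(a, b):
    """Intersection-over-union of two boxes; drives the spatial relation."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def build_multi_relationship_graph(main_regions, ref_regions, knowledge_base):
    """Return edges of three types over nodes from a main/reference image pair:
    - "anatomical": the same named structure across the two images,
    - "spatial": overlapping regions within one image,
    - "semantic": regions whose findings co-occur in the expert knowledge base.
    """
    nodes = [("main", r) for r in main_regions] + [("ref", r) for r in ref_regions]
    edges = []
    for (img_a, a), (img_b, b) in combinations(nodes, 2):
        if img_a != img_b and a.name == b.name:
            edges.append((a.name + "@" + img_a, b.name + "@" + img_b, "anatomical"))
        if img_a == img_b and iou(a.bbox, b.bbox) > 0.1:
            edges.append((a.name + "@" + img_a, b.name + "@" + img_b, "spatial"))
        if any((fa, fb) in knowledge_base for fa in a.findings for fb in b.findings):
            edges.append((a.name + "@" + img_a, b.name + "@" + img_b, "semantic"))
    return edges

# Toy usage: one overlap within the main image, one cross-image structure,
# and one invented knowledge-base co-occurrence pair.
kb = {("edema", "effusion")}
main = [RegionNode("left lung", (0, 0, 50, 80), {"edema"}),
        RegionNode("heart", (30, 20, 90, 90), set())]
ref = [RegionNode("left lung", (0, 0, 50, 80), {"effusion"})]
print(build_multi_relationship_graph(main, ref, kb))
```

Running the toy example prints a spatial edge within the main image plus anatomical and semantic edges linking the two "left lung" nodes across the pair; cross-image links of that kind are what difference questions would reason over.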
Related papers
- Pretraining Vision-Language Model for Difference Visual Question Answering in Longitudinal Chest X-rays [6.351190845487287]
Difference visual question answering (diff-VQA) is a challenging task that requires answering complex questions based on differences between a pair of images.
Previous works focused on designing specific network architectures for the diff-VQA task, missing opportunities to enhance performance using a pretrained vision-language model.
Here, we introduce a novel VLM called PLURAL, which is pretrained on natural and longitudinal chest X-ray data for the diff-VQA task.
arXiv Detail & Related papers (2024-02-14T06:20:48Z)
- XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [60.437091462613544]
We introduce XrayGPT, a novel conversational medical vision-language model.
It can analyze and answer open-ended questions about chest radiographs.
We generate 217k interactive and high-quality summaries from free-text radiology reports.
arXiv Detail & Related papers (2023-06-13T17:59:59Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [35.64805788623848]
We focus on the problem of Medical Visual Question Answering (MedVQA).
We propose a generative model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model (a minimal sketch of this alignment idea follows the list below).
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Pixel-Level Explanation of Multiple Instance Learning Models in Biomedical Single Cell Images [52.527733226555206]
We investigate the use of four attribution methods to explain multiple instance learning models.
We study two datasets of acute myeloid leukemia with over 100,000 single-cell images.
We compare attribution maps with the annotations of a medical expert to see how the model's decision-making differs from the human standard.
arXiv Detail & Related papers (2023-03-15T14:00:11Z)
- Medical visual question answering using joint self-supervised learning [8.817054025763325]
The encoder embeds the image and text modalities jointly using a self-attention mechanism.
The decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset.
arXiv Detail & Related papers (2023-02-25T12:12:22Z)
- Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [45.746882253686856]
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images.
We first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images.
Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs.
arXiv Detail & Related papers (2023-02-19T17:46:16Z)
- Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [53.89917396428747]
Self-supervised learning in vision-language processing exploits semantic alignment between imaging and text modalities.
We explicitly account for prior images and reports when available during both training and fine-tuning.
Our approach, named BioViL-T, uses a CNN-Transformer hybrid multi-image encoder trained jointly with a text model.
arXiv Detail & Related papers (2023-01-11T16:35:33Z)
- MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [2.413694065650786]
This paper proposes a multi-view attention-based model (MuVAM) for medical visual question answering.
It integrates the high-level semantics of medical images on the basis of the text description.
Experiments on two datasets show that MuVAM surpasses state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T13:40:25Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
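As promised in the PMC-VQA entry above, here is a minimal sketch of the vision-encoder-to-LLM alignment idea it describes. This is an assumption-laden illustration rather than PMC-VQA's actual architecture: the class name `VisionLanguageAligner`, the single linear projector, and the feature dimensions (768 for a ViT-style encoder, 4096 for the LLM) are all invented for this example.

```python
# Hypothetical sketch of aligning pre-trained vision features with an LLM.
# Module names and dimensions are assumptions, not PMC-VQA's released code.
import torch
import torch.nn as nn

class VisionLanguageAligner(nn.Module):
    def __init__(self, vision_dim=768, llm_dim=4096):
        super().__init__()
        # Projects frozen vision-encoder features into the LLM's token space.
        self.projector = nn.Linear(vision_dim, llm_dim)

    def forward(self, image_features, text_embeddings):
        # image_features: (batch, num_patches, vision_dim) from a pre-trained
        # vision encoder; text_embeddings: (batch, seq_len, llm_dim) from the
        # LLM's embedding table. The projected patches are prepended as a
        # visual prefix, so the LLM attends to them like ordinary tokens.
        visual_prefix = self.projector(image_features)
        return torch.cat([visual_prefix, text_embeddings], dim=1)

# Toy shapes only; a real setup would feed the result to the LLM decoder.
aligner = VisionLanguageAligner()
img = torch.randn(2, 196, 768)      # e.g. ViT patch features
txt = torch.randn(2, 32, 4096)      # embedded question tokens
print(aligner(img, txt).shape)      # torch.Size([2, 228, 4096])
```

Prepending the projected patch features as a visual prefix lets the language model attend to image evidence like ordinary tokens; real systems typically use deeper projectors and instruction-formatted prompts, but the alignment step itself looks much like this.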