Interpretable Medical Image Visual Question Answering via Multi-Modal
Relationship Graph Learning
- URL: http://arxiv.org/abs/2302.09636v1
- Date: Sun, 19 Feb 2023 17:46:16 GMT
- Title: Interpretable Medical Image Visual Question Answering via Multi-Modal
Relationship Graph Learning
- Authors: Xinyue Hu, Lin Gu, Kazuma Kobayashi, Qiyuan An, Qingyu Chen, Zhiyong
Lu, Chang Su, Tatsuya Harada, Yingying Zhu
- Abstract summary: Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images.
We first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images.
Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs.
- Score: 45.746882253686856
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Medical visual question answering (VQA) aims to answer clinically relevant
questions regarding input medical images. This technique has the potential to
improve the efficiency of medical professionals while relieving the burden on
the public health system, particularly in resource-poor countries. Existing
medical VQA methods tend to encode medical images and learn the correspondence
between visual features and questions without exploiting the spatial, semantic,
or medical knowledge behind them. This is partially because of the small size
of current medical VQA datasets, which often include simple questions.
Therefore, we first collected a comprehensive and large-scale medical VQA
dataset focusing on chest X-ray images. The questions in our dataset involve
detailed relationships, such as disease names, locations, levels, and types.
Based on this dataset, we also propose a novel baseline method by
constructing three different relationship graphs: spatial relationship,
semantic relationship, and implicit relationship graphs on the image regions,
questions, and semantic labels. The answer and graph reasoning paths are
learned for different questions.
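For illustration only, the sketch below shows one way such relationship graphs could be built over detected image regions and combined for answer prediction. It is not the authors' implementation; the IoU-based spatial adjacency, label-sharing semantic graph, learned implicit graph, feature dimensions, and the simple residual graph-convolution fusion with question-guided pooling are all assumptions.

```python
# Minimal sketch (assumptions, not the paper's code) of combining spatial,
# semantic, and implicit relationship graphs over image regions for VQA.
import torch
import torch.nn as nn
import torch.nn.functional as F


def spatial_graph(boxes: torch.Tensor, iou_threshold: float = 0.1) -> torch.Tensor:
    """Adjacency over N region boxes (x1, y1, x2, y2): connect overlapping regions."""
    x1, y1, x2, y2 = boxes.unbind(-1)
    area = (x2 - x1) * (y2 - y1)
    ix1, iy1 = torch.max(x1[:, None], x1[None, :]), torch.max(y1[:, None], y1[None, :])
    ix2, iy2 = torch.min(x2[:, None], x2[None, :]), torch.min(y2[:, None], y2[None, :])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    iou = inter / (area[:, None] + area[None, :] - inter + 1e-6)
    return (iou > iou_threshold).float()


def semantic_graph(region_labels: torch.Tensor, num_labels: int) -> torch.Tensor:
    """Connect regions that share a semantic label (e.g. the same finding type)."""
    onehot = F.one_hot(region_labels, num_labels).float()
    return (onehot @ onehot.t() > 0).float()


class ImplicitGraph(nn.Module):
    """Fully connected graph whose edge weights are learned from feature similarity."""

    def __init__(self, dim: int):
        super().__init__()
        self.query, self.key = nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        scores = self.query(feats) @ self.key(feats).t() / feats.size(-1) ** 0.5
        return scores.softmax(dim=-1)


class GraphVQAHead(nn.Module):
    """One graph-convolution step per graph, then question-guided pooling."""

    def __init__(self, dim: int, num_answers: int):
        super().__init__()
        self.gcn = nn.ModuleList([nn.Linear(dim, dim) for _ in range(3)])
        self.implicit = ImplicitGraph(dim)
        self.classifier = nn.Linear(2 * dim, num_answers)

    def forward(self, region_feats, boxes, region_labels, num_labels, question_feat):
        graphs = [
            spatial_graph(boxes),
            semantic_graph(region_labels, num_labels),
            self.implicit(region_feats),
        ]
        h = region_feats
        for adj, layer in zip(graphs, self.gcn):
            norm = adj / adj.sum(dim=-1, keepdim=True).clamp(min=1e-6)
            h = F.relu(layer(norm @ h)) + h          # message passing + residual
        attn = (h @ question_feat).softmax(dim=0)    # question-guided region weights
        pooled = (attn[:, None] * h).sum(dim=0)
        return self.classifier(torch.cat([pooled, question_feat]))


# Toy usage: 8 regions with 256-d features, 10 semantic labels, 100 candidate answers.
# In practice the boxes and features would come from a detector and a question encoder.
xy = torch.rand(8, 2) * 200
boxes = torch.cat([xy, xy + 24], dim=-1)             # well-formed (x1, y1, x2, y2)
head = GraphVQAHead(dim=256, num_answers=100)
logits = head(torch.randn(8, 256), boxes, torch.randint(0, 10, (8,)), 10,
              torch.randn(256))
```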
Related papers
- Ask Questions with Double Hints: Visual Question Generation with Answer-awareness and Region-reference [107.53380946417003]
We propose a novel learning paradigm to generate visual questions with answer-awareness and region-reference.
We develop a simple methodology to self-learn the visual hints without introducing any additional human annotations.
arXiv Detail & Related papers (2024-07-06T15:07:32Z)
- Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering [45.058569118999436]
Given a pair of main and reference images, this task attempts to answer several questions on both diseases.
We collect a new dataset, namely MIMIC-Diff-VQA, including 700,703 QA pairs from 164,324 pairs of main and reference images.
arXiv Detail & Related papers (2023-07-22T05:34:18Z)
- Localized Questions in Medical Visual Question Answering [2.005299372367689]
Visual Question Answering (VQA) models aim to answer natural language questions about given images.
Existing medical VQA models typically focus on answering questions that refer to an entire image.
This paper proposes a novel approach for medical VQA that addresses this limitation by developing a model that can answer questions about image regions.
arXiv Detail & Related papers (2023-07-03T14:47:18Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- Medical visual question answering using joint self-supervised learning [8.817054025763325]
The encoder embeds the image and text modalities jointly with a self-attention mechanism.
The decoder is connected to the top of the encoder and fine-tuned using the small-sized medical VQA dataset.
arXiv Detail & Related papers (2023-02-25T12:12:22Z)
- Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering [71.6781118080461]
We propose a Graph Matching Attention (GMA) network for Visual Question Answering (VQA) task.
First, it builds a graph for the image and also constructs a graph for the question in terms of both syntactic and embedding information.
Next, we explore the intra-modality relationships with a dual-stage graph encoder and then present a bilateral cross-modality graph matching attention to infer the relationships between the image and the question (a minimal sketch of this matching step appears after this list).
Experiments demonstrate that our network achieves state-of-the-art performance on the GQA dataset and the VQA 2.0 dataset.
arXiv Detail & Related papers (2021-12-14T10:01:26Z)
- Medical Visual Question Answering: A Survey [55.53205317089564]
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
arXiv Detail & Related papers (2021-11-19T05:55:15Z)
- MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [2.413694065650786]
This paper proposes a multi-view attention-based model (MuVAM) for medical visual question answering.
It integrates the high-level semantics of medical images on the basis of the text description.
Experiments on two datasets show that MuVAM surpasses the state-of-the-art methods.
arXiv Detail & Related papers (2021-07-07T13:40:25Z)
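As noted in the GMA entry above, the following minimal sketch shows one way a bilateral cross-modality graph matching attention step could look: an affinity matrix between image-graph and question-graph node features drives attention in both directions, and each modality is fused with the features matched from the other. The projections, fusion layer, and dimensions are assumptions, not the paper's implementation.

```python
# Minimal sketch (assumptions, not the paper's code) of bilateral cross-modality
# graph matching attention between image-graph and question-graph nodes.
import torch
import torch.nn as nn


class BilateralGraphMatchingAttention(nn.Module):
    """Match image-graph nodes against question-graph nodes in both directions."""

    def __init__(self, dim: int):
        super().__init__()
        self.img_proj = nn.Linear(dim, dim)
        self.qst_proj = nn.Linear(dim, dim)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, img_nodes: torch.Tensor, qst_nodes: torch.Tensor):
        # Cross-modal affinity between every image node and every question node.
        affinity = self.img_proj(img_nodes) @ self.qst_proj(qst_nodes).t()
        img_to_qst = affinity.softmax(dim=-1) @ qst_nodes       # image attends to question
        qst_to_img = affinity.t().softmax(dim=-1) @ img_nodes   # question attends to image
        img_out = self.fuse(torch.cat([img_nodes, img_to_qst], dim=-1))
        qst_out = self.fuse(torch.cat([qst_nodes, qst_to_img], dim=-1))
        return img_out, qst_out


# Toy usage: 36 image-graph nodes and 12 question-graph nodes, 512-d each.
gma = BilateralGraphMatchingAttention(dim=512)
img_out, qst_out = gma(torch.randn(36, 512), torch.randn(12, 512))
```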
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.