Localized Questions in Medical Visual Question Answering
- URL: http://arxiv.org/abs/2307.01067v1
- Date: Mon, 3 Jul 2023 14:47:18 GMT
- Title: Localized Questions in Medical Visual Question Answering
- Authors: Sergio Tascon-Morales and Pablo Márquez-Neila and Raphael Sznitman
- Abstract summary: Visual Question Answering (VQA) models aim to answer natural language questions about given images.
Existing medical VQA models typically focus on answering questions that refer to an entire image.
This paper proposes a novel approach for medical VQA that addresses this limitation by developing a model that can answer questions about image regions.
- Score: 2.005299372367689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual Question Answering (VQA) models aim to answer natural language
questions about given images. Due to its ability to ask questions that differ
from those used when training the model, medical VQA has received substantial
attention in recent years. However, existing medical VQA models typically focus
on answering questions that refer to an entire image rather than where the
relevant content may be located in the image. Consequently, VQA models are
limited in their interpretability power and the possibility to probe the model
about specific image regions. This paper proposes a novel approach for medical
VQA that addresses this limitation by developing a model that can answer
questions about image regions while considering the context necessary to answer
the questions. Our experimental results demonstrate the effectiveness of our
proposed model, outperforming existing methods on three datasets. Our code and
data are available at https://github.com/sergiotasconmorales/locvqa.
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z)
- PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative-based model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z)
- OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese [2.7528170226206443]
We introduce the OpenViVQA dataset, the first large-scale dataset for visual question answering in Vietnamese.
The dataset consists of 11,000+ images associated with 37,000+ question-answer pairs (QAs).
Our proposed methods achieve results that are competitive with SOTA models such as SAAA, MCAN, LORA, and M4C.
arXiv Detail & Related papers (2023-05-07T03:59:31Z)
- Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [45.746882253686856]
Medical visual question answering (VQA) aims to answer clinically relevant questions regarding input medical images.
We first collected a comprehensive and large-scale medical VQA dataset, focusing on chest X-ray images.
Based on this dataset, we also propose a novel baseline method by constructing three different relationship graphs.
arXiv Detail & Related papers (2023-02-19T17:46:16Z)
- Self-supervised vision-language pretraining for Medical visual question answering [9.073820229958054]
We propose a self-supervised method that applies Masked image modeling, Masked language modeling, Image text matching and Image text alignment via contrastive learning (M2I2) for pretraining.
The proposed method achieves state-of-the-art performance on all the three public medical VQA datasets.
arXiv Detail & Related papers (2022-11-24T13:31:56Z)
- Consistency-preserving Visual Question Answering in Medical Imaging [2.005299372367689]
Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question.
We propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process.
Our experiments show that our method outperforms state-of-the-art baselines.
arXiv Detail & Related papers (2022-06-27T13:38:50Z)
- Medical Visual Question Answering: A Survey [55.53205317089564]
Medical Visual Question Answering (VQA) is a combination of medical artificial intelligence and popular VQA challenges.
Given a medical image and a clinically relevant question in natural language, the medical VQA system is expected to predict a plausible and convincing answer.
arXiv Detail & Related papers (2021-11-19T05:55:15Z)
- Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z)
- Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z)
- SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, also marginally improving performance on the Reasoning questions in VQA, while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.