Consistency-preserving Visual Question Answering in Medical Imaging
- URL: http://arxiv.org/abs/2206.13296v1
- Date: Mon, 27 Jun 2022 13:38:50 GMT
- Title: Consistency-preserving Visual Question Answering in Medical Imaging
- Authors: Sergio Tascon-Morales, Pablo Márquez-Neila, Raphael Sznitman
- Abstract summary: Visual Question Answering (VQA) models take an image and a natural-language question as input and infer the answer to the question.
We propose a novel loss function and corresponding training procedure that allows the inclusion of relations between questions into the training process.
Our experiments show that our method outperforms state-of-the-art baselines.
- Score: 2.005299372367689
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) models take an image and a natural-language
question as input and infer the answer to the question. Recently, VQA systems
in medical imaging have gained popularity thanks to potential advantages such
as patient engagement and second opinions for clinicians. While most research
efforts have been focused on improving architectures and overcoming
data-related limitations, answer consistency has been overlooked even though it
plays a critical role in establishing trustworthy models. In this work, we
propose a novel loss function and corresponding training procedure that allows
the inclusion of relations between questions into the training process.
Specifically, we consider the case where implications between perception and
reasoning questions are known a priori. To show the benefits of our approach,
we evaluate it on the clinically relevant task of Diabetic Macular Edema (DME)
staging from fundus imaging. Our experiments show that our method outperforms
state-of-the-art baselines, not only by improving model consistency, but also
in terms of overall model accuracy. Our code and data are available at
https://github.com/sergiotasconmorales/consistency_vqa.
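A minimal sketch of the kind of consistency objective the abstract describes: a standard VQA cross-entropy loss on paired reasoning (main) and perception (sub) questions, plus a penalty that fires when the model is confident in a reasoning answer whose implied perception answer it gets wrong. The names, tensor shapes, penalty form, and the weight `lam` are assumptions for illustration only, not the authors' exact formulation; the released code at the repository above is authoritative.

```python
import torch
import torch.nn.functional as F

def consistency_vqa_loss(main_logits, main_labels, sub_logits, sub_labels, lam=0.5):
    """Cross-entropy on both question types plus a consistency penalty.

    main_logits: (B, C_main) logits for reasoning (main) questions
    sub_logits:  (B, C_sub)  logits for the perception (sub) questions they imply
    lam:         assumed weight of the consistency term
    """
    # Standard classification losses for both question types.
    ce_main = F.cross_entropy(main_logits, main_labels)
    ce_sub = F.cross_entropy(sub_logits, sub_labels)

    # Probability the model assigns to the correct answer of each question.
    p_main = F.softmax(main_logits, dim=-1).gather(1, main_labels[:, None]).squeeze(1)
    p_sub = F.softmax(sub_logits, dim=-1).gather(1, sub_labels[:, None]).squeeze(1)

    # Penalize inconsistent pairs: confident on the reasoning question while
    # missing the perception question that its answer logically implies.
    cons = (p_main * (1.0 - p_sub)).mean()

    return ce_main + ce_sub + lam * cons
```

In DME staging, for example, the reasoning question "What is the DME grade?" implies perception questions such as "Are there hard exudates in this region?", so a grade prediction that contradicts the model's own region-level answers contributes an extra training penalty.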
Related papers
- Which Client is Reliable?: A Reliable and Personalized Prompt-based Federated Learning for Medical Image Question Answering [51.26412822853409]
We present a novel personalized federated learning (pFL) method for medical visual question answering (VQA) models.
Our method introduces learnable prompts into a Transformer architecture to efficiently train it on diverse medical datasets without massive computational costs.
arXiv Detail & Related papers (2024-10-23T00:31:17Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Visual Question Answering in the Medical Domain [13.673890873313354]
We present a novel contrastive learning pretraining method to mitigate the problem of small datasets for the Med-VQA task.
Our proposed model obtained an accuracy of 60% on the VQA-Med 2019 test set, giving comparable results to other state-of-the-art Med-VQA models.
arXiv Detail & Related papers (2023-09-20T06:06:10Z) - A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading [2.368995563245609]
We focus on providing a richer and more appropriate validation approach for powerful Visual Question Answering (VQA) algorithms.
We propose an automatic adaptive questioning method that aims to expose the reasoning behavior of a VQA algorithm.
Experiments show that such an agent behaves similarly to a clinician, asking questions that are relevant to key clinical concepts.
arXiv Detail & Related papers (2023-07-19T10:31:35Z) - Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering [7.669872220702526]
We present a novel self-supervised approach that learns unimodal and multimodal feature representations of input images and text.
The proposed approach achieves state-of-the-art (SOTA) performance on three publicly available medical VQA datasets.
arXiv Detail & Related papers (2023-07-11T15:00:11Z) - Localized Questions in Medical Visual Question Answering [2.005299372367689]
Visual Question Answering (VQA) models aim to answer natural language questions about given images.
Existing medical VQA models typically focus on answering questions that refer to an entire image.
This paper proposes a novel approach for medical VQA that addresses this limitation by developing a model that can answer questions about image regions.
arXiv Detail & Related papers (2023-07-03T14:47:18Z) - PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [56.25766322554655]
Medical Visual Question Answering (MedVQA) presents a significant opportunity to enhance diagnostic accuracy and healthcare delivery.
We propose a generative model for medical visual understanding by aligning visual information from a pre-trained vision encoder with a large language model.
We train the proposed model on PMC-VQA and then fine-tune it on multiple public benchmarks, e.g., VQA-RAD, SLAKE, and Image-Clef 2019.
arXiv Detail & Related papers (2023-05-17T17:50:16Z) - Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions [70.70725223310401]
This work proposes a new method that utilizes semantically related questions, referred to as basic questions, acting as noise to evaluate the robustness of VQA models.
The experimental results demonstrate that the proposed evaluation method effectively analyzes the robustness of VQA models.
arXiv Detail & Related papers (2023-04-06T15:32:35Z) - Neuro-Symbolic Visual Reasoning: Disentangling "Visual" from "Reasoning" [49.76230210108583]
We propose a framework to isolate and evaluate the reasoning aspect of visual question answering (VQA) separately from its perception.
We also propose a novel top-down calibration technique that allows the model to answer reasoning questions even with imperfect perception.
On the challenging GQA dataset, this framework is used to perform in-depth, disentangled comparisons between well-known VQA models.
arXiv Detail & Related papers (2020-06-20T08:48:29Z) - A Question-Centric Model for Visual Question Answering in Medical Imaging [3.619444603816032]
We present a novel Visual Question Answering approach that allows an image to be queried by means of a written question.
Experiments on a variety of medical and natural image datasets show that by fusing image and question features in a novel way, the proposed approach achieves an equal or higher accuracy compared to current methods.
arXiv Detail & Related papers (2020-03-02T10:16:16Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5%, marginally improving performance on the Reasoning questions in VQA while also displaying better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)