Hierarchical Deep Multi-modal Network for Medical Visual Question
Answering
- URL: http://arxiv.org/abs/2009.12770v1
- Date: Sun, 27 Sep 2020 07:24:41 GMT
- Title: Hierarchical Deep Multi-modal Network for Medical Visual Question
Answering
- Authors: Deepak Gupta, Swati Suman, Asif Ekbal
- Abstract summary: We propose a hierarchical deep multi-modal network that analyzes and classifies end-user questions/queries.
We integrate the QS model into the hierarchical deep multi-modal neural network to generate proper answers to queries related to medical images.
- Score: 25.633660028022195
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Visual Question Answering in Medical domain (VQA-Med) plays an important role
in providing medical assistance to the end-users. These users are expected to
raise either a straightforward question with a Yes/No answer or a challenging
question that requires a detailed and descriptive answer. The existing
techniques in VQA-Med fail to distinguish between these question types, which
sometimes complicates the simpler problems or over-simplifies the complicated
ones. At the same time, maintaining several distinct systems for the different
question types can lead to confusion and discomfort for the end-users. To address this
issue, we propose a hierarchical deep multi-modal network that analyzes and
classifies end-user questions/queries and then incorporates a query-specific
approach for answer prediction. We refer to our proposed approach as Hierarchical
Question Segregation based Visual Question Answering (HQS-VQA for short). Our
contributions are three-fold, viz. firstly, we propose a question segregation
(QS) technique for VQA-Med; secondly, we integrate the QS model into the
hierarchical deep multi-modal neural network to generate proper answers to the
queries related to medical images; and thirdly, we study the impact of QS in
Medical-VQA by comparing the performance of the proposed model with QS and a
model without QS. We evaluate the performance of our proposed model on two
benchmark datasets, viz. RAD and CLEF18. Experimental results show that our
proposed HQS-VQA technique outperforms the baseline models by significant
margins. We also conduct a detailed quantitative and qualitative analysis of
the obtained results and discover potential causes of errors and their
solutions.
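The hierarchical routing described in the abstract can be pictured with a short PyTorch-style sketch: a question-segregation (QS) head first predicts whether a query is a Yes/No or a descriptive question, and the fused image-question representation is then scored by the branch specialised for that question type. All module names, feature dimensions, and the simple concatenation-based fusion below are illustrative assumptions and are not taken from the paper.

import torch
import torch.nn as nn

class HierarchicalVQASketch(nn.Module):
    """Illustrative sketch: question segregation followed by type-specific answering."""
    def __init__(self, img_dim=2048, q_dim=1024, hidden=512, n_answers=500):
        super().__init__()
        # Question segregation (QS): is this a Yes/No or a descriptive question?
        self.qs_head = nn.Sequential(nn.Linear(q_dim, hidden), nn.ReLU(), nn.Linear(hidden, 2))
        # Shared multi-modal fusion of image and question features (assumed concatenation).
        self.fusion = nn.Sequential(nn.Linear(img_dim + q_dim, hidden), nn.ReLU())
        # Branch 1: binary head for Yes/No questions.
        self.yes_no_head = nn.Linear(hidden, 2)
        # Branch 2: head over a descriptive answer vocabulary.
        self.descriptive_head = nn.Linear(hidden, n_answers)

    def forward(self, img_feat, q_feat):
        qs_logits = self.qs_head(q_feat)                        # question-type prediction
        fused = self.fusion(torch.cat([img_feat, q_feat], -1))  # multi-modal representation
        return qs_logits, self.yes_no_head(fused), self.descriptive_head(fused)

# Toy usage: the QS prediction decides which branch's output is used per query.
model = HierarchicalVQASketch()
img_feat = torch.randn(4, 2048)   # e.g. CNN image features
q_feat = torch.randn(4, 1024)     # e.g. RNN/transformer question encoding
qs_logits, yn_logits, desc_logits = model(img_feat, q_feat)
print(qs_logits.argmax(dim=-1))   # 0 -> Yes/No branch, 1 -> descriptive branch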
Related papers
- RealMedQA: A pilot biomedical question answering dataset containing realistic clinical questions [3.182594503527438]
We present RealMedQA, a dataset of realistic clinical questions generated by humans and an LLM.
We show that the LLM is more cost-efficient for generating "ideal" QA pairs.
arXiv Detail & Related papers (2024-08-16T09:32:43Z) - An Empirical Comparison of LM-based Question and Answer Generation
Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder [39.06513668037645]
We propose a new Transformer-based framework for medical VQA, named Q2ATransformer.
We introduce an additional Transformer decoder with a set of learnable candidate answer embeddings to query the existence of each answer class for a given image-question pair.
Our method achieves new state-of-the-art performance on two medical VQA benchmarks.
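A rough sketch of the answer-querying idea summarised above, with assumed layer sizes and a single decoder layer (neither taken from the paper): a set of learnable candidate-answer embeddings serves as the decoder queries, attends over fused image-question tokens, and each query emits an "is this answer present?" logit.

import torch
import torch.nn as nn

class AnswerQueryingDecoderSketch(nn.Module):
    def __init__(self, n_answers=500, d_model=256, n_heads=4):
        super().__init__()
        # One learnable embedding per candidate answer class.
        self.answer_queries = nn.Parameter(torch.randn(n_answers, d_model))
        layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=1)
        self.existence_head = nn.Linear(d_model, 1)

    def forward(self, fused_tokens):
        # fused_tokens: (batch, seq_len, d_model) fused image-question features.
        queries = self.answer_queries.unsqueeze(0).expand(fused_tokens.size(0), -1, -1)
        decoded = self.decoder(tgt=queries, memory=fused_tokens)
        return self.existence_head(decoded).squeeze(-1)  # (batch, n_answers) existence logits

# Toy usage with random multi-modal tokens.
logits = AnswerQueryingDecoderSketch()(torch.randn(2, 49, 256))
print(logits.shape)  # torch.Size([2, 500])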
arXiv Detail & Related papers (2023-04-04T08:06:40Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out 2 drawbacks in current RVQA research, where (1) datasets contain too many unchallenging UQs and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - RoMQA: A Benchmark for Robust, Multi-evidence, Multi-answer Question
Answering [87.18962441714976]
We introduce RoMQA, the first benchmark for robust, multi-evidence, multi-answer question answering (QA).
We evaluate state-of-the-art large language models in zero-shot, few-shot, and fine-tuning settings, and find that RoMQA is challenging.
Our results show that RoMQA is a challenging benchmark for large language models, and provides a quantifiable test to build more robust QA methods.
arXiv Detail & Related papers (2022-10-25T21:39:36Z) - Co-VQA : Answering by Interactive Sub Question Sequence [18.476819557695087]
This paper proposes a conversation-based VQA framework, which consists of three components: Questioner, Oracle, and Answerer.
To perform supervised learning for each model, we introduce a well-designed method to build a sub-question sequence (SQS) for each question on the VQA 2.0 and VQA-CP v2 datasets.
arXiv Detail & Related papers (2022-04-02T15:09:16Z) - Logically Consistent Loss for Visual Question Answering [66.83963844316561]
The current advancements in neural-network-based Visual Question Answering (VQA) cannot ensure logical consistency due to the independent and identically distributed (i.i.d.) assumption.
We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework.
Experiments confirm that the proposed loss formulae and the introduction of hybrid batches lead to improved consistency as well as better performance.
arXiv Detail & Related papers (2020-11-19T20:31:05Z) - Multiple interaction learning with question-type prior knowledge for
constraining answer search space in visual question answering [24.395733613284534]
We propose a novel VQA model that utilizes the question-type prior information to improve VQA.
Experiments on two benchmark datasets, i.e., VQA 2.0 and TDIUC, indicate that the proposed method yields the best performance compared with the most competitive approaches.
arXiv Detail & Related papers (2020-09-23T12:54:34Z) - Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex
Healthcare Question Answering [89.76059961309453]
The HeadQA dataset contains multiple-choice questions authorized for the public healthcare specialization exam.
These questions are the most challenging for current QA systems.
We present a Multi-step reasoning with Knowledge extraction framework (MurKe) that strives to make full use of off-the-shelf pre-trained models.
arXiv Detail & Related papers (2020-08-06T02:47:46Z) - CQ-VQA: Visual Question Answering on Categorized Questions [3.0013352260516744]
This paper proposes CQ-VQA, a novel 2-level hierarchical but end-to-end model to solve the task of visual question answering (VQA).
The first level of CQ-VQA, referred to as question categorizer (QC), classifies questions to reduce the potential answer search space.
The second level, referred to as the answer predictor (AP), comprises a set of distinct classifiers corresponding to each question category.
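The two-level structure described above can be sketched as a question categorizer whose prediction routes the fused features to a category-specific answer head; the category count, answer-set sizes, and plain linear heads below are illustrative assumptions rather than the CQ-VQA configuration.

import torch
import torch.nn as nn

class TwoLevelVQASketch(nn.Module):
    def __init__(self, feat_dim=512, answers_per_category=(2, 100, 300)):
        super().__init__()
        # Level 1: question categorizer (QC) over the assumed categories.
        self.qc = nn.Linear(feat_dim, len(answers_per_category))
        # Level 2: one answer predictor (AP) per category, each over a reduced answer set.
        self.ap = nn.ModuleList([nn.Linear(feat_dim, n) for n in answers_per_category])

    def forward(self, fused_feat):
        cat_logits = self.qc(fused_feat)
        categories = cat_logits.argmax(dim=-1)
        # Route each example to the answer head of its predicted category.
        answer_logits = [self.ap[c](fused_feat[i]) for i, c in enumerate(categories.tolist())]
        return cat_logits, answer_logits

# Toy usage: two fused image-question vectors, each routed to one category head.
cat_logits, answer_logits = TwoLevelVQASketch()(torch.randn(2, 512))
print(cat_logits.shape, [a.shape for a in answer_logits])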
arXiv Detail & Related papers (2020-02-17T06:45:29Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT).
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also producing better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.