IQ-VQA: Intelligent Visual Question Answering
- URL: http://arxiv.org/abs/2007.04422v1
- Date: Wed, 8 Jul 2020 20:41:52 GMT
- Title: IQ-VQA: Intelligent Visual Question Answering
- Authors: Vatsal Goel, Mohit Chandak, Ashish Anand and Prithwijit Guha
- Abstract summary: We show that our framework improves consistency of VQA models by 15% on the rule-based dataset.
We also quantitatively show improvement in attention maps which highlights better multi-modal understanding of vision and language.
- Score: 3.09911862091928
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Even though there has been tremendous progress in the field of Visual
Question Answering, models today still tend to be inconsistent and brittle. To
this end, we propose a model-independent cyclic framework which increases
consistency and robustness of any VQA architecture. We train our models to
answer the original question, generate an implication based on the answer and
then also learn to answer the generated implication correctly. As a part of the
cyclic framework, we propose a novel implication generator which can generate
implied questions from any question-answer pair. As a baseline for future works
on consistency, we provide a new human annotated VQA-Implications dataset. The
dataset consists of ~30k questions containing implications of 3 types - Logical
Equivalence, Necessary Condition and Mutual Exclusion - made from the VQA v2.0
validation dataset. We show that our framework improves consistency of VQA
models by ~15% on the rule-based dataset, ~7% on VQA-Implications dataset and
robustness by ~2%, without degrading their performance. In addition, we also
quantitatively show improvement in attention maps which highlights better
multi-modal understanding of vision and language.
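The cyclic training described in the abstract (answer the original question, generate an implication from the question-answer pair, then answer that implication) can be pictured with a minimal sketch. The module names (VQAModel, ImplicationGenerator), tensor shapes, loss terms, and greedy decoding below are illustrative assumptions, not the authors' released code; any real VQA architecture and question decoder would replace the toy stand-ins.

```python
# A minimal sketch of the cyclic consistency step, assuming hypothetical
# module names (VQAModel, ImplicationGenerator) and toy tensor shapes.
import torch
import torch.nn as nn

VOCAB, EMB, N_ANSWERS, Q_LEN = 1000, 64, 10, 8

class VQAModel(nn.Module):
    """Toy stand-in for any VQA architecture: (image features, question) -> answer logits."""
    def __init__(self):
        super().__init__()
        self.q_emb = nn.EmbeddingBag(VOCAB, EMB)
        self.head = nn.Linear(2 * EMB, N_ANSWERS)

    def forward(self, image_feat, question_ids):
        return self.head(torch.cat([image_feat, self.q_emb(question_ids)], dim=-1))

class ImplicationGenerator(nn.Module):
    """Toy stand-in for the implication generator: (question, answer) -> implied-question token logits."""
    def __init__(self):
        super().__init__()
        self.q_emb = nn.EmbeddingBag(VOCAB, EMB)
        self.a_emb = nn.Embedding(N_ANSWERS, EMB)
        self.out = nn.Linear(2 * EMB, Q_LEN * VOCAB)

    def forward(self, question_ids, answer_ids):
        h = torch.cat([self.q_emb(question_ids), self.a_emb(answer_ids)], dim=-1)
        return self.out(h).view(-1, Q_LEN, VOCAB)

vqa, gen = VQAModel(), ImplicationGenerator()
opt = torch.optim.Adam(list(vqa.parameters()) + list(gen.parameters()), lr=1e-4)
ce = nn.CrossEntropyLoss()

def cyclic_step(image_feat, question_ids, answer_ids, impl_question_ids, impl_answer_ids):
    # 1) Answer the original question.
    ans_logits = vqa(image_feat, question_ids)
    loss_vqa = ce(ans_logits, answer_ids)

    # 2) Generate an implied question from the (question, predicted answer) pair.
    #    The generator is supervised with annotated implications, since the greedy
    #    argmax decode below does not pass gradients back to it.
    impl_logits = gen(question_ids, ans_logits.argmax(dim=-1))
    loss_gen = ce(impl_logits.reshape(-1, VOCAB), impl_question_ids.reshape(-1))
    impl_ids = impl_logits.argmax(dim=-1)

    # 3) Answer the generated implication; its target answer is fixed by the
    #    implication type (e.g. "yes" for a necessary condition, "no" for mutual exclusion).
    loss_cycle = ce(vqa(image_feat, impl_ids), impl_answer_ids)

    loss = loss_vqa + loss_gen + loss_cycle
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# One toy step on random data.
cyclic_step(torch.randn(4, EMB),
            torch.randint(0, VOCAB, (4, Q_LEN)),
            torch.randint(0, N_ANSWERS, (4,)),
            torch.randint(0, VOCAB, (4, Q_LEN)),
            torch.randint(0, N_ANSWERS, (4,)))
```

In the full framework a sequence decoder would replace the single linear projection and the loss terms would likely be weighted; the sketch only illustrates the question → answer → implication → answer cycle that the abstract describes.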
Related papers
- Diversity Enhanced Narrative Question Generation for Storybooks [4.043005183192124]
We introduce a multi-question generation model (mQG) capable of generating multiple, diverse, and answerable questions.
To validate the answerability of the generated questions, we employ a SQuAD2.0 fine-tuned question answering model.
mQG shows promising results across various evaluation metrics against strong baselines.
arXiv Detail & Related papers (2023-10-25T08:10:04Z) - UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Toward Unsupervised Realistic Visual Question Answering [70.67698100148414]
We study the problem of realistic VQA (RVQA), where a model has to reject unanswerable questions (UQs) and answer answerable ones (AQs).
We first point out two drawbacks in current RVQA research: (1) datasets contain too many unchallenging UQs, and (2) a large number of annotated UQs are required for training.
We propose a new testing dataset, RGQA, which combines AQs from an existing VQA dataset with around 29K human-annotated UQs.
This combines pseudo UQs obtained by randomly pairing images and questions, with an ...
arXiv Detail & Related papers (2023-03-09T06:58:29Z) - BinaryVQA: A Versatile Test Set to Evaluate the Out-of-Distribution Generalization of VQA Models [47.64219291655723]
We introduce a new test set for visual question answering (VQA) called BinaryVQA to push the limits of VQA models.
Our dataset includes 7,800 questions across 1,024 images and covers a wide variety of objects, topics, and concepts.
Around 63% of the questions have positive answers.
arXiv Detail & Related papers (2023-01-28T00:03:44Z) - Co-VQA : Answering by Interactive Sub Question Sequence [18.476819557695087]
This paper proposes a conversation-based VQA framework, which consists of three components: Questioner, Oracle, and Answerer.
To perform supervised learning for each model, we introduce a well-designed method to build a sub-question sequence (SQS) for each question on the VQA 2.0 and VQA-CP v2 datasets.
arXiv Detail & Related papers (2022-04-02T15:09:16Z) - Relation-Guided Pre-Training for Open-Domain Question Answering [67.86958978322188]
We propose a Relation-Guided Pre-Training (RGPT-QA) framework to answer complex open-domain questions.
We show that RGPT-QA achieves 2.2%, 2.4%, and 6.3% absolute improvements in Exact Match accuracy on Natural Questions, TriviaQA, and WebQuestions, respectively.
arXiv Detail & Related papers (2021-09-21T17:59:31Z) - Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT)
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also producing better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.