Logical Implications for Visual Question Answering Consistency
- URL: http://arxiv.org/abs/2303.09427v1
- Date: Thu, 16 Mar 2023 16:00:18 GMT
- Title: Logical Implications for Visual Question Answering Consistency
- Authors: Sergio Tascon-Morales, Pablo Márquez-Neila and Raphael Sznitman
- Abstract summary: We introduce a new consistency loss term that can be used by a wide range of VQA models.
We propose to infer these logical relations using a dedicated language model and use these in our proposed consistency loss function.
We conduct extensive experiments on the VQA Introspect and DME datasets and show that our method brings improvements to state-of-the-art VQA models.
- Score: 2.005299372367689
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite considerable recent progress in Visual Question Answering (VQA)
models, inconsistent or contradictory answers continue to cast doubt on their
true reasoning capabilities. However, most proposed methods use indirect
strategies or strong assumptions on pairs of questions and answers to enforce
model consistency. Instead, we propose a novel strategy intended to improve
model performance by directly reducing logical inconsistencies. To do this, we
introduce a new consistency loss term that can be used by a wide range of
VQA models and which relies on knowing the logical relation between pairs of
questions and answers. While such information is typically not available in VQA
datasets, we propose to infer these logical relations using a dedicated
language model and use these in our proposed consistency loss function. We
conduct extensive experiments on the VQA Introspect and DME datasets and show
that our method brings improvements to state-of-the-art VQA models, while being
robust across different architectures and settings.
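The core idea of the abstract (penalizing answer pairs that violate a known logical implication) can be illustrated with a minimal sketch. The function name, the product form of the inconsistency probability, and the example probabilities below are all illustrative assumptions, not the paper's exact formulation:

```python
import math

def implication_consistency_loss(p_antecedent, p_consequent, eps=1e-8):
    """Illustrative consistency penalty for a pair of binary VQA answers
    where "yes" to the first question logically implies "yes" to the second
    (e.g. "Is this grade 2 DME?" implies "Is there DME?").

    The only logically inconsistent outcome is (yes, no); treating the two
    predicted probabilities as independent, its mass is p1 * (1 - p2), and
    we penalize it with a negative log-likelihood term.
    This is a hypothetical sketch, not the authors' actual loss.
    """
    p_inconsistent = p_antecedent * (1.0 - p_consequent)
    return -math.log(1.0 - p_inconsistent + eps)

# A consistent model: confident "yes" to both the specific and general question.
low = implication_consistency_loss(0.9, 0.95)

# An inconsistent model: "yes" to the specific question, "no" to the general one.
high = implication_consistency_loss(0.9, 0.1)
```

Such a term can be added, weighted, to the standard VQA cross-entropy loss for question pairs whose logical relation is known or inferred; pairs with no known relation simply contribute no penalty.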
Related papers
- An Empirical Comparison of LM-based Question and Answer Generation Methods [79.31199020420827]
Question and answer generation (QAG) consists of generating a set of question-answer pairs given a context.
In this paper, we establish baselines with three different QAG methodologies that leverage sequence-to-sequence language model (LM) fine-tuning.
Experiments show that an end-to-end QAG model, which is computationally light at both training and inference times, is generally robust and outperforms other more convoluted approaches.
arXiv Detail & Related papers (2023-05-26T14:59:53Z) - Knowledge-Based Counterfactual Queries for Visual Question Answering [0.0]
We propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations.
For this reason, we exploit structured knowledge bases to perform deterministic, optimal and controllable word-level replacements targeting the linguistic modality.
We then evaluate the model's response against such counterfactual inputs.
arXiv Detail & Related papers (2023-03-05T08:00:30Z) - On the Significance of Question Encoder Sequence Model in the Out-of-Distribution Performance in Visual Question Answering [15.787663289343948]
Generalizing beyond the experiences has a significant role in developing practical AI systems.
Current Visual Question Answering (VQA) models are over-dependent on the language-priors.
This paper shows that the sequence model architecture used in the question-encoder has a significant role in the generalizability of VQA models.
arXiv Detail & Related papers (2021-08-28T05:51:27Z) - Learning from Lexical Perturbations for Consistent Visual Question Answering [78.21912474223926]
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
We propose a novel approach to address this issue based on modular networks, which exploit pairs of questions related by linguistic perturbations.
We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations.
arXiv Detail & Related papers (2020-11-26T17:38:03Z) - Logically Consistent Loss for Visual Question Answering [66.83963844316561]
Current advances in neural-network-based Visual Question Answering (VQA) cannot ensure such consistency due to the independent and identically distributed (i.i.d.) assumption.
We propose a new model-agnostic logic constraint to tackle this issue by formulating a logically consistent loss in the multi-task learning framework.
Experiments confirm that the proposed loss formulation and the introduction of hybrid batches lead to greater consistency as well as better performance.
arXiv Detail & Related papers (2020-11-19T20:31:05Z) - Contrast and Classify: Training Robust VQA Models [60.80627814762071]
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
arXiv Detail & Related papers (2020-10-13T00:23:59Z) - Counterfactual Variable Control for Robust and Interpretable Question Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z) - IQ-VQA: Intelligent Visual Question Answering [3.09911862091928]
We show that our framework improves consistency of VQA models by 15% on the rule-based dataset.
We also quantitatively show improvement in attention maps which highlights better multi-modal understanding of vision and language.
arXiv Detail & Related papers (2020-07-08T20:41:52Z) - Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.