Robust Explanations for Visual Question Answering
- URL: http://arxiv.org/abs/2001.08730v1
- Date: Thu, 23 Jan 2020 18:43:34 GMT
- Title: Robust Explanations for Visual Question Answering
- Authors: Badri N. Patro, Shivansh Pate, and Vinay P. Namboodiri
- Abstract summary: We propose a method to obtain robust explanations for visual question answering(VQA) that correlate well with the answers.
Our model explains the answers obtained through a VQA model by providing visual and textual explanations.
We showcase the robustness of the model against a noise-based perturbation attack using corresponding visual and textual explanations.
- Score: 24.685231217726194
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In this paper, we propose a method to obtain robust explanations for visual
question answering(VQA) that correlate well with the answers. Our model
explains the answers obtained through a VQA model by providing visual and
textual explanations. The main challenges that we address are i) Answers and
textual explanations obtained by current methods are not well correlated and
ii) Current methods for visual explanation do not focus on the right location
for explaining the answer. We address both these challenges by using a
collaborative correlated module which ensures that even if we do not train for
noise based attacks, the enhanced correlation ensures that the right
explanation and answer can be generated. We further show that this also aids in
improving the generated visual and textual explanations. The use of the
correlated module can be thought of as a robust method to verify if the answer
and explanations are coherent. We evaluate this model using VQA-X dataset. We
observe that the proposed method yields better textual and visual justification
that supports the decision. We showcase the robustness of the model against a
noise-based perturbation attack using corresponding visual and textual
explanations. A detailed empirical analysis is shown. Here we provide source
code link for our model \url{https://github.com/DelTA-Lab-IITK/CCM-WACV}.
Related papers
- Integrating Large Language Models with Graph-based Reasoning for Conversational Question Answering [58.17090503446995]
We focus on a conversational question answering task which combines the challenges of understanding questions in context and reasoning over evidence gathered from heterogeneous sources like text, knowledge graphs, tables, and infoboxes.
Our method utilizes a graph structured representation to aggregate information about a question and its context.
arXiv Detail & Related papers (2024-06-14T13:28:03Z) - Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering [27.193336817953142]
We introduce an interpretable approach for graph-based Visual Question Answering (VQA)
Our model is designed to intrinsically produce a subgraph during the question-answering process as its explanation.
We compare these generated subgraphs against established post-hoc explainability methods for graph neural networks, and perform a human evaluation.
arXiv Detail & Related papers (2024-03-26T12:29:18Z) - Towards More Faithful Natural Language Explanation Using Multi-Level
Contrastive Learning in VQA [7.141288053123662]
Natural language explanation in visual question answer (VQA-NLE) aims to explain the decision-making process of models by generating natural language sentences to increase users' trust in the black-box systems.
Existing post-hoc explanations are not always aligned with human logical inference, suffering from the issues on: 1) Deductive unsatisfiability, the generated explanations do not logically lead to the answer; 2) Factual inconsistency, the model falsifies its counterfactual explanation for answers without considering the facts in images; and 3) Semantic perturbation insensitivity, the model can not recognize the semantic changes caused by small perturbations
arXiv Detail & Related papers (2023-12-21T05:51:55Z) - Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm of KB-VQA, Graph pATH rankER (GATHER for brevity)
Specifically, it contains graph constructing, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z) - Dynamic Clue Bottlenecks: Towards Interpretable-by-Design Visual Question Answering [58.64831511644917]
We introduce an interpretable by design model that factors model decisions into intermediate human-legible explanations.
We show that our inherently interpretable system can improve 4.64% over a comparable black-box system in reasoning-focused questions.
arXiv Detail & Related papers (2023-05-24T08:33:15Z) - Loss re-scaling VQA: Revisiting the LanguagePrior Problem from a
Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
arXiv Detail & Related papers (2020-10-30T00:57:17Z) - Cross-modal Knowledge Reasoning for Knowledge-based Visual Question
Answering [27.042604046441426]
Knowledge-based Visual Question Answering (KVQA) requires external knowledge beyond the visible content to answer questions about an image.
In this paper, we depict an image by multiple knowledge graphs from the visual, semantic and factual views.
We decompose the model into a series of memory-based reasoning steps, each performed by a G raph-based R ead, U pdate, and C ontrol.
We achieve a new state-of-the-art performance on three popular benchmark datasets, including FVQA, Visual7W-KB and OK-VQA.
arXiv Detail & Related papers (2020-08-31T23:25:01Z) - Robust Question Answering Through Sub-part Alignment [53.94003466761305]
We model question answering as an alignment problem.
We train our model on SQuAD v1.1 and test it on several adversarial and out-of-domain datasets.
arXiv Detail & Related papers (2020-04-30T09:10:57Z) - On the General Value of Evidence, and Bilingual Scene-Text Visual
Question Answering [120.64104995052189]
We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages.
Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct.
The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge.
arXiv Detail & Related papers (2020-02-24T13:02:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.