How Transferable are Reasoning Patterns in VQA?
- URL: http://arxiv.org/abs/2104.03656v1
- Date: Thu, 8 Apr 2021 10:18:45 GMT
- Title: How Transferable are Reasoning Patterns in VQA?
- Authors: Corentin Kervadec, Theo Jaunet, Grigory Antipov, Moez Baccouche,
Romain Vuillemot and Christian Wolf
- Abstract summary: We argue that uncertainty in vision is a dominating factor preventing the successful learning of reasoning in vision and language problems.
We train a visual oracle and, in a large-scale study, provide experimental evidence that it is much less prone to exploiting spurious dataset biases.
We exploit these insights by transferring reasoning patterns from the oracle to a SOTA Transformer-based VQA model taking standard noisy visual inputs via fine-tuning.
- Score: 10.439369423744708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Since its inception, Visual Question Answering (VQA) has been notorious as a
task where models are prone to exploiting biases in datasets to find shortcuts
instead of performing high-level reasoning. Classical methods address this by
removing biases from training data, or adding branches to models to detect and
remove biases. In this paper, we argue that uncertainty in vision is a
dominating factor preventing the successful learning of reasoning in vision and
language problems. We train a visual oracle and, in a large-scale study, provide
experimental evidence that it is much less prone to exploiting spurious dataset
biases compared to standard models. We propose to study the attention
mechanisms at work in the visual oracle and compare them with a SOTA
Transformer-based model. We provide an in-depth analysis and visualizations of
reasoning patterns obtained with an online visualization tool which we make
publicly available (https://reasoningpatterns.github.io). We exploit these
insights by transferring reasoning patterns from the oracle to a SOTA
Transformer-based VQA model taking standard noisy visual inputs via
fine-tuning. In experiments we report higher overall accuracy, as well as
higher accuracy on infrequent answers for each question type, which provides
evidence for improved generalization and a reduced dependency on dataset biases.
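The transfer scheme described in the abstract can be illustrated with a minimal PyTorch-style sketch: train an oracle on perfect-sight visual inputs (ground-truth object annotations), then initialise a standard model from the oracle's weights and fine-tune it on noisy detector features. The model class, dimensions, synthetic data loaders, and hyper-parameters below are illustrative assumptions, not the authors' released code.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

class VQATransformer(nn.Module):
    """Toy stand-in for a vision-language Transformer VQA model."""
    def __init__(self, visual_dim, hidden_dim=768, num_answers=1842):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, hidden_dim)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(hidden_dim, nhead=8, batch_first=True),
            num_layers=4,
        )
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, visual_feats, question_embeds):
        tokens = torch.cat([self.visual_proj(visual_feats), question_embeds], dim=1)
        return self.classifier(self.encoder(tokens).mean(dim=1))

def synthetic_loader(visual_dim, n=64):
    # Random stand-in for (visual regions, question tokens, answer) triples.
    return DataLoader(TensorDataset(
        torch.randn(n, 36, visual_dim),   # 36 object regions per image
        torch.randn(n, 20, 768),          # 20 embedded question tokens
        torch.randint(0, 1842, (n,)),     # answer class index
    ), batch_size=16)

def train(model, loader, epochs, lr):
    opt = optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for visual, question, answer in loader:
            opt.zero_grad()
            nn.functional.cross_entropy(model(visual, question), answer).backward()
            opt.step()

# 1) Train the visual oracle on "perfect-sight" inputs, e.g. embeddings of
#    ground-truth object classes/attributes instead of detector features.
oracle = VQATransformer(visual_dim=300)
train(oracle, synthetic_loader(visual_dim=300), epochs=3, lr=1e-4)

# 2) Transfer reasoning patterns: initialise a standard model from the oracle
#    weights (all but the visual projection) and fine-tune on noisy features.
student = VQATransformer(visual_dim=2048)
oracle_weights = {k: v for k, v in oracle.state_dict().items()
                  if not k.startswith("visual_proj")}
student.load_state_dict(oracle_weights, strict=False)
train(student, synthetic_loader(visual_dim=2048), epochs=1, lr=1e-5)
```

Only the visual projection is re-learned from scratch here, so the attention layers trained under clean vision carry over; the paper reports that this kind of initialisation improves accuracy, particularly on infrequent answers.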
Related papers
- Variation of Gender Biases in Visual Recognition Models Before and After
Finetuning [29.55318393877906]
We introduce a framework to measure how biases change before and after fine-tuning a large scale visual recognition model for a downstream task.
We find that supervised models trained on datasets such as ImageNet-21k are more likely to retain their pretraining biases.
We also find that models finetuned on larger scale datasets are more likely to introduce new biased associations.
arXiv Detail & Related papers (2023-03-14T03:42:47Z)
- VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives [84.48039784446166]
We show that model FI supervision can meaningfully improve VQA model accuracy as well as performance on several Right-for-the-Right-Reason metrics.
Our best performing method, Visual Feature Importance Supervision (VisFIS), outperforms strong baselines on benchmark VQA datasets.
Predictions are more accurate when explanations are plausible and faithful, and not when they are plausible but not faithful.
arXiv Detail & Related papers (2022-06-22T17:02:01Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model in both settings: task-specific biased models with prior knowledge, and a self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Perceptual Score: What Data Modalities Does Your Model Perceive? [73.75255606437808]
We introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features.
We find that recent, more accurate multi-modal models for visual question-answering tend to perceive the visual data less than their predecessors.
Using the perceptual score also helps to analyze model biases by decomposing the score into data subset contributions.
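The underlying idea can be shown with a short sketch: compare accuracy on intact inputs with accuracy when one modality is randomly permuted across the batch, and take the drop as the reliance on that modality. The model interface is assumed (same two-input signature as the sketch above), and the exact normalisation used in the paper may differ.

```python
import torch

@torch.no_grad()
def perceptual_score(model, visual, question, answers, modality="visual"):
    """Accuracy drop when `modality` is permuted across the batch."""
    base_acc = (model(visual, question).argmax(-1) == answers).float().mean()
    perm = torch.randperm(visual.size(0))
    if modality == "visual":
        perm_acc = (model(visual[perm], question).argmax(-1) == answers).float().mean()
    else:  # permute the question instead
        perm_acc = (model(visual, question[perm]).argmax(-1) == answers).float().mean()
    return (base_acc - perm_acc).item()
```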
arXiv Detail & Related papers (2021-10-27T12:19:56Z)
- Greedy Gradient Ensemble for Robust Visual Question Answering [163.65789778416172]
We stress the language bias in Visual Question Answering (VQA) that comes from two aspects, i.e., distribution bias and shortcut bias.
We propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning.
GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard to solve with biased models alone.
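As a deliberately simplified sketch in the same spirit (not the exact greedy, boosting-style GGE formulation), one can weight the base model's per-example loss toward examples a question-only biased model fails on; the function below is such an inverse-weighting heuristic and its names are illustrative.

```python
import torch
import torch.nn.functional as F

def debias_weighted_loss(base_logits, biased_logits, answers):
    # Probability the question-only biased model assigns to the gold answer.
    p_bias = F.softmax(biased_logits.detach(), dim=-1)
    p_bias_correct = p_bias.gather(1, answers.unsqueeze(1)).squeeze(1)
    # Upweight examples the biased model cannot solve from shortcuts alone.
    weights = 1.0 - p_bias_correct
    per_example = F.cross_entropy(base_logits, answers, reduction="none")
    return (weights * per_example).mean()
```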
arXiv Detail & Related papers (2021-07-27T08:02:49Z)
- Supervising the Transfer of Reasoning Patterns in VQA [9.834885796317971]
Methods for Visual Question Answering (VQA) are notorious for leveraging dataset biases rather than performing reasoning.
We propose a method for knowledge transfer based on a regularization term in our loss function, supervising the sequence of required reasoning operations.
We also demonstrate the effectiveness of this approach experimentally on the GQA dataset and show that it is complementary to BERT-like self-supervised pre-training.
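A hedged sketch of a regulariser of this flavour, assuming GQA-style program annotations are available as per-step operation labels: an auxiliary head predicts the sequence of reasoning operations from the encoder's token states, and its cross-entropy is added to the answer loss. The head architecture, operation vocabulary, and loss weighting are illustrative assumptions, not the authors' implementation.

```python
import torch
from torch import nn
import torch.nn.functional as F

class ReasoningOpHead(nn.Module):
    """Predicts one operation label per reasoning step from token states."""
    def __init__(self, hidden_dim=768, num_ops=50, max_steps=8):
        super().__init__()
        self.step_queries = nn.Parameter(torch.randn(max_steps, hidden_dim))
        self.op_classifier = nn.Linear(hidden_dim, num_ops)

    def forward(self, token_states):                     # (B, T, hidden)
        # Each step query attends over the encoder token states.
        attn = torch.softmax(self.step_queries @ token_states.transpose(1, 2), dim=-1)
        step_states = attn @ token_states                # (B, max_steps, hidden)
        return self.op_classifier(step_states)           # (B, max_steps, num_ops)

def total_loss(answer_logits, answers, op_logits, op_targets, alpha=0.5):
    answer_loss = F.cross_entropy(answer_logits, answers)
    op_loss = F.cross_entropy(op_logits.flatten(0, 1), op_targets.flatten())
    return answer_loss + alpha * op_loss
```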
arXiv Detail & Related papers (2021-06-10T08:58:43Z)
- Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization [93.8373619657239]
Neural networks trained with SGD were recently shown to rely preferentially on linearly-predictive features.
This simplicity bias can explain their lack of out-of-distribution (OOD) robustness.
We demonstrate that the simplicity bias can be mitigated and OOD generalization improved.
arXiv Detail & Related papers (2021-05-12T12:12:24Z)
- VisQA: X-raying Vision and Language Reasoning in Transformers [10.439369423744708]
Recent research has shown that state-of-the-art models tend to produce answers exploiting biases and shortcuts in the training data.
We present VisQA, a visual analytics tool that explores this question of reasoning vs. bias exploitation.
arXiv Detail & Related papers (2021-04-02T08:08:25Z)
- Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task.
Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available.
We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
- Roses Are Red, Violets Are Blue... but Should VQA Expect Them To? [0.0]
We argue that the standard evaluation metric, which consists in measuring the overall in-domain accuracy, is misleading.
We propose the GQA-OOD benchmark designed to overcome these concerns.
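A rough sketch of such an evaluation: group questions, rank answers by frequency within each group, and report accuracy separately on frequent ("head") and rare ("tail") answers. The grouping key and the tail threshold below are illustrative choices, not the exact GQA-OOD protocol.

```python
from collections import Counter, defaultdict

def ood_accuracy(predictions, tail_fraction=0.2):
    """predictions: list of (question_group, gold_answer, predicted_answer)."""
    by_group = defaultdict(list)
    for group, gold, pred in predictions:
        by_group[group].append((gold, pred))

    head_hits = head_total = tail_hits = tail_total = 0
    for items in by_group.values():
        # Rank answers of this group by frequency; the rarest fraction is the tail.
        ranked = [a for a, _ in Counter(g for g, _ in items).most_common()]
        tail = set(ranked[int(len(ranked) * (1 - tail_fraction)):])
        for gold, pred in items:
            if gold in tail:
                tail_total += 1
                tail_hits += int(pred == gold)
            else:
                head_total += 1
                head_hits += int(pred == gold)
    return {"acc-head": head_hits / max(head_total, 1),
            "acc-tail": tail_hits / max(tail_total, 1)}
```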
arXiv Detail & Related papers (2020-06-09T08:50:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.