Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut
Learning in VQA
- URL: http://arxiv.org/abs/2210.04692v1
- Date: Mon, 10 Oct 2022 13:39:08 GMT
- Authors: Qingyi Si, Fandong Meng, Mingyu Zheng, Zheng Lin, Yuanxin Liu, Peng
Fu, Yanan Cao, Weiping Wang and Jie Zhou
- Abstract summary: VQA models are prone to learn the shortcut solution formed by dataset biases rather than the intended solution.
We propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
Our benchmark provides a more rigorous and comprehensive testbed for shortcut learning in VQA.
- Score: 53.45074798673808
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) models are prone to learn the shortcut
solution formed by dataset biases rather than the intended solution. To
evaluate the VQA models' reasoning ability beyond shortcut learning, the VQA-CP
v2 dataset introduces a distribution shift between the training and test set
given a question type. In this way, the model cannot use the training set
shortcut (from question type to answer) to perform well on the test set.
However, VQA-CP v2 only considers one type of shortcut and thus still cannot
guarantee that the model relies on the intended solution rather than a solution
specific to this shortcut. To overcome this limitation, we propose a new
dataset that considers varying types of shortcuts by constructing different
distribution shifts in multiple OOD test sets. In addition, we overcome the
three troubling practices in the use of VQA-CP v2, e.g., selecting models using
OOD test sets, and further standardize the OOD evaluation procedure. Our benchmark
provides a more rigorous and comprehensive testbed for shortcut learning in
VQA. We benchmark recent methods and find that methods specifically designed
for particular shortcuts fail to simultaneously generalize to our varying OOD
test sets. We also systematically study the varying shortcuts and provide
several valuable findings, which may promote the exploration of shortcut
learning in VQA.
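The question-type shortcut described in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy samples are invented, and the helper names `fit_question_type_prior` and `accuracy` are hypothetical. This is not the paper's benchmark construction or evaluation code. The sketch fits a "model" that ignores the image and answers with the most frequent training answer for each question type, then scores it on a test set that follows the training distribution and on one where the answer distribution per question type is shifted.

```python
from collections import Counter, defaultdict


def fit_question_type_prior(train):
    """Map each question type to its most frequent training answer."""
    counts = defaultdict(Counter)
    for qtype, answer in train:
        counts[qtype][answer] += 1
    return {qtype: c.most_common(1)[0][0] for qtype, c in counts.items()}


def accuracy(prior, samples):
    """Fraction of samples whose answer matches the prior for their question type."""
    correct = sum(prior.get(qtype) == answer for qtype, answer in samples)
    return correct / len(samples)


# Toy (question type, answer) pairs, invented for illustration only.
train = (
    [("what color", "white")] * 8 + [("what color", "red")] * 2
    + [("is there", "yes")] * 9 + [("is there", "no")] * 1
)
# In-distribution test set: same answer skew as the training set.
iid_test = (
    [("what color", "white")] * 4 + [("what color", "red")] * 1
    + [("is there", "yes")] * 4 + [("is there", "no")] * 1
)
# Shifted test set: the answer skew per question type is reversed.
ood_test = (
    [("what color", "white")] * 1 + [("what color", "red")] * 4
    + [("is there", "yes")] * 1 + [("is there", "no")] * 4
)

prior = fit_question_type_prior(train)
iid_acc = accuracy(prior, iid_test)
ood_acc = accuracy(prior, ood_test)
print(f"in-distribution accuracy: {iid_acc:.2f}")
print(f"shifted-distribution accuracy: {ood_acc:.2f}")
```

On this toy data the shortcut scores well when the test answers follow the training skew and poorly once the skew is reversed, which is the effect a shifted test set is meant to expose. A shift built this way probes only the question-type shortcut, which is the limitation the abstract raises.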
Related papers
- Improving Selective Visual Question Answering by Learning from Your
Peers [74.20167944693424]
Visual Question Answering (VQA) models can have difficulties abstaining from answering when they are wrong.
We propose the Learning from Your Peers (LYP) approach for training multimodal selection functions for making abstention decisions.
Our approach uses predictions from models trained on distinct subsets of the training data as targets for optimizing a Selective VQA model.
arXiv Detail & Related papers (2023-06-14T21:22:01Z)
- Modularized Zero-shot VQA with Pre-trained Models [20.674979268279728]
We propose a modularized zero-shot network that explicitly decomposes questions into sub-reasoning steps and is highly interpretable.
Our experiments on two VQA benchmarks under the zero-shot setting demonstrate the effectiveness of our method.
arXiv Detail & Related papers (2023-05-27T05:00:14Z)
- Which Shortcut Solution Do Question Answering Models Prefer to Learn? [38.36299280464046]
Question answering (QA) models for reading comprehension tend to learn shortcut solutions rather than the solutions intended by QA datasets.
We show that shortcuts that exploit answer positions and word-label correlations are preferentially learned for extractive and multiple-choice QA.
We experimentally show that the learnability of shortcuts can be utilized to construct an effective QA training set.
arXiv Detail & Related papers (2022-11-29T13:57:59Z)
- Counterfactual Samples Synthesizing and Training for Robust Visual
Question Answering [59.20766562530209]
VQA models still tend to capture superficial linguistic correlations in the training set.
Recent VQA works introduce an auxiliary question-only model to regularize the training of targeted VQA models.
We propose a novel model-agnostic Counterfactual Samples Synthesizing and Training (CSST) strategy.
arXiv Detail & Related papers (2021-10-03T14:31:46Z)
- Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z)
- Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in
Visual Question Answering [42.120558318437475]
Shortcut learning happens when a model exploits spurious statistical regularities to produce correct answers but does not deploy the desired behavior.
We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning.
arXiv Detail & Related papers (2021-04-07T14:28:22Z)
- Self-Supervised VQA: Answering Visual Questions using Images and
Captions [38.05223339919346]
VQA models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets for training.
We study whether models can be trained without any human-annotated Q-A pairs, but only with images and associated text captions.
arXiv Detail & Related papers (2020-12-04T01:22:05Z)
- Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect such spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's
Law [78.10523907729642]
VQA-CP has become the standard OOD benchmark for visual question answering.
Most published methods rely on explicit knowledge of the construction of the OOD splits.
We show that embarrassingly simple methods, including one that generates answers at random, surpass the state of the art on some question types.
arXiv Detail & Related papers (2020-05-19T06:45:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.