Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in
Visual Question Answering
- URL: http://arxiv.org/abs/2104.03149v1
- Date: Wed, 7 Apr 2021 14:28:22 GMT
- Title: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in
Visual Question Answering
- Authors: Corentin Dancette, Remi Cadene, Damien Teney, Matthieu Cord
- Abstract summary: Shortcut learning happens when a model exploits spurious statistical regularities to produce correct answers but does not deploy the desired behavior.
We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning.
- Score: 42.120558318437475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an evaluation methodology for visual question answering (VQA) to
better diagnose cases of shortcut learning. These cases happen when a model
exploits spurious statistical regularities to produce correct answers but does
not actually deploy the desired behavior. There is a need to identify possible
shortcuts in a dataset and assess their use before deploying a model in the
real world. The research community in VQA has focused exclusively on
question-based shortcuts, where a model might, for example, answer "What is the
color of the sky" with "blue" by relying mostly on the question-conditional
training prior while giving little weight to visual evidence. We go a step further
and consider multimodal shortcuts that involve both questions and images. We
first identify potential shortcuts in the popular VQA v2 training set by mining
trivial predictive rules such as co-occurrences of words and visual elements.
We then create VQA-CE, a new evaluation set made of CounterExamples, i.e.,
questions where the mined rules lead to incorrect answers. We use this new
evaluation in a large-scale study of existing models. We demonstrate that even
state-of-the-art models perform poorly and that existing techniques to reduce
biases are largely ineffective in this context. Our findings suggest that past
work on question-based biases in VQA has only addressed one facet of a complex
issue. The code for our method is available at
https://github.com/cdancette/detect-shortcuts
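
The abstract outlines a two-step procedure: mine trivial predictive rules from co-occurrences of question words and detected visual elements, then keep as counterexamples the questions where every matching rule yields a wrong answer. Below is a minimal sketch of that idea, not the authors' implementation (see the GitHub link above); the example format, the helper names, and the min_support/min_confidence thresholds are illustrative assumptions.

```python
# Minimal sketch of shortcut-rule mining and counterexample selection.
# Assumptions: each example is a dict with tokenized question words,
# detected object labels, and the ground-truth answer; the thresholds
# below are illustrative, not the paper's actual values.
from collections import Counter, defaultdict
from itertools import product

def mine_rules(examples, min_support=50, min_confidence=0.7):
    """Mine (question word, visual object) -> answer rules from training data."""
    pair_answers = defaultdict(Counter)
    for ex in examples:
        for word, obj in product(set(ex["question_words"]), set(ex["objects"])):
            pair_answers[(word, obj)][ex["answer"]] += 1

    rules = {}
    for pair, answers in pair_answers.items():
        support = sum(answers.values())
        answer, count = answers.most_common(1)[0]
        if support >= min_support and count / support >= min_confidence:
            rules[pair] = answer  # e.g. ("color", "sky") -> "blue"
    return rules

def find_counterexamples(examples, rules):
    """Keep questions where every matching rule predicts a wrong answer."""
    counterexamples = []
    for ex in examples:
        predictions = [
            rules[(w, o)]
            for w, o in product(set(ex["question_words"]), set(ex["objects"]))
            if (w, o) in rules
        ]
        if predictions and all(p != ex["answer"] for p in predictions):
            counterexamples.append(ex)
    return counterexamples
```

Under these assumptions, a model that exploits such shortcuts should score well on questions where the mined rules agree with the ground truth and drop sharply on the counterexample split; that gap is the diagnostic the paper proposes.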
Related papers
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z) - Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut
Learning in VQA [53.45074798673808]
VQA models are prone to learn the shortcut solution formed by dataset biases rather than the intended solution.
We propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
Our benchmark provides a more rigorous and comprehensive testbed for shortcut learning in VQA.
arXiv Detail & Related papers (2022-10-10T13:39:08Z) - Discovering the Unknown Knowns: Turning Implicit Knowledge in the
Dataset into Explicit Training Examples for Visual Question Answering [18.33311267792116]
We find that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly.
We present a simple data augmentation pipeline SimpleAug to turn this "known" knowledge into training examples for VQA.
arXiv Detail & Related papers (2021-09-13T16:56:43Z) - Zero-shot Visual Question Answering using Knowledge Graph [19.142028501513366]
We propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge.
Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers.
arXiv Detail & Related papers (2021-07-12T12:17:18Z) - Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z) - Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z) - Self-Supervised VQA: Answering Visual Questions using Images and
Captions [38.05223339919346]
VQA models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets for training.
We study whether models can be trained without any human-annotated Q-A pairs, but only with images and associated text captions.
arXiv Detail & Related papers (2020-12-04T01:22:05Z) - Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect this spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.