Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in
Visual Question Answering
- URL: http://arxiv.org/abs/2104.03149v1
- Date: Wed, 7 Apr 2021 14:28:22 GMT
- Title: Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in
Visual Question Answering
- Authors: Corentin Dancette, Remi Cadene, Damien Teney, Matthieu Cord
- Abstract summary: Shortcut learning happens when a model exploits spurious statistical regularities to produce correct answers but does not deploy the desired behavior.
We introduce an evaluation methodology for visual question answering (VQA) to better diagnose cases of shortcut learning.
- Score: 42.120558318437475
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce an evaluation methodology for visual question answering (VQA) to
better diagnose cases of shortcut learning. These cases happen when a model
exploits spurious statistical regularities to produce correct answers but does
not actually deploy the desired behavior. There is a need to identify possible
shortcuts in a dataset and assess their use before deploying a model in the
real world. The research community in VQA has focused exclusively on
question-based shortcuts, where a model might, for example, answer "What is the
color of the sky" with "blue" by relying mostly on the question-conditional
training prior while giving little weight to visual evidence. We go a step further
and consider multimodal shortcuts that involve both questions and images. We
first identify potential shortcuts in the popular VQA v2 training set by mining
trivial predictive rules such as co-occurrences of words and visual elements.
We then create VQA-CE, a new evaluation set made of CounterExamples, i.e.,
questions where the mined rules lead to incorrect answers. We use this new
evaluation in a large-scale study of existing models. We demonstrate that even
state-of-the-art models perform poorly and that existing techniques to reduce
biases are largely ineffective in this context. Our findings suggest that past
work on question-based biases in VQA has only addressed one facet of a complex
issue. The code for our method is available at
https://github.com/cdancette/detect-shortcuts
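
The abstract outlines a two-step procedure: mine trivial predictive rules from co-occurrences of question words and detected visual elements, then keep as counterexamples the questions where every matching rule yields a wrong answer. Below is a minimal sketch of that idea, not the authors' implementation (see the GitHub link above); the example format, the helper names, and the min_support/min_confidence thresholds are illustrative assumptions.

```python
# Minimal sketch of shortcut-rule mining and counterexample selection.
# Assumptions: each example is a dict with tokenized question words,
# detected object labels, and the ground-truth answer; the thresholds
# below are illustrative, not the paper's actual values.
from collections import Counter, defaultdict
from itertools import product

def mine_rules(examples, min_support=50, min_confidence=0.7):
    """Mine (question word, visual object) -> answer rules from training data."""
    pair_answers = defaultdict(Counter)
    for ex in examples:
        for word, obj in product(set(ex["question_words"]), set(ex["objects"])):
            pair_answers[(word, obj)][ex["answer"]] += 1

    rules = {}
    for pair, answers in pair_answers.items():
        support = sum(answers.values())
        answer, count = answers.most_common(1)[0]
        if support >= min_support and count / support >= min_confidence:
            rules[pair] = answer  # e.g. ("color", "sky") -> "blue"
    return rules

def find_counterexamples(examples, rules):
    """Keep questions where every matching rule predicts a wrong answer."""
    counterexamples = []
    for ex in examples:
        predictions = [
            rules[(w, o)]
            for w, o in product(set(ex["question_words"]), set(ex["objects"]))
            if (w, o) in rules
        ]
        if predictions and all(p != ex["answer"] for p in predictions):
            counterexamples.append(ex)
    return counterexamples
```

Under these assumptions, a model that exploits such shortcuts should score well on questions where the mined rules agree with the ground truth and drop sharply on the counterexample split; that gap is the diagnostic the paper proposes.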
Related papers
- Open-Set Knowledge-Based Visual Question Answering with Inference Paths [79.55742631375063]
The purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases.
We propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity).
Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process.
arXiv Detail & Related papers (2023-10-12T09:12:50Z) - Language Prior Is Not the Only Shortcut: A Benchmark for Shortcut
Learning in VQA [53.45074798673808]
VQA models are prone to learn the shortcut solution formed by dataset biases rather than the intended solution.
We propose a new dataset that considers varying types of shortcuts by constructing different distribution shifts in multiple OOD test sets.
Our benchmark provides a more rigorous and comprehensive testbed for shortcut learning in VQA.
arXiv Detail & Related papers (2022-10-10T13:39:08Z) - Discovering the Unknown Knowns: Turning Implicit Knowledge in the
Dataset into Explicit Training Examples for Visual Question Answering [18.33311267792116]
We find that many of the "unknowns" to the learned VQA model are indeed "known" in the dataset implicitly.
We present a simple data augmentation pipeline SimpleAug to turn this "known" knowledge into training examples for VQA.
arXiv Detail & Related papers (2021-09-13T16:56:43Z) - Zero-shot Visual Question Answering using Knowledge Graph [19.142028501513366]
We propose a Zero-shot VQA algorithm using knowledge graphs and a mask-based learning mechanism for better incorporating external knowledge.
Experiments show that our method can achieve state-of-the-art performance in Zero-shot VQA with unseen answers.
arXiv Detail & Related papers (2021-07-12T12:17:18Z) - Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z) - Why Machine Reading Comprehension Models Learn Shortcuts? [56.629192589376046]
We argue that a larger proportion of shortcut questions in the training data makes models rely excessively on shortcut tricks.
A thorough empirical analysis shows that MRC models tend to learn shortcut questions earlier than challenging questions.
arXiv Detail & Related papers (2021-06-02T08:43:12Z) - Self-Supervised VQA: Answering Visual Questions using Images and
Captions [38.05223339919346]
VQA models assume the availability of datasets with human-annotated Image-Question-Answer (I-Q-A) triplets for training.
We study whether models can be trained without any human-annotated Q-A pairs, but only with images and associated text captions.
arXiv Detail & Related papers (2020-12-04T01:22:05Z) - Counterfactual Variable Control for Robust and Interpretable Question
Answering [57.25261576239862]
Deep neural network based question answering (QA) models are neither robust nor explainable in many cases.
In this paper, we inspect this spurious "capability" of QA models using causal inference.
We propose a novel approach called Counterfactual Variable Control (CVC) that explicitly mitigates any shortcut correlation.
arXiv Detail & Related papers (2020-10-12T10:09:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.