MUTANT: A Training Paradigm for Out-of-Distribution Generalization in
Visual Question Answering
- URL: http://arxiv.org/abs/2009.08566v2
- Date: Fri, 16 Oct 2020 01:53:08 GMT
- Title: MUTANT: A Training Paradigm for Out-of-Distribution Generalization in
Visual Question Answering
- Authors: Tejas Gokhale and Pratyay Banerjee and Chitta Baral and Yezhou Yang
- Abstract summary: We present MUTANT, a training paradigm that exposes the model to perceptually similar, yet semantically distinct mutations of the input.
MUTANT establishes a new state-of-the-art accuracy on VQA-CP with a $10.57\%$ improvement.
- Score: 58.30291671877342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While progress has been made on the visual question answering leaderboards,
models often utilize spurious correlations and priors in datasets under the
i.i.d. setting. As such, evaluation on out-of-distribution (OOD) test samples
has emerged as a proxy for generalization. In this paper, we present MUTANT, a
training paradigm that exposes the model to perceptually similar, yet
semantically distinct mutations of the input, to improve OOD generalization on
benchmarks such as the VQA-CP challenge. Under this paradigm, models utilize a
consistency-constrained training objective to understand the effect of semantic
changes in the input (question-image pair) on the output (answer). Unlike existing
methods on VQA-CP, MUTANT does not rely on knowledge of the nature of the
train and test answer distributions. MUTANT establishes a new state-of-the-art
accuracy on VQA-CP with a $10.57\%$ improvement. Our work opens up avenues for
the use of semantic input mutations for OOD generalization in question
answering.
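The consistency-constrained objective described above can be pictured as penalizing disagreement between the model's answer distributions for an original and a mutated question-image pair, on top of the usual answer-classification loss. The PyTorch-style sketch below is only a minimal illustration under that assumption; the model interface, the mutation pipeline, and the loss weighting are placeholders rather than the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def mutant_style_loss(model, image, question, answer_target,
                      image_mut, question_mut, answer_target_mut,
                      consistency_weight=1.0):
    """Hedged sketch of a consistency-constrained VQA objective.

    `model` is assumed to map (image, question) batches to answer logits;
    the *_mut arguments are semantically mutated versions of the same
    samples together with their ground-truth answers.
    """
    logits_orig = model(image, question)
    logits_mut = model(image_mut, question_mut)

    # Standard answer-classification terms on both versions of the input.
    ce = F.cross_entropy(logits_orig, answer_target) + \
         F.cross_entropy(logits_mut, answer_target_mut)

    # Consistency term: where the mutation does not change the answer, the
    # predicted answer distribution should not change either. (Where it does
    # change, the classification term on the mutated sample carries the signal.)
    same_answer = (answer_target == answer_target_mut).float()
    kl = F.kl_div(F.log_softmax(logits_mut, dim=-1),
                  F.softmax(logits_orig, dim=-1),
                  reduction="none").sum(dim=-1)
    consistency = (same_answer * kl).mean()

    return ce + consistency_weight * consistency
```

In the paper's setting, the mutated pair would come from semantic image or question mutations whose effect on the answer is known, which is what makes a consistency term like this meaningful.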
Related papers
- CRoFT: Robust Fine-Tuning with Concurrent Optimization for OOD Generalization and Open-Set OOD Detection [42.33618249731874]
We show that minimizing the magnitude of energy scores on training data leads to domain-consistent Hessians of the classification loss.
We have developed a unified fine-tuning framework that allows for concurrent optimization of both tasks.
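The energy score mentioned here is commonly defined as the negative log-sum-exp of the classifier logits, so one hedged reading of the claim is a training loss augmented with a penalty on the magnitude of that score over training samples. The snippet below only sketches that generic regularizer; the function names and the weighting are assumptions, not CRoFT's actual implementation.

```python
import torch
import torch.nn.functional as F

def energy_score(logits, temperature=1.0):
    # Free-energy-style score: E(x) = -T * logsumexp(f(x) / T).
    return -temperature * torch.logsumexp(logits / temperature, dim=-1)

def croft_style_loss(logits, targets, energy_weight=0.1):
    """Classification loss plus a penalty on the magnitude of the energy
    scores of the (in-distribution) training batch."""
    ce = F.cross_entropy(logits, targets)
    energy_penalty = energy_score(logits).abs().mean()
    return ce + energy_weight * energy_penalty
```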
arXiv Detail & Related papers (2024-05-26T03:28:59Z)
- Towards Robust Visual Question Answering: Making the Most of Biased Samples via Contrastive Learning [54.61762276179205]
We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
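Read generically, an approach of this kind pairs each training sample with a "debiased" positive, built by stripping out the cue tied to the spurious correlation, and pulls the two representations together with a contrastive objective. The sketch below shows a standard InfoNCE-style step under that reading; `encode` and `build_positive` are hypothetical placeholders, since the abstract does not specify how MMBS constructs its positives.

```python
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, temperature=0.07):
    """Generic InfoNCE loss: each anchor's positive is the sample at the
    same batch index; every other positive acts as a negative."""
    anchor = F.normalize(anchor, dim=-1)
    positive = F.normalize(positive, dim=-1)
    logits = anchor @ positive.t() / temperature       # (B, B) similarity matrix
    labels = torch.arange(anchor.size(0), device=anchor.device)
    return F.cross_entropy(logits, labels)

# Hypothetical usage: `encode` maps an (image, question) pair to a vector and
# `build_positive` removes the spurious-correlation cue (both placeholders).
# z = encode(image, question)
# z_pos = encode(*build_positive(image, question))
# loss = vqa_loss + info_nce(z, z_pos)
```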
arXiv Detail & Related papers (2022-10-10T11:05:21Z)
- Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization [27.437077941786768]
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks.
We evaluate two pretrained V&L models under different settings by conducting cross-dataset evaluations.
We find that these models tend to learn to solve the benchmark, rather than learning the high-level skills required by the VQA task.
arXiv Detail & Related papers (2022-05-24T16:44:45Z)
- Introspective Distillation for Robust Question Answering [70.18644911309468]
Question answering (QA) models are well-known to exploit data bias, e.g., the language prior in visual QA and the position bias in reading comprehension.
Recent debiasing methods achieve good out-of-distribution (OOD) generalizability with a considerable sacrifice of the in-distribution (ID) performance.
We present a novel debiasing method called Introspective Distillation (IntroD) to make the best of both worlds for QA.
arXiv Detail & Related papers (2021-11-01T15:30:15Z)
- X-GGM: Graph Generative Modeling for Out-of-Distribution Generalization in Visual Question Answering [49.36818290978525]
Recompositions of existing visual concepts can generate compositions that are unseen in the training set.
We propose a graph generative modeling-based training scheme (X-GGM) to handle the problem implicitly.
The baseline VQA model trained with the X-GGM scheme achieves state-of-the-art OOD performance on two standard VQA OOD benchmarks.
arXiv Detail & Related papers (2021-07-24T10:17:48Z)
- Loss Re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
This view explicitly reveals why a VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
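Under a class-imbalance view of the answer vocabulary, a natural re-scaling is to weight each sample's loss inversely to how often its answer appears in training, so frequent answers stop dominating the gradient. The snippet below shows that common inverse-frequency scheme purely for illustration; the paper's actual re-scaling rule may differ.

```python
import torch
import torch.nn.functional as F

def inverse_frequency_weights(answer_counts, smoothing=1.0):
    """Per-answer weights that down-weight frequent answers.

    `answer_counts` holds the training-set frequency of each answer class;
    `smoothing` avoids division by zero for answers never seen in training.
    """
    weights = 1.0 / (answer_counts.float() + smoothing)
    return weights * weights.numel() / weights.sum()    # normalize to mean 1

def rescaled_vqa_loss(logits, targets, answer_counts):
    # Cross-entropy where each target class contributes in inverse
    # proportion to how often its answer appears in training.
    return F.cross_entropy(logits, targets,
                           weight=inverse_frequency_weights(answer_counts))
```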
arXiv Detail & Related papers (2020-10-30T00:57:17Z)
- On the Value of Out-of-Distribution Testing: An Example of Goodhart's Law [78.10523907729642]
VQA-CP has become the standard OOD benchmark for visual question answering.
Most published methods rely on explicit knowledge of the construction of the OOD splits.
We show that embarrassingly-simple methods, including one that generates answers at random, surpass the state of the art on some question types.
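As a concrete picture of an "embarrassingly simple" baseline, one can ignore the image entirely and sample an answer at random from the answers seen for the question's type in training. The sketch below is an illustrative guess at such a baseline (the per-question-type pooling is an assumption), not the paper's exact procedure.

```python
import random
from collections import defaultdict

def build_answer_pools(train_examples):
    """Collect, for every question type, the answers seen in training.

    `train_examples` is assumed to be an iterable of dicts with
    'question_type' and 'answer' keys (a simplification of VQA-CP annotations).
    """
    pools = defaultdict(set)
    for ex in train_examples:
        pools[ex["question_type"]].add(ex["answer"])
    return {qtype: sorted(answers) for qtype, answers in pools.items()}

def random_answer_baseline(question_type, answer_pools, rng=random):
    # Ignore the image and the question text entirely: just pick a
    # plausible answer for this question type at random.
    return rng.choice(answer_pools.get(question_type, ["yes"]))
```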
arXiv Detail & Related papers (2020-05-19T06:45:50Z)
This list is automatically generated from the titles and abstracts of the papers on this site.