Loss re-scaling VQA: Revisiting the Language Prior Problem from a
Class-imbalance View
- URL: http://arxiv.org/abs/2010.16010v4
- Date: Tue, 14 Dec 2021 16:18:33 GMT
- Title: Loss re-scaling VQA: Revisiting the Language Prior Problem from a
Class-imbalance View
- Authors: Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang
- Abstract summary: We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
- Score: 129.392671317356
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent studies have pointed out that many well-developed Visual Question
Answering (VQA) models are heavily affected by the language prior problem,
which refers to making predictions based on co-occurrence patterns between
textual questions and answers instead of reasoning over visual content. To
tackle it, most existing methods focus on enhancing visual feature learning to
reduce the influence of this superficial textual shortcut on VQA model decisions. However,
limited effort has been devoted to providing an explicit interpretation of its
inherent cause. The research community thus lacks clear guidance for moving
forward purposefully, which complicates the construction of models that
overcome this non-trivial problem. In this paper, we propose to interpret the
language prior problem in VQA from a class-imbalance view. Concretely, we
design a novel interpretation scheme whereby the losses of mis-predicted
frequent and sparse answers of the same question type are exhibited separately
during the late training phase. This explicitly reveals why a VQA model tends
to produce a frequent yet obviously wrong answer to a question whose correct
answer is sparse in the training set. Based upon this observation, we further develop a
novel loss re-scaling approach to assign different weights to each answer based
on the training data statistics for computing the final loss. We apply our
approach to three baselines, and the experimental results on two VQA-CP
benchmark datasets evidently demonstrate its effectiveness. In addition, we
also justify the validity of the class imbalance interpretation scheme on other
computer vision tasks, such as face recognition and image classification.
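The re-scaling idea described above maps naturally onto a weighted cross-entropy. Below is a minimal sketch, assuming an inverse-frequency weighting derived from per-answer training counts; the function names, the smoothing term, and the exact weighting formula are illustrative assumptions, not necessarily the statistic used in the paper.

```python
import torch
import torch.nn.functional as F

def answer_weights(answer_counts, smoothing=1.0):
    """Derive a per-answer weight from training-set answer frequencies.

    answer_counts: 1-D tensor of occurrence counts, one entry per answer class.
    The inverse-frequency form below is an illustrative choice; the paper's
    exact re-scaling statistic may differ.
    """
    counts = answer_counts.float() + smoothing       # avoid division by zero
    weights = counts.sum() / (len(counts) * counts)  # rarer answers get larger weights
    return weights

def rescaled_vqa_loss(logits, targets, weights):
    """Standard cross-entropy where each answer class contributes
    proportionally to its pre-computed weight."""
    return F.cross_entropy(logits, targets, weight=weights)

# Toy usage: 5 candidate answers, the first being far more frequent.
counts = torch.tensor([900, 40, 30, 20, 10])
w = answer_weights(counts)
logits = torch.randn(8, 5)              # batch of 8 question-image pairs
targets = torch.randint(0, 5, (8,))
loss = rescaled_vqa_loss(logits, targets, w)
```

With such a weighting, the gradient contribution of sparse answers is amplified relative to frequent ones, which is the mechanism the class-imbalance interpretation points to.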
Related papers
- Unveiling Cross Modality Bias in Visual Question Answering: A Causal
View with Possible Worlds VQA [111.41719652451701]
We first model a confounding effect that causes language and vision bias simultaneously.
We then propose a counterfactual inference to remove the influence of this effect.
The proposed method outperforms state-of-the-art methods on the VQA-CP v2 dataset.
arXiv Detail & Related papers (2023-05-31T09:02:58Z) - Knowledge-Based Counterfactual Queries for Visual Question Answering [0.0]
We propose a systematic method for explaining the behavior and investigating the robustness of VQA models through counterfactual perturbations.
For this reason, we exploit structured knowledge bases to perform deterministic, optimal and controllable word-level replacements targeting the linguistic modality.
We then evaluate the model's response against such counterfactual inputs.
arXiv Detail & Related papers (2023-03-05T08:00:30Z) - Continual VQA for Disaster Response Systems [0.0]
Visual Question Answering (VQA) is a multi-modal task that involves answering questions from an input image.
The main challenge is the delay caused by generating labels when assessing the affected areas.
We deploy a pre-trained CLIP model, which is trained on image-text pairs.
We surpass previous state-of-the-art results on the FloodNet dataset.
arXiv Detail & Related papers (2022-09-21T12:45:51Z) - Visual Perturbation-aware Collaborative Learning for Overcoming the
Language Prior Problem [60.0878532426877]
We propose a novel collaborative learning scheme from the viewpoint of visual perturbation calibration.
Specifically, we devise a visual controller to construct two sorts of curated images with different perturbation extents.
The experimental results on two diagnostic VQA-CP benchmark datasets evidently demonstrate its effectiveness.
arXiv Detail & Related papers (2022-07-24T23:50:52Z) - COIN: Counterfactual Image Generation for VQA Interpretation [5.994412766684842]
We introduce an interpretability approach for VQA models by generating counterfactual images.
In addition to interpreting the results of VQA models on single images, the obtained results and the discussion provide an extensive explanation of VQA models' behaviour.
arXiv Detail & Related papers (2022-01-10T13:51:35Z) - AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss [73.65872901950135]
This work attempts to tackle the language prior problem from the viewpoint of the feature space learning.
An adapted margin cosine loss is designed to separate frequent and sparse answers in feature space (a generic sketch of this loss family appears after this list).
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models.
arXiv Detail & Related papers (2021-05-05T11:41:38Z) - Learning from Lexical Perturbations for Consistent Visual Question
Answering [78.21912474223926]
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
We propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations.
We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations.
arXiv Detail & Related papers (2020-11-26T17:38:03Z) - Counterfactual Samples Synthesizing for Robust Visual Question Answering [104.72828511083519]
We propose a model-agnostic Counterfactual Samples Synthesizing (CSS) training scheme.
CSS generates numerous counterfactual training samples by masking critical objects in images or words in questions.
We achieve a record-breaking performance of 58.95% on VQA-CP v2, with 6.5% gains.
arXiv Detail & Related papers (2020-03-14T08:34:31Z)
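The adapted margin cosine loss mentioned in the AdaVQA entry above belongs to the large-margin cosine family. The following is a minimal sketch of that general formulation with a per-answer margin; the function name, margin schedule, and scale value are illustrative assumptions, not AdaVQA's exact implementation.

```python
import torch
import torch.nn.functional as F

def adapted_margin_cosine_loss(features, weight, targets, margins, scale=15.0):
    """Large-margin cosine loss with a per-answer margin (generic sketch).

    features: (B, D) answer-space embeddings from the VQA model
    weight:   (C, D) learnable class (answer) prototypes
    targets:  (B,)   ground-truth answer indices
    margins:  (C,)   per-answer margins, e.g. derived from answer frequencies
    """
    cos = F.linear(F.normalize(features), F.normalize(weight))  # (B, C) cosine similarities
    m = margins[targets]                                        # margin of each sample's true answer
    logits = cos.clone()
    logits[torch.arange(len(targets)), targets] -= m            # subtract margin on the target class
    return F.cross_entropy(scale * logits, targets)

# Toy usage: 4 candidate answers with hypothetical frequency-based margins.
feats = torch.randn(8, 128)
protos = torch.randn(4, 128, requires_grad=True)
margins = torch.tensor([0.40, 0.20, 0.10, 0.05])
y = torch.randint(0, 4, (8,))
loss = adapted_margin_cosine_loss(feats, protos, y, margins)
```

Enlarging the margin for an answer forces its embeddings further from the decision boundary, which is how a margin-based loss can discriminate frequent from sparse answers in feature space.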