AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
- URL: http://arxiv.org/abs/2105.01993v1
- Date: Wed, 5 May 2021 11:41:38 GMT
- Title: AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
- Authors: Yangyang Guo and Liqiang Nie and Zhiyong Cheng and Feng Ji and Ji
Zhang and Alberto Del Bimbo
- Abstract summary: This work attempts to tackle the language prior problem from the viewpoint of answer feature space learning.
An adapted margin cosine loss is designed to properly discriminate between the frequent and the sparse answer feature spaces.
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models.
- Score: 73.65872901950135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A number of studies point out that current Visual Question Answering (VQA)
models are severely affected by the language prior problem, which refers to
blindly making predictions based on language shortcuts. Some efforts have
been devoted to overcoming this issue with delicately designed models. However,
no prior work addresses it from the angle of answer feature space learning,
despite the fact that existing VQA methods all cast VQA as a classification
task. Inspired by this, in this work, we attempt to tackle the language prior
problem from the viewpoint of feature space learning. To this end, an
adapted margin cosine loss is designed to properly discriminate between the
frequent and the sparse answer feature spaces under each question type. As a result, the
limited patterns within the language modality are largely reduced, and thus
fewer language priors are introduced by our method. We apply this loss function
to several baseline models and evaluate its effectiveness on two VQA-CP
benchmarks. Experimental results demonstrate that our adapted margin cosine
loss can greatly enhance the baseline models with an absolute performance gain
of 15% on average, strongly verifying the potential of tackling the language
prior problem in VQA from the angle of answer feature space learning.
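To make the idea concrete, below is a minimal PyTorch sketch of such a loss, assuming a CosFace-style formulation s * (cos(theta_y) - m_y) in which the per-answer margin m_y is derived from answer frequency (frequent answers get small margins, sparse answers larger ones, pushing their feature spaces apart). The class name AdaptedMarginCosineLoss, the linear frequency-to-margin mapping, and the scale/margin defaults are illustrative assumptions, not the paper's exact formulation, which further adapts margins under each question type.

```python
# A minimal sketch (not the authors' released code) of a frequency-adapted
# margin cosine loss, following the CosFace-style form s * (cos(theta_y) - m_y).
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptedMarginCosineLoss(nn.Module):
    def __init__(self, feat_dim, num_answers, answer_freq,
                 scale=16.0, max_margin=0.5):
        super().__init__()
        # Learnable answer "prototypes"; cosine similarity is taken between
        # L2-normalized features and these L2-normalized weight rows.
        self.weight = nn.Parameter(torch.empty(num_answers, feat_dim))
        nn.init.xavier_uniform_(self.weight)
        self.scale = scale
        # Illustrative margin adaptation (an assumption): normalize per-answer
        # training counts to [0, 1]; frequent answers get a small margin,
        # sparse answers a large one.
        freq = answer_freq.float()
        freq = freq / freq.max()
        self.register_buffer("margin", max_margin * (1.0 - freq))

    def forward(self, features, labels):
        # Cosine similarity between normalized features and answer prototypes:
        # shape (batch, num_answers).
        cos = F.linear(F.normalize(features), F.normalize(self.weight))
        # Subtract each sample's per-answer margin from its target logit only.
        m = self.margin[labels]                          # (batch,)
        one_hot = F.one_hot(labels, cos.size(1)).float() # (batch, num_answers)
        logits = self.scale * (cos - one_hot * m.unsqueeze(1))
        return F.cross_entropy(logits, labels)

# Hypothetical usage: answer_freq holds per-answer training counts, and
# fused_features comes from any baseline VQA model's fusion layer.
# loss_fn = AdaptedMarginCosineLoss(512, 3000, answer_freq)
# loss = loss_fn(fused_features, answer_ids)
```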
Related papers
- Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training [22.473676537463607] (2023-06-01)
Visual Question Answering (VQA) models commonly face the challenge of language bias.
We present a novel framework to reduce the language bias of VQA models for remote sensing data.
- Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances [17.637150597493463] (2022-09-18)
We propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances.
We exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space.
Experimental results show that our method achieves the state-of-the-art performance on VQA-CP v2.
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341] (2022-02-15)
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
- Learning from Lexical Perturbations for Consistent Visual Question Answering [78.21912474223926] (2020-11-26)
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
We propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations.
We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations.
- Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356] (2020-10-30)
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
- Contrast and Classify: Training Robust VQA Models [60.80627814762071] (2020-10-13)
We propose a novel training paradigm (ConClaT) that optimizes both cross-entropy and contrastive losses.
We find that optimizing both losses -- either alternately or jointly -- is key to effective training.
- Estimating semantic structure for the VQA answer space [6.49970685896541] (2020-06-10)
We show that our approach is completely model-agnostic, yielding consistent improvements with three different VQA models.
We report SOTA-level performance on the challenging VQAv2-CP dataset.
- Counterfactual VQA: A Cause-Effect Look at Language Bias [117.84189187160005] (2020-06-08)
VQA models tend to rely on language bias as a shortcut and fail to sufficiently learn the multi-modal knowledge from both vision and language.
We propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers.
This list is automatically generated from the titles and abstracts of the papers on this site.