Counterfactual VQA: A Cause-Effect Look at Language Bias
- URL: http://arxiv.org/abs/2006.04315v4
- Date: Thu, 1 Apr 2021 16:15:36 GMT
- Title: Counterfactual VQA: A Cause-Effect Look at Language Bias
- Authors: Yulei Niu, Kaihua Tang, Hanwang Zhang, Zhiwu Lu, Xian-Sheng Hua,
Ji-Rong Wen
- Abstract summary: VQA models tend to rely on language bias as a shortcut and fail to sufficiently learn the multi-modal knowledge from both vision and language.
We propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers.
- Score: 117.84189187160005
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: VQA models may tend to rely on language bias as a shortcut and thus fail to
sufficiently learn the multi-modal knowledge from both vision and language.
Recent debiasing methods proposed to exclude the language prior during
inference. However, they fail to disentangle the "good" language context and
"bad" language bias from the whole. In this paper, we investigate how to
mitigate language bias in VQA. Motivated by causal effects, we propose a novel
counterfactual inference framework, which enables us to capture the language
bias as the direct causal effect of questions on answers and reduce the
language bias by subtracting the direct language effect from the total causal
effect. Experiments demonstrate that our proposed counterfactual inference
framework 1) is general to various VQA backbones and fusion strategies, 2)
achieves competitive performance on the language-bias sensitive VQA-CP dataset
while performing robustly on the balanced VQA v2 dataset without any augmented
data. The code is available at https://github.com/yuleiniu/cfvqa.
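The effect subtraction described in the abstract can be sketched numerically. This is a minimal illustration only, assuming both model branches emit answer logits; the function name and toy numbers below are hypothetical, not the paper's actual implementation (which trains the question-only branch jointly and supports several fusion strategies, see the linked repository).

```python
import numpy as np

def debiased_logits(logits_vq, logits_q):
    """Counterfactual debiasing by effect subtraction.

    logits_vq : answer logits from the full vision+question model,
                an estimate of the total effect (TE).
    logits_q  : answer logits from a question-only branch,
                an estimate of the natural direct effect (NDE) of the
                question on the answer, i.e. the language bias.

    Returns the total indirect effect TIE = TE - NDE, which is used
    to rank the final (debiased) answers.
    """
    return np.asarray(logits_vq) - np.asarray(logits_q)

# Toy example: the question-only branch strongly prefers answer 0
# (a language prior); subtracting it flips the prediction to answer 1.
te  = np.array([2.0, 1.0, 0.0])   # full-model logits
nde = np.array([1.5, 0.0, 0.0])   # question-only logits
tie = debiased_logits(te, nde)    # -> [0.5, 1.0, 0.0]
```

Subtracting in logit space keeps the operation independent of the answer-vocabulary size and of the particular fusion strategy used to produce the full-model logits.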
Related papers
- Eliminating the Language Bias for Visual Question Answering with fine-grained Causal Intervention [9.859335795616028]
We propose a novel causal intervention training scheme named CIBi to eliminate language bias from a finer-grained perspective.
We employ causal intervention and contrastive learning to eliminate context bias and improve the multi-modal representation.
We design a new question-only branch based on counterfactual generation to distill and eliminate keyword bias.
arXiv Detail & Related papers (2024-10-14T06:09:16Z)
- Overcoming Language Bias in Remote Sensing Visual Question Answering via Adversarial Training [22.473676537463607]
Visual Question Answering (VQA) models commonly face the challenge of language bias.
We present a novel framework to reduce the language bias of the VQA for remote sensing data.
arXiv Detail & Related papers (2023-06-01T09:32:45Z)
- Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA [111.41719652451701]
We first model a confounding effect that causes language and vision bias simultaneously.
We then propose a counterfactual inference to remove the influence of this effect.
The proposed method outperforms the state-of-the-art methods on the VQA-CP v2 dataset.
arXiv Detail & Related papers (2023-05-31T09:02:58Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- On the Language Coverage Bias for Neural Machine Translation [81.81456880770762]
Language coverage bias is important for neural machine translation (NMT) because the target-original training data is not well exploited in current practice.
By carefully designing experiments, we provide comprehensive analyses of the language coverage bias in the training data.
We propose two simple and effective approaches to alleviate the language coverage bias problem.
arXiv Detail & Related papers (2021-06-07T01:55:34Z)
- AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss [73.65872901950135]
This work attempts to tackle the language prior problem from the viewpoint of the feature space learning.
An adapted margin cosine loss is designed to discriminate the frequent and the sparse answer feature space.
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models.
arXiv Detail & Related papers (2021-05-05T11:41:38Z)
- Learning content and context with language bias for Visual Question Answering [31.39505099600821]
We propose a novel learning strategy named CCB, which forces VQA models to answer questions relying on Content and Context with language bias.
CCB outperforms the state-of-the-art methods in terms of accuracy on VQA-CP v2.
arXiv Detail & Related papers (2020-12-21T06:22:50Z) - On the General Value of Evidence, and Bilingual Scene-Text Visual
Question Answering [120.64104995052189]
We present a dataset that takes a step towards addressing this problem in that it contains questions expressed in two languages.
Measuring reasoning directly encourages generalization by penalizing answers that are coincidentally correct.
The dataset reflects the scene-text version of the VQA problem, and the reasoning evaluation can be seen as a text-based version of a referring expression challenge.
arXiv Detail & Related papers (2020-02-24T13:02:31Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.