Overcoming Language Bias in Remote Sensing Visual Question Answering via
Adversarial Training
- URL: http://arxiv.org/abs/2306.00483v1
- Date: Thu, 1 Jun 2023 09:32:45 GMT
- Authors: Zhenghang Yuan, Lichao Mou, Xiao Xiang Zhu
- Abstract summary: Visual Question Answering (VQA) models commonly face the challenge of language bias.
We present a novel framework to reduce the language bias of the VQA for remote sensing data.
- Score: 22.473676537463607
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Visual Question Answering (VQA) systems offer a user-friendly interface
and enable natural human-computer interaction. However, VQA models commonly face the
challenge of language bias, resulting from the learned superficial correlation
between questions and answers. To address this issue, in this study, we present
a novel framework to reduce the language bias of the VQA for remote sensing
data (RSVQA). Specifically, we add an adversarial branch to the original VQA
framework. Based on the adversarial branch, we introduce two regularizers to
constrain the training process against language bias. Furthermore, to evaluate
the performance in terms of language bias, we propose a new metric that
combines standard accuracy with the performance drop when incorporating
question and random image information. Experimental results demonstrate the
effectiveness of our method. We believe that our method can shed light on
future work for reducing language bias on the RSVQA task.
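The proposed metric combines standard accuracy with the performance drop observed when the model answers from the question paired with a random image. The abstract does not give the exact formula, so the following is a minimal sketch assuming a simple weighted combination; the function name and the `alpha` weight are hypothetical:

```python
def bias_aware_score(acc_std: float, acc_rand: float, alpha: float = 0.5) -> float:
    """Hedged sketch of a bias-aware VQA metric.

    acc_std  -- standard accuracy with the matching image
    acc_rand -- accuracy when the image is replaced by a random one
    alpha    -- hypothetical weight between accuracy and the drop
    """
    # A large drop means the model genuinely relies on visual evidence
    # rather than on question-answer shortcuts.
    drop = acc_std - acc_rand
    return alpha * acc_std + (1 - alpha) * drop
```

A model that scores 0.8 normally but keeps 0.75 with random images (small drop, likely language-biased) would score lower here than one dropping to 0.5.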
Related papers
- The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation [32.7348470366509]
The goal of RSVQA is to answer a question formulated in natural language about a remote sensing image.
The problem of language biases is often overlooked in the remote sensing community.
The present work aims at highlighting the problem of language biases in RSVQA with a threefold analysis strategy.
arXiv Detail & Related papers (2023-11-28T13:45:15Z)
- Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA [111.41719652451701]
We first model a confounding effect that causes language and vision bias simultaneously.
We then propose a counterfactual inference to remove the influence of this effect.
The proposed method outperforms state-of-the-art methods on the VQA-CP v2 dataset.
arXiv Detail & Related papers (2023-05-31T09:02:58Z)
- SC-ML: Self-supervised Counterfactual Metric Learning for Debiased Visual Question Answering [10.749155815447127]
We propose a self-supervised counterfactual metric learning (SC-ML) method to better focus on question-relevant image features.
SC-ML can adaptively select the question-relevant visual features to answer the question, reducing the negative influence of question-irrelevant visual features on inferring answers.
arXiv Detail & Related papers (2023-04-04T09:05:11Z)
- Overcoming Language Priors in Visual Question Answering via Distinguishing Superficially Similar Instances [17.637150597493463]
We propose a novel training framework that explicitly encourages the VQA model to distinguish between the superficially similar instances.
We exploit the proposed distinguishing module to increase the distance between the instance and its counterparts in the answer space.
Experimental results show that our method achieves state-of-the-art performance on VQA-CP v2.
arXiv Detail & Related papers (2022-09-18T10:30:44Z)
- Delving Deeper into Cross-lingual Visual Question Answering [115.16614806717341]
We show that simple modifications to the standard training setup can substantially reduce the transfer gap to monolingual English performance.
We analyze cross-lingual VQA across different question types of varying complexity for different multilingual multimodal Transformers.
arXiv Detail & Related papers (2022-02-15T18:22:18Z)
- AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss [73.65872901950135]
This work attempts to tackle the language prior problem from the viewpoint of the feature space learning.
An adapted margin cosine loss is designed to discriminate the frequent and the sparse answer feature space.
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models.
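An adapted margin cosine loss of this kind can be sketched as a softmax cross-entropy over scaled cosine similarities, with a per-answer margin that varies with answer frequency. The adaptation rule below (margin proportional to relative frequency) and all parameter values are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def adapted_margin_cosine_loss(features, weights, labels, freqs,
                               scale=16.0, base_margin=0.2):
    """Hedged sketch: margin cosine loss with frequency-adapted margins.

    features -- fused question-image features, shape (batch, dim)
    weights  -- answer class weight vectors, shape (num_answers, dim)
    labels   -- ground-truth answer indices, shape (batch,)
    freqs    -- training-set frequency of each answer, shape (num_answers,)
    """
    # L2-normalize so the dot product is a cosine similarity.
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = f @ w.T                                  # (batch, num_answers)

    # Illustrative adaptation: frequent answers get a larger margin,
    # discouraging the model from defaulting to them.
    margins = base_margin * freqs / freqs.max()

    rows = np.arange(len(labels))
    logits = scale * cos
    logits[rows, labels] = scale * (cos[rows, labels] - margins[labels])

    # Numerically stable softmax cross-entropy on the adjusted logits.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

The margin pushes frequent-answer features away from the decision boundary, so the feature space must separate frequent and sparse answers rather than rely on answer priors.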
arXiv Detail & Related papers (2021-05-05T11:41:38Z)
- Learning from Lexical Perturbations for Consistent Visual Question Answering [78.21912474223926]
Existing Visual Question Answering (VQA) models are often fragile and sensitive to input variations.
We propose a novel approach to address this issue based on modular networks, which creates two questions related by linguistic perturbations.
We also present VQA Perturbed Pairings (VQA P2), a new, low-cost benchmark and augmentation pipeline to create controllable linguistic variations.
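A consistency objective over such linguistically perturbed question pairs could be sketched as a symmetric KL divergence between the two answer distributions. This is an illustrative regularizer under that assumption, not the paper's actual modular-network loss:

```python
import numpy as np

def consistency_loss(p_orig, p_pert, eps=1e-8):
    """Symmetric KL between answer distributions for a question and its
    linguistically perturbed variant (illustrative consistency term).

    p_orig, p_pert -- answer probability distributions, shape (batch, num_answers)
    """
    def kl(p, q):
        # eps guards against log(0) for zero-probability answers.
        return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

    return 0.5 * (kl(p_orig, p_pert) + kl(p_pert, p_orig)).mean()
```

The loss is zero when the model answers both phrasings identically, so minimizing it penalizes sensitivity to surface-level wording.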
arXiv Detail & Related papers (2020-11-26T17:38:03Z)
- Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View [129.392671317356]
We propose to interpret the language prior problem in VQA from a class-imbalance view.
It explicitly reveals why the VQA model tends to produce a frequent yet obviously wrong answer.
We also justify the validity of the class imbalance interpretation scheme on other computer vision tasks, such as face recognition and image classification.
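Under the class-imbalance view, a natural remedy is to re-scale each example's loss by the rarity of its ground-truth answer. The inverse-frequency weighting below is one common such rule, used here as a hedged sketch rather than the paper's exact re-scaling scheme:

```python
import numpy as np

def rescaled_ce_loss(log_probs, labels, answer_freqs):
    """Hedged sketch: cross-entropy re-scaled by inverse answer frequency.

    log_probs    -- log answer probabilities, shape (batch, num_answers)
    labels       -- ground-truth answer indices, shape (batch,)
    answer_freqs -- training-set count of each answer, shape (num_answers,)
    """
    # Rare answers receive larger weights, countering the frequent-answer prior.
    weights = 1.0 / answer_freqs[labels]
    weights = weights / weights.mean()     # normalize so the average weight is 1

    nll = -log_probs[np.arange(len(labels)), labels]
    return (weights * nll).mean()
```

Because frequent answers dominate the unweighted loss, this re-weighting directly attacks the tendency to emit "a frequent yet obviously wrong answer".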
arXiv Detail & Related papers (2020-10-30T00:57:17Z)
- Counterfactual VQA: A Cause-Effect Look at Language Bias [117.84189187160005]
VQA models tend to rely on language bias as a shortcut and fail to sufficiently learn the multi-modal knowledge from both vision and language.
We propose a novel counterfactual inference framework, which enables us to capture the language bias as the direct causal effect of questions on answers.
arXiv Detail & Related papers (2020-06-08T01:49:27Z)
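If the language bias is the direct causal effect of the question on the answer, one simple way to realize counterfactual inference at test time is to subtract a question-only prediction from the full vision-plus-language prediction. This is a sketch of that idea, not necessarily the paper's exact estimator:

```python
import numpy as np

def counterfactual_debias(fused_logits, question_only_logits):
    """Hedged sketch of counterfactual debiasing at inference time.

    fused_logits         -- logits from the full vision+language model
    question_only_logits -- logits from a question-only branch, standing in
                            for the direct effect of language on the answer
    """
    # Removing the language-only effect leaves the (debiased) answer evidence.
    return fused_logits - question_only_logits
```

When the question-only branch strongly prefers a shortcut answer, the subtraction can flip the final prediction toward the visually grounded one.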
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.