Towards Robust Visual Question Answering: Making the Most of Biased
Samples via Contrastive Learning
- URL: http://arxiv.org/abs/2210.04563v1
- Date: Mon, 10 Oct 2022 11:05:21 GMT
- Title: Towards Robust Visual Question Answering: Making the Most of Biased
Samples via Contrastive Learning
- Authors: Qingyi Si, Yuanxin Liu, Fandong Meng, Zheng Lin, Peng Fu, Yanan Cao,
Weiping Wang and Jie Zhou
- Abstract summary: We propose a novel contrastive learning approach, MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by eliminating the information related to spurious correlation from the original training samples.
We validate our contributions by achieving competitive performance on the OOD dataset VQA-CP v2 while preserving robust performance on the ID dataset VQA v2.
- Score: 54.61762276179205
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Models for Visual Question Answering (VQA) often rely on spurious
correlations, i.e., the language priors that appear in the biased samples of
the training set, which make them brittle against out-of-distribution (OOD)
test data. Recent methods have made promising progress on this problem by
reducing the impact of biased samples on model training. However, these
methods reveal a trade-off: the improvements on OOD data come at a severe
cost to performance on the in-distribution (ID) data, which is dominated by
biased samples. Therefore, we propose a novel contrastive learning approach,
MMBS, for building robust VQA models by Making the Most of Biased Samples.
Specifically, we construct positive samples for contrastive learning by
eliminating the information related to spurious correlations from the
original training samples, and we explore several strategies for using the
constructed positive samples in training. Rather than downplaying the
importance of biased samples in model training, our approach precisely
exploits biased samples for the unbiased information they carry that
contributes to reasoning. The proposed method is compatible with various VQA
backbones. We validate our contributions by achieving competitive performance
on the OOD dataset VQA-CP v2 while preserving robust performance on the ID
dataset VQA v2.
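The abstract's core mechanism can be sketched in code: build a "positive" view
of each training question by stripping the cue that carries the language prior,
then pull the two views together with a contrastive loss. The sketch below is
an illustration only, not the authors' MMBS implementation; the prefix list,
the cosine similarity, and the InfoNCE-style loss form are all assumptions
made to show the idea.

```python
import math

# Illustrative sketch only (NOT the paper's implementation): contrast each
# sample with a bias-stripped positive view so the learned representation
# ignores the question-type prior.

QUESTION_TYPE_PREFIXES = ["what color is", "how many", "is there", "what is"]

def strip_question_type(question: str) -> str:
    """Build a 'positive' view by removing the question-type prefix,
    a common carrier of language priors in VQA."""
    q = question.lower()
    for prefix in QUESTION_TYPE_PREFIXES:
        if q.startswith(prefix):
            return q[len(prefix):].strip()
    return q

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-8)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss: low when the anchor embedding is
    close to its bias-stripped positive and far from the negatives."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    max_logit = max(logits)  # stabilise the log-sum-exp
    log_denom = max_logit + math.log(
        sum(math.exp(l - max_logit) for l in logits)
    )
    return -(logits[0] - log_denom)
```

For example, `strip_question_type("What color is the dog")` yields
`"the dog"`; computing the loss between the encodings of the two views then
rewards representations that do not depend on the question-type cue.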
Related papers
- DOTA: Distributional Test-Time Adaptation of Vision-Language Models [52.98590762456236]
Training-free test-time dynamic adapter (TDA) is a promising approach to adapting such models at test time.
We propose a simple yet effective method for DistributiOnal Test-time Adaptation (Dota).
Dota continually estimates the distributions of test samples, allowing the model to adapt to the deployment environment.
arXiv Detail & Related papers (2024-09-28T15:03:28Z)
- Think Twice: Measuring the Efficiency of Eliminating Prediction Shortcuts of Question Answering Models [3.9052860539161918]
We propose a simple method for measuring a scale of models' reliance on any identified spurious feature.
We assess the robustness towards a large set of known and newly found prediction biases for various pre-trained models and debiasing methods in Question Answering (QA).
We find that while existing debiasing methods can mitigate reliance on a chosen spurious feature, the OOD performance gains of these methods cannot be explained by mitigated reliance on biased features.
arXiv Detail & Related papers (2023-05-11T14:35:00Z)
- Feature-Level Debiased Natural Language Understanding [86.8751772146264]
Existing natural language understanding (NLU) models often rely on dataset biases to achieve high performance on specific datasets.
We propose debiasing contrastive learning (DCT) to mitigate biased latent features, accounting for the dynamic nature of bias that existing methods neglect.
DCT outperforms state-of-the-art baselines on out-of-distribution datasets while maintaining in-distribution performance.
arXiv Detail & Related papers (2022-12-11T06:16:14Z)
- Reassessing Evaluation Practices in Visual Question Answering: A Case Study on Out-of-Distribution Generalization [27.437077941786768]
Vision-and-language (V&L) models pretrained on large-scale multimodal data have demonstrated strong performance on various tasks.
We evaluate two pretrained V&L models under different settings by conducting cross-dataset evaluations.
We find that these models tend to learn to solve the benchmark, rather than learning the high-level skills required by the VQA task.
arXiv Detail & Related papers (2022-05-24T16:44:45Z)
- General Greedy De-bias Learning [163.65789778416172]
We propose a General Greedy De-bias learning framework (GGD), which greedily trains the biased models and the base model, analogous to gradient descent in functional space.
GGD can learn a more robust base model under the settings of both task-specific biased models with prior knowledge and self-ensemble biased model without prior knowledge.
arXiv Detail & Related papers (2021-12-20T14:47:32Z)
- Energy-bounded Learning for Robust Models of Code [16.592638312365164]
In programming, learned code representations have a variety of applications, including code classification, code search, comment generation, and bug prediction.
We propose an energy-bounded learning objective that assigns higher scores to in-distribution samples and lower scores to out-of-distribution samples, in order to incorporate such OOD samples into the training of source code models.
arXiv Detail & Related papers (2021-12-20T06:28:56Z)
- Understanding Out-of-distribution: A Perspective of Data Dynamics [5.811774625668462]
This paper explores how data dynamics in training models can be used to understand the fundamental differences between OOD and in-distribution samples.
We found that the syntactic characteristics of the data samples that the model consistently predicts incorrectly in the OOD and in-distribution cases directly contradict each other.
arXiv Detail & Related papers (2021-11-29T17:34:38Z)
- Understanding and Testing Generalization of Deep Networks on Out-of-Distribution Data [30.471871571256198]
Deep network models perform excellently on In-Distribution data, but can significantly fail on Out-Of-Distribution data.
This study analyzes the problems of experimental ID testing and designs an OOD test paradigm.
arXiv Detail & Related papers (2021-11-17T15:29:07Z)
- Jo-SRC: A Contrastive Approach for Combating Noisy Labels [58.867237220886885]
We propose a noise-robust approach named Jo-SRC (Joint Sample Selection and Model Regularization based on Consistency).
Specifically, we train the network in a contrastive learning manner. Predictions from two different views of each sample are used to estimate its "likelihood" of being clean or out-of-distribution.
arXiv Detail & Related papers (2021-03-24T07:26:07Z)
- One for More: Selecting Generalizable Samples for Generalizable ReID Model [92.40951770273972]
This paper proposes a one-for-more training objective that takes the generalization ability of selected samples as a loss function.
Our proposed one-for-more based sampler can be seamlessly integrated into the ReID training framework.
arXiv Detail & Related papers (2020-12-10T06:37:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.