Gender and Racial Bias in Visual Question Answering Datasets
- URL: http://arxiv.org/abs/2205.08148v3
- Date: Fri, 3 Jun 2022 06:36:16 GMT
- Title: Gender and Racial Bias in Visual Question Answering Datasets
- Authors: Yusuke Hirota, Yuta Nakashima, Noa Garcia
- Abstract summary: We investigate gender and racial bias in visual question answering (VQA) datasets.
We find that the distribution of answers differs markedly between questions about women and questions about men, and that detrimental gender-stereotypical samples exist in the data.
Our findings suggest that there are dangers associated with using VQA datasets without considering and addressing the potentially harmful stereotypes.
- Score: 24.075869811508404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Vision-and-language tasks have drawn increasing attention as a means
to evaluate human-like reasoning in machine learning models. A popular task in
the field is visual question answering (VQA), which aims to answer questions
about images. However, VQA models have been shown to exploit language bias by
learning the statistical correlations between questions and answers without
looking at the image content: e.g., questions about the color of a banana are
answered with "yellow", even if the banana in the image is green. If societal
bias (e.g., sexism, racism, ableism, etc.) is present in the training data,
this problem may be causing VQA models to learn harmful stereotypes. For this
reason, we investigate gender and racial bias in five VQA datasets. In our
analysis, we find that the distribution of answers differs markedly between
questions about women and questions about men, and that detrimental
gender-stereotypical samples exist. Likewise, we identify that specific
race-related attributes are underrepresented, while potentially discriminatory
samples appear in the analyzed datasets. Our findings suggest that there are
dangers associated with using VQA datasets without considering and addressing
the potentially harmful stereotypes. We conclude the paper by proposing solutions
to alleviate the problem before, during, and after the dataset collection
process.
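As a rough illustration of the kind of answer-distribution comparison described above, the sketch below contrasts the answers given to questions that mention women with those given to questions that mention men in a VQA-style annotation list. The keyword lists, the data format, and the use of total variation distance are assumptions made for illustration; they are not the authors' exact protocol.

```python
# Minimal sketch: compare answer distributions for gendered questions in a
# VQA-style dataset. Assumes a list of {"question": str, "answer": str} dicts;
# the keyword sets, whitespace tokenization, and distance measure are
# illustrative choices, not the protocol used in the paper.
from collections import Counter

WOMAN_WORDS = {"woman", "women", "girl", "girls", "lady", "she", "her"}
MAN_WORDS = {"man", "men", "boy", "boys", "guy", "he", "his"}

def answer_distribution(samples, keywords):
    """Normalized answer frequencies for questions containing any keyword."""
    counts = Counter(
        s["answer"].lower()
        for s in samples
        if keywords & set(s["question"].lower().split())
    )
    total = sum(counts.values())
    return {ans: c / total for ans, c in counts.items()} if total else {}

def total_variation(p, q):
    """Total variation distance between two answer distributions."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

# Toy usage with made-up annotations:
samples = [
    {"question": "What is the woman doing?", "answer": "cooking"},
    {"question": "What is the man doing?", "answer": "skateboarding"},
    {"question": "What is the girl holding?", "answer": "umbrella"},
    {"question": "What is the boy holding?", "answer": "bat"},
]
p_woman = answer_distribution(samples, WOMAN_WORDS)
p_man = answer_distribution(samples, MAN_WORDS)
print(f"TV distance between answer distributions: {total_variation(p_woman, p_man):.2f}")
```

A large distance on a real dataset would indicate that the answer vocabulary used for questions about women diverges from that used for questions about men, which is the kind of signal the paper's analysis looks for before inspecting individual stereotypical samples.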
Related papers
- UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models [55.22048505787125]
This paper contributes a comprehensive dataset, called UNK-VQA.
We first augment the existing data via deliberate perturbations on either the image or question.
We then extensively evaluate the zero- and few-shot performance of several emerging multi-modal large models.
arXiv Detail & Related papers (2023-10-17T02:38:09Z) - Gender Stereotyping Impact in Facial Expression Recognition [1.5340540198612824]
In recent years, machine learning-based models have become the most popular approach to Facial Expression Recognition (FER).
In publicly available FER datasets, apparent gender representation is usually roughly balanced overall, but representation within individual labels is not.
We generate derivative datasets with different amounts of stereotypical bias by altering the gender proportions of certain labels.
We observe a discrepancy of up to 29% in the recognition of certain emotions between genders under the worst bias conditions.
arXiv Detail & Related papers (2022-10-11T10:52:23Z) - BBQ: A Hand-Built Bias Benchmark for Question Answering [25.108222728383236]
It is well documented that NLP models learn social biases present in the world, but little work has been done to show how these biases manifest in actual model outputs for applied tasks like question answering (QA).
We introduce the Bias Benchmark for QA (BBQ), a dataset of question sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine different social dimensions relevant to U.S. English-speaking contexts.
We find that models strongly rely on stereotypes when the context is ambiguous, meaning that the models' outputs consistently reproduce harmful biases in this setting.
arXiv Detail & Related papers (2021-10-15T16:43:46Z) - Human-Adversarial Visual Question Answering [62.30715496829321]
We benchmark state-of-the-art VQA models against human-adversarial examples.
We find that a wide range of state-of-the-art models perform poorly when evaluated on these examples.
arXiv Detail & Related papers (2021-06-04T06:25:32Z) - Overcoming Language Priors with Self-supervised Learning for Visual
Question Answering [62.88124382512111]
Most Visual Question Answering (VQA) models suffer from the language prior problem.
We introduce a self-supervised learning framework to solve this problem.
Our method can significantly outperform the state of the art.
arXiv Detail & Related papers (2020-12-17T12:30:12Z) - Knowledge-Routed Visual Question Reasoning: Challenges for Deep
Representation Embedding [140.5911760063681]
We propose a novel dataset named Knowledge-Routed Visual Question Reasoning for VQA model evaluation.
We generate the question-answer pair based on both the Visual Genome scene graph and an external knowledge base with controlled programs.
arXiv Detail & Related papers (2020-12-14T00:33:44Z) - UnQovering Stereotyping Biases via Underspecified Questions [68.81749777034409]
We present UNQOVER, a framework to probe and quantify biases through underspecified questions.
We show that a naive use of model scores can lead to incorrect bias estimates due to two forms of reasoning errors.
We use this metric to analyze four important classes of stereotypes: gender, nationality, ethnicity, and religion.
arXiv Detail & Related papers (2020-10-06T01:49:52Z) - What Gives the Answer Away? Question Answering Bias Analysis on Video QA
Datasets [40.64071905569975]
Question answering biases in video QA datasets can mislead multimodal models into overfitting to QA artifacts.
Our study shows that biases can come from annotators and from the types of questions.
We also show empirically that using annotator-non-overlapping train-test splits can reduce QA biases for video QA datasets (a minimal split sketch follows this list).
arXiv Detail & Related papers (2020-07-07T17:00:11Z) - SQuINTing at VQA Models: Introspecting VQA Models with Sub-Questions [66.86887670416193]
We show that state-of-the-art VQA models have comparable performance in answering perception and reasoning questions, but suffer from consistency problems.
To address this shortcoming, we propose an approach called Sub-Question-aware Network Tuning (SQuINT)
We show that SQuINT improves model consistency by 5% and marginally improves performance on the Reasoning questions in VQA, while also producing better attention maps.
arXiv Detail & Related papers (2020-01-20T01:02:36Z)
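The video-QA entry above mentions annotator-non-overlapping train-test splits as a way to reduce QA bias. The sketch below shows one common way to build such a split with scikit-learn; the `annotator_id` field and the helper name are hypothetical, and this is not the authors' implementation.

```python
# Minimal sketch of an annotator-non-overlapping train/test split, assuming each
# QA sample carries an "annotator_id" field. GroupShuffleSplit keeps all samples
# from the same annotator on the same side of the split; this only mirrors the
# idea in the video-QA bias entry above, not the authors' exact code.
from sklearn.model_selection import GroupShuffleSplit

def split_by_annotator(samples, test_size=0.2, seed=0):
    groups = [s["annotator_id"] for s in samples]
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(samples, groups=groups))
    train = [samples[i] for i in train_idx]
    test = [samples[i] for i in test_idx]
    # Sanity check: no annotator contributes to both splits.
    assert not ({s["annotator_id"] for s in train} & {s["annotator_id"] for s in test})
    return train, test
```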
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences.