Examining Gender and Racial Bias in Large Vision-Language Models Using a
Novel Dataset of Parallel Images
- URL: http://arxiv.org/abs/2402.05779v1
- Date: Thu, 8 Feb 2024 16:11:23 GMT
- Title: Examining Gender and Racial Bias in Large Vision-Language Models Using a
Novel Dataset of Parallel Images
- Authors: Kathleen C. Fraser and Svetlana Kiritchenko
- Abstract summary: We present a new dataset PAIRS (PArallel Images for eveRyday Scenarios).
The PAIRS dataset contains sets of AI-generated images of people, such that the images are highly similar in terms of background and visual content, but differ along the dimensions of gender and race.
By querying the LVLMs with such images, we observe significant differences in the responses according to the perceived gender or race of the person depicted.
- Score: 10.385717398477414
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Following on recent advances in large language models (LLMs) and subsequent
chat models, a new wave of large vision-language models (LVLMs) has emerged.
Such models can incorporate images as input in addition to text, and perform
tasks such as visual question answering, image captioning, story generation,
etc. Here, we examine potential gender and racial biases in such systems, based
on the perceived characteristics of the people in the input images. To
accomplish this, we present a new dataset PAIRS (PArallel Images for eveRyday
Scenarios). The PAIRS dataset contains sets of AI-generated images of people,
such that the images are highly similar in terms of background and visual
content, but differ along the dimensions of gender (man, woman) and race
(Black, white). By querying the LVLMs with such images, we observe significant
differences in the responses according to the perceived gender or race of the
person depicted.
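The probing procedure behind these results is simple in outline: show the model each variant of a parallel scene together with an identical question, then compare the answers across variants. The following is a minimal sketch only, assuming LLaVA-1.5 served through Hugging Face transformers, with placeholder file names and an illustrative question; the paper's exact models, prompts, and scenarios may differ.

```python
# Sketch of paired-image probing: same scene, same question, varied demographics.
# Assumptions (not from the paper): LLaVA-1.5 via Hugging Face transformers,
# placeholder file names, and an illustrative question.
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # hypothetical choice of LVLM
processor = AutoProcessor.from_pretrained(MODEL_ID)
model = LlavaForConditionalGeneration.from_pretrained(MODEL_ID)

# One PAIRS-style scenario: images sharing background and visual content but
# differing in perceived gender and race (file names are placeholders).
scenario = {
    "Black man": "pairs/scenario1_black_man.png",
    "Black woman": "pairs/scenario1_black_woman.png",
    "white man": "pairs/scenario1_white_man.png",
    "white woman": "pairs/scenario1_white_woman.png",
}
question = "What is this person's occupation? Answer in one word."

responses = {}
for variant, path in scenario.items():
    image = Image.open(path)
    prompt = f"USER: <image>\n{question} ASSISTANT:"
    inputs = processor(text=prompt, images=image, return_tensors="pt")
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    text = processor.batch_decode(output, skip_special_tokens=True)[0]
    responses[variant] = text.split("ASSISTANT:")[-1].strip()

# Because the images are otherwise parallel, systematic differences between
# these answers point to the perceived gender or race of the person depicted.
for variant, answer in responses.items():
    print(f"{variant}: {answer}")
```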
Related papers
- Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You [64.74707085021858]
We show that multilingual models suffer from significant gender biases just as monolingual models do.
We propose a novel benchmark, MAGBIG, intended to foster research on gender bias in multilingual models.
Our results show that not only do models exhibit strong gender biases but they also behave differently across languages.
arXiv Detail & Related papers (2024-01-29T12:02:28Z)
- Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness [0.932065750652415]
TIDA (Targeted Image-editing Data Augmentation) is a targeted data augmentation method focused on improving models' human-like abilities.
We show that a TIDA-enhanced dataset related to gender, color, and counting abilities induces better performance in several image captioning metrics.
arXiv Detail & Related papers (2023-09-27T20:12:41Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- DeAR: Debiasing Vision-Language Models with Additive Residuals [5.672132510411465]
Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations (a rough illustration of this additive-residual idea follows this list).
arXiv Detail & Related papers (2023-03-18T14:57:43Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
- Vision-Language Models Performing Zero-Shot Tasks Exhibit Gender-based Disparities [19.03751960721954]
We explore the extent to which zero-shot vision-language models exhibit gender bias for different vision tasks.
We evaluate different vision-language models with multiple datasets across a set of concepts.
arXiv Detail & Related papers (2023-01-26T13:44:31Z)
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect on the diversity of the generated images when adding an ethical intervention.
Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as 'irrespective of gender'.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
- On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization [89.94078728495423]
We show that recent advances in each modality, CLIP image representations and scaling of language models, do not consistently improve multimodal self-rationalization of tasks with multimodal inputs.
Our findings call for a backbone modelling approach that can be built on to advance text generation from images and text beyond image captioning.
arXiv Detail & Related papers (2022-05-24T00:52:40Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
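The additive-residual idea summarized in the DeAR entry above can be pictured as a small trainable module whose output is added to a frozen encoder's image embedding. The sketch below is a loose illustration under assumed dimensions and architecture, not the authors' implementation; the training objective that makes the residual debiasing is omitted.

```python
# Illustrative sketch of additive-residual debiasing: a learned residual offsets
# the frozen VLM's image representation. Dimensions and architecture below are
# placeholders, not taken from the DeAR paper.
import torch
import torch.nn as nn

class AdditiveResidualDebiaser(nn.Module):
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        # Lightweight residual generator; the frozen image encoder is untouched.
        self.residual = nn.Sequential(
            nn.Linear(embed_dim, embed_dim),
            nn.ReLU(),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, image_embedding: torch.Tensor) -> torch.Tensor:
        # Offset the original representation with a learned additive residual.
        return image_embedding + self.residual(image_embedding)

# Usage with placeholder embeddings standing in for a frozen encoder's output.
frozen_embeddings = torch.randn(4, 512)
debiaser = AdditiveResidualDebiaser(embed_dim=512)
debiased = debiaser(frozen_embeddings)
print(debiased.shape)  # torch.Size([4, 512])
```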
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.