GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
- URL: http://arxiv.org/abs/2407.21001v3
- Date: Fri, 25 Oct 2024 11:30:26 GMT
- Title: GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
- Authors: Ali Abdollahi, Mahdi Ghaznavi, Mohammad Reza Karimi Nejad, Arash Mari Oriyad, Reza Abbasi, Ali Salesi, Melika Behjati, Mohammad Hossein Rohban, Mahdieh Soleymani Baghshah
- Abstract summary: We show that vision-language models (VLMs) are biased towards identifying the individual with the expected gender as the performer of the activity.
We refer to this bias in associating an activity with the gender of its actual performer in an image or text as the Gender-Activity Binding (GAB) bias.
Our experiments indicate that VLMs experience an average performance decline of about 13.2% when confronted with gender-activity binding bias.
- Score: 3.018378575149671
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision-language models (VLMs) are intensively used in many downstream tasks, including those requiring assessments of individuals appearing in the images. While VLMs perform well in simple single-person scenarios, in real-world applications, we often face complex situations in which there are persons of different genders doing different activities. We show that in such cases, VLMs are biased towards identifying the individual with the expected gender (according to ingrained gender stereotypes in the model or other forms of sample selection bias) as the performer of the activity. We refer to this bias in associating an activity with the gender of its actual performer in an image or text as the Gender-Activity Binding (GAB) bias and analyze how this bias is internalized in VLMs. To assess this bias, we have introduced the GAB dataset with approximately 5500 AI-generated images that represent a variety of activities, addressing the scarcity of real-world images for some scenarios. To ensure extensive quality control, the generated images are evaluated for their diversity, quality, and realism. We have tested 12 renowned pre-trained VLMs on this dataset in the context of text-to-image and image-to-text retrieval to measure the effect of this bias on their predictions. Additionally, we have carried out supplementary experiments to quantify the bias in VLMs' text encoders and to evaluate VLMs' capability to recognize activities. Our experiments indicate that VLMs experience an average performance decline of about 13.2% when confronted with gender-activity binding bias.
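To make the retrieval setup concrete, the sketch below shows one hypothetical way gender-activity binding could be probed with an off-the-shelf CLIP-style model from Hugging Face transformers. The image path and captions are illustrative placeholders, and this is not the authors' GAB dataset or evaluation code; the paper's actual protocol spans 12 VLMs and both text-to-image and image-to-text retrieval.

```python
# Minimal, hypothetical sketch: score an image against two captions that differ
# only in which gender is bound to the activity, using a CLIP-style model.
# NOT the authors' evaluation code; paths and captions are placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical scene: a woman repairing a car while a man stands nearby.
image = Image.open("example_scene.jpg")  # placeholder path

captions = [
    "A woman is repairing the car.",  # matches the actual performer
    "A man is repairing the car.",    # stereotype-consistent distractor
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds the image's similarity to each caption.
probs = outputs.logits_per_image.softmax(dim=-1).squeeze()
for caption, p in zip(captions, probs.tolist()):
    print(f"{p:.3f}  {caption}")
```

If, across many such images, the stereotype-consistent caption tends to win even though the other person performs the activity, that retrieval error is the kind of gender-activity binding failure the paper's reported 13.2% average performance decline summarizes.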
Related papers
- Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs).
Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.
To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z)
- FaceSaliencyAug: Mitigating Geographic, Gender and Stereotypical Biases via Saliency-Based Data Augmentation [46.74201905814679]
We present an approach named FaceSaliencyAug aimed at addressing the gender bias in computer vision models.
We quantify dataset diversity using Image Similarity Score (ISS) across six datasets: Flickr Faces HQ (FFHQ), WIKI, IMDB, Labelled Faces in the Wild (LFW), UTK Faces, and a Diverse dataset.
Our experiments reveal a reduction in gender bias for both CNNs and ViTs, indicating the efficacy of our method in promoting fairness and inclusivity in computer vision models.
arXiv Detail & Related papers (2024-10-17T22:36:52Z)
- GenderCARE: A Comprehensive Framework for Assessing and Reducing Gender Bias in Large Language Models [73.23743278545321]
Large language models (LLMs) have exhibited remarkable capabilities in natural language generation, but have also been observed to magnify societal biases.
GenderCARE is a comprehensive framework that encompasses innovative Criteria, bias Assessment, Reduction techniques, and Evaluation metrics.
arXiv Detail & Related papers (2024-08-22T15:35:46Z)
- GenderBias-VL: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-VL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z)
- A Unified Framework and Dataset for Assessing Societal Bias in Vision-Language Models [9.025958469582363]
We propose a unified framework for evaluating gender, race, and age biases in vision-language models (VLMs).
We generate high-quality synthetic datasets that intentionally conceal gender, race, and age information across different professional domains.
The dataset includes action-based descriptions of each profession and serves as a benchmark for evaluating societal biases in VLMs.
arXiv Detail & Related papers (2024-02-21T09:17:51Z)
- VisoGender: A dataset for benchmarking gender bias in image-text pronoun resolution [80.57383975987676]
VisoGender is a novel dataset for benchmarking gender bias in vision-language models.
We focus on occupation-related biases within a hegemonic system of binary gender, inspired by Winograd and Winogender schemas.
We benchmark several state-of-the-art vision-language models and find that they demonstrate bias in resolving binary gender in complex scenes.
arXiv Detail & Related papers (2023-06-21T17:59:51Z)
- DeAR: Debiasing Vision-Language Models with Additive Residuals [5.672132510411465]
Large pre-trained vision-language models (VLMs) provide rich, adaptable image and text representations.
These models suffer from societal biases owing to the skewed distribution of various identity groups in the training data.
We present DeAR, a novel debiasing method that learns additive residual image representations to offset the original representations.
arXiv Detail & Related papers (2023-03-18T14:57:43Z)
- Auditing Gender Presentation Differences in Text-to-Image Models [54.16959473093973]
We study how gender is presented differently in text-to-image models.
By probing gender indicators in the input text, we quantify the frequency differences of presentation-centric attributes.
We propose an automatic method to estimate such differences.
arXiv Detail & Related papers (2023-02-07T18:52:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.