Related papers: More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models

More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models

URL: http://arxiv.org/abs/2407.06194v1
Date: Wed, 22 May 2024 00:45:29 GMT
Title: More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models
Authors: Messi H. J. Lee, Jacob M. Montgomery, Calvin K. Lai,
Abstract summary: This study explores how Vision Language Models (VLMs) perpetuate homogeneity bias and trait associations with regards to race and gender. VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate.
Score: 0.30723404270319693
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Vision Language Models (VLMs), exemplified by GPT-4V, adeptly integrate text and vision modalities. This integration enhances Large Language Models' ability to mimic human perception, allowing them to process image inputs. Despite VLMs' advanced capabilities, however, there is a concern that VLMs inherit biases of both modalities in ways that make biases more pervasive and difficult to mitigate. Our study explores how VLMs perpetuate homogeneity bias and trait associations with regards to race and gender. When prompted to write stories based on images of human faces, GPT-4V describes subordinate racial and gender groups with greater homogeneity than dominant groups and relies on distinct, yet generally positive, stereotypes. Importantly, VLM stereotyping is driven by visual cues rather than group membership alone such that faces that are rated as more prototypically Black and feminine are subject to greater stereotyping. These findings suggest that VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate. We explore the underlying reasons behind this behavior and discuss its implications and emphasize the importance of addressing these biases as VLMs come to mirror human perception.

Related papers

VIGNETTE: Socially Grounded Bias Evaluation for Vision-Language Models [23.329280888159744]
This work introduces VIGNETTE, a large-scale VQA benchmark with 30M+ images for evaluating bias in vision-language models (VLMs)<n>We assess how VLMs interpret identities in contextualized settings, revealing how models make trait and capability assumptions and exhibit patterns of discrimination.<n>Our findings uncover subtle, multifaceted, and surprising stereotypical patterns, offering insights into how VLMs construct social meaning from inputs.
arXiv Detail & Related papers (2025-05-28T22:00:30Z)
Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models [0.2812395851874055]
Using standardized facial images that vary in prototypicality, we test four Vision Language Models for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly. In terms of trait associations, we find limited evidence of stereotyping-Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM.
arXiv Detail & Related papers (2025-03-07T02:25:16Z)
Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals [0.0]
This study investigates how perceived racial phenotypicality influences Vision-Language Models' outputs. Our findings reveal three key patterns: First, VLMs generate significantly more homogeneous stories about Black individuals with higher phenotypicality. Second, stories about Black women consistently display greater homogeneity than those about Black men across all models tested. Third, this homogeneity bias is primarily driven by a pronounced interaction where phenotypicality strongly influences content variation for Black women but has minimal impact for Black men.
arXiv Detail & Related papers (2024-12-12T18:53:49Z)
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs) [82.57490175399693]
We study gender bias in 22 popular image-to-text vision-language assistants (VLAs) Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances. To eliminate the gender bias in these models, we find that finetuning-based debiasing methods achieve the best tradeoff between debiasing and retaining performance on downstream tasks.
arXiv Detail & Related papers (2024-10-25T05:59:44Z)
Are Large Language Models Ready for Travel Planning? [6.307444995285539]
Large language models (LLMs) show promise in hospitality and tourism, their ability to provide unbiased service across demographic groups remains unclear. This paper explores gender and ethnic biases when LLMs are utilized as travel planning assistants.
arXiv Detail & Related papers (2024-10-22T18:08:25Z)
Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs) By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases. The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z)
See It from My Perspective: Diagnosing the Western Cultural Bias of Large Vision-Language Models in Image Understanding [78.88461026069862]
Vision-language models (VLMs) can respond to queries about images in many languages. We present a novel investigation that demonstrates and localizes Western bias in image understanding.
arXiv Detail & Related papers (2024-06-17T15:49:51Z)
An Introduction to Vision-Language Modeling [128.6223984157515]
The vision-language model (VLM) applications will significantly impact our relationship with technology. We introduce what VLMs are, how they work, and how to train them. Although this work primarily focuses on mapping images to language, we also discuss extending VLMs to videos.
arXiv Detail & Related papers (2024-05-27T15:01:23Z)
Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans [0.30723404270319693]
We investigate a new form of bias in large language models (LLMs) We find that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans. We argue that the tendency to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
arXiv Detail & Related papers (2024-01-16T16:52:00Z)
Aligning with Whom? Large Language Models Have Gender and Racial Biases in Subjective NLP Tasks [15.015148115215315]
We conduct experiments on four popular large language models (LLMs) to investigate their capability to understand group differences and potential biases in their predictions for politeness and offensiveness. We find that for both tasks, model predictions are closer to the labels from White and female participants. More specifically, when being prompted to respond from the perspective of "Black" and "Asian" individuals, models show lower performance in predicting both overall scores as well as the scores from corresponding groups.
arXiv Detail & Related papers (2023-11-16T10:02:24Z)
Marked Personas: Using Natural Language Prompts to Measure Stereotypes in Language Models [33.157279170602784]
We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs) We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts. An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
arXiv Detail & Related papers (2023-05-29T16:29:22Z)
On Evaluating Adversarial Robustness of Large Vision-Language Models [64.66104342002882]
We evaluate the robustness of large vision-language models (VLMs) in the most realistic and high-risk setting. In particular, we first craft targeted adversarial examples against pretrained models such as CLIP and BLIP. Black-box queries on these VLMs can further improve the effectiveness of targeted evasion.
arXiv Detail & Related papers (2023-05-26T13:49:44Z)
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes. We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
Towards Understanding and Mitigating Social Biases in Language Models [107.82654101403264]
Large-scale pretrained language models (LMs) can be potentially dangerous in manifesting undesirable representational biases. We propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information.
arXiv Detail & Related papers (2021-06-24T17:52:43Z)

This list is automatically generated from the titles and abstracts of the papers in this site.