Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals
- URL: http://arxiv.org/abs/2412.09668v2
- Date: Thu, 20 Mar 2025 15:50:45 GMT
- Title: Vision-Language Models Generate More Homogeneous Stories for Phenotypically Black Individuals
- Authors: Messi H. J. Lee, Soyeon Jeon
- Abstract summary: This study investigates how perceived racial phenotypicality influences Vision-Language Models' outputs. Our findings reveal three key patterns: First, VLMs generate significantly more homogeneous stories about Black individuals with higher phenotypicality. Second, stories about Black women consistently display greater homogeneity than those about Black men across all models tested. Third, this homogeneity bias is primarily driven by a pronounced interaction where phenotypicality strongly influences content variation for Black women but has minimal impact for Black men.
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision-Language Models (VLMs) extend Large Language Models' capabilities by integrating image processing, but concerns persist about their potential to reproduce and amplify human biases. While research has documented how these models perpetuate stereotypes across demographic groups, most work has focused on between-group biases rather than within-group differences. This study investigates homogeneity bias, the tendency to portray groups as more uniform than they are, within Black Americans, examining how perceived racial phenotypicality influences VLMs' outputs. Using computer-generated images that systematically vary in phenotypicality, we prompted VLMs to generate stories about these individuals and measured text similarity to assess content homogeneity. Our findings reveal three key patterns: First, VLMs generate significantly more homogeneous stories about Black individuals with higher phenotypicality compared to those with lower phenotypicality. Second, stories about Black women consistently display greater homogeneity than those about Black men across all models tested. Third, in two of three VLMs, this homogeneity bias is primarily driven by a pronounced interaction where phenotypicality strongly influences content variation for Black women but has minimal impact for Black men. These results demonstrate how intersectionality shapes AI-generated representations and highlight the persistence of stereotyping that mirrors documented biases in human perception, where increased racial phenotypicality leads to greater stereotyping and less individualized representation.
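The homogeneity measure described in the abstract (pairwise text similarity across stories generated for the same condition) can be illustrated with a minimal sketch. The paper's actual pipeline is not specified here and likely uses learned sentence embeddings; the bag-of-words cosine similarity below, and all names in it, are illustrative assumptions that only convey the idea: higher mean pairwise similarity means more homogeneous stories.

```python
# Hypothetical sketch of a homogeneity score: mean pairwise cosine
# similarity over bag-of-words vectors of generated stories.
# Higher score = the stories are more alike (more "homogeneous").
from collections import Counter
from itertools import combinations
import math


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean_pairwise_similarity(stories: list[str]) -> float:
    """Average cosine similarity over all pairs of stories."""
    vecs = [Counter(s.lower().split()) for s in stories]
    pairs = list(combinations(vecs, 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Identical stories score higher than stories with no shared words.
uniform = ["the player loves basketball", "the player loves basketball"]
varied = ["a chef opens a bakery", "an astronaut studies distant storms"]
assert mean_pairwise_similarity(uniform) > mean_pairwise_similarity(varied)
```

Under this framing, the paper's first finding would correspond to higher mean pairwise similarity for story sets generated from higher-phenotypicality images.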
Related papers
- Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models [0.2812395851874055]
Using standardized facial images that vary in prototypicality, we test four Vision Language Models for both trait associations and homogeneity bias in open-ended contexts.
We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly.
In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM.
arXiv Detail & Related papers (2025-03-07T02:25:16Z) - Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources.
We develop a checklist comprising objective and subjective queries to analyze behavior of large language models.
We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models [0.30723404270319693]
This study explores how Vision Language Models (VLMs) perpetuate homogeneity bias and trait associations with regard to race and gender.
VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate.
arXiv Detail & Related papers (2024-05-22T00:45:29Z) - White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - AI-generated faces influence gender stereotypes and racial homogenization [1.6647208383676708]
We document significant biases in Stable Diffusion across six races, two genders, 32 professions, and eight attributes.
This analysis reveals significant racial homogenization, for example depicting nearly all Middle Eastern men as bearded, brown-skinned, and wearing traditional attire.
We propose debiasing solutions that allow users to specify the desired distributions of race and gender when generating images.
arXiv Detail & Related papers (2024-02-01T20:32:14Z) - Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans [0.30723404270319693]
We investigate a new form of bias in large language models (LLMs).
We find that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans.
We argue that the tendency to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
arXiv Detail & Related papers (2024-01-16T16:52:00Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - Mitigating stereotypical biases in text to image generative systems [10.068823600548157]
We do this by finetuning text-to-image models on synthetic data that varies in perceived skin tones and genders constructed from diverse text prompts.
Our diversity finetuned (DFT) model improves the group fairness metric by 150% for perceived skin tone and 97.7% for perceived gender.
arXiv Detail & Related papers (2023-10-10T18:01:52Z) - Fairness in AI Systems: Mitigating gender bias from language-vision models [0.913755431537592]
We study the extent of the impact of gender bias in existing datasets.
We propose a methodology to mitigate its impact in caption based language vision models.
arXiv Detail & Related papers (2023-05-03T04:33:44Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z) - CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models [30.582132471411263]
We introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs).
CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age.
We find that all three of the widely used masked language models we evaluate substantially favor sentences that express stereotypes in every category in CrowS-Pairs.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.