Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals
- URL: http://arxiv.org/abs/2412.09668v1
- Date: Thu, 12 Dec 2024 18:53:49 GMT
- Title: Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals
- Authors: Messi H. J. Lee, Soyeon Jeon,
- Abstract summary: Vision-Language Models (VLMs) combine Large Language Model (LLM) capabilities with image processing, enabling tasks like image captioning and text-to-image generation. Skin tone bias, where darker-skinned individuals face more negative stereotyping than lighter-skinned individuals, is well-documented in the social sciences. We sampled computer-generated images of Black American men and women, controlling for skin tone variations while keeping other features constant.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision-Language Models (VLMs) combine Large Language Model (LLM) capabilities with image processing, enabling tasks like image captioning and text-to-image generation. Yet concerns persist about their potential to amplify human-like biases, including skin tone bias. Skin tone bias, where darker-skinned individuals face more negative stereotyping than lighter-skinned individuals, is well-documented in the social sciences but remains under-explored in Artificial Intelligence (AI), particularly in VLMs. Using the GAN Face Database, we sampled computer-generated images of Black American men and women, controlling for skin tone variations while keeping other features constant. We then asked VLMs to write stories about these faces and compared the homogeneity of the generated stories. Stories generated by VLMs about darker-skinned Black individuals were more homogeneous than those about lighter-skinned individuals in three of four models, and Black women were consistently represented more homogeneously than Black men across all models. Interaction effects revealed a greater impact of skin tone on women in two VLMs, while the other two showed nonsignificant results, reflecting known stereotyping patterns. These findings underscore the propagation of biases from single-modality AI systems to multimodal models and highlight the need for further research to address intersectional biases in AI.
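The abstract describes comparing how homogeneous VLM-generated stories are across skin-tone groups. Below is a minimal, hypothetical sketch of one way such a per-group homogeneity score could be computed: the embedding model, the mean-pairwise-cosine-similarity definition, and the example stories are illustrative assumptions, not the paper's exact procedure or analysis.

```python
# Hypothetical sketch: quantify how homogeneous a set of generated stories is by
# averaging pairwise cosine similarities of their sentence embeddings. The
# embedding model and this definition of homogeneity are illustrative
# assumptions, not the paper's exact measure.
from itertools import combinations

import numpy as np
from sentence_transformers import SentenceTransformer


def homogeneity(stories: list[str], encoder: SentenceTransformer) -> float:
    """Mean pairwise cosine similarity of story embeddings; higher = more homogeneous."""
    embeddings = encoder.encode(stories, normalize_embeddings=True)  # unit-length vectors
    pairs = combinations(range(len(embeddings)), 2)
    return float(np.mean([np.dot(embeddings[i], embeddings[j]) for i, j in pairs]))


if __name__ == "__main__":
    encoder = SentenceTransformer("all-MiniLM-L6-v2")
    darker_stories = ["A story a VLM wrote about a darker-skinned face ...",
                      "Another story about a darker-skinned face ..."]
    lighter_stories = ["A story a VLM wrote about a lighter-skinned face ...",
                       "Another story about a lighter-skinned face ..."]
    print("darker-skinned group homogeneity :", homogeneity(darker_stories, encoder))
    print("lighter-skinned group homogeneity:", homogeneity(lighter_stories, encoder))
```

The paper's actual analysis compares homogeneity statistically across skin-tone and gender groups; this snippet only illustrates how a single group-level homogeneity score might be derived from generated stories.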
Related papers
- Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models [0.2812395851874055]
Using standardized facial images that vary in prototypicality, we test four Vision Language Models for both trait associations and homogeneity bias in open-ended contexts.
We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly.
In terms of trait associations, we find limited evidence of stereotyping: Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM.
arXiv Detail & Related papers (2025-03-07T02:25:16Z) - Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources.
We develop a checklist comprising objective and subjective queries to analyze the behavior of large language models.
We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z) - Spoken Stereoset: On Evaluating Social Bias Toward Speaker in Speech Large Language Models [50.40276881893513]
This study introduces Spoken Stereoset, a dataset specifically designed to evaluate social biases in Speech Large Language Models (SLLMs).
By examining how different models respond to speech from diverse demographic groups, we aim to identify these biases.
The findings indicate that while most models show minimal bias, some still exhibit slightly stereotypical or anti-stereotypical tendencies.
arXiv Detail & Related papers (2024-08-14T16:55:06Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups in generated images.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models [0.30723404270319693]
This study explores how Vision Language Models (VLMs) perpetuate homogeneity bias and trait associations with regard to race and gender.
VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate.
arXiv Detail & Related papers (2024-05-22T00:45:29Z) - White Men Lead, Black Women Help? Benchmarking Language Agency Social Biases in LLMs [58.27353205269664]
Social biases can manifest in language agency.
We introduce the novel Language Agency Bias Evaluation benchmark.
We unveil language agency social biases in content generated by 3 recent Large Language Models (LLMs).
arXiv Detail & Related papers (2024-04-16T12:27:54Z) - AI-generated faces influence gender stereotypes and racial homogenization [1.6647208383676708]
We document significant biases in Stable Diffusion across six races, two genders, 32 professions, and eight attributes.
This analysis reveals significant racial homogenization, for example depicting nearly all Middle Eastern men as bearded, brown-skinned, and wearing traditional attire.
We propose debiasing solutions that allow users to specify the desired distributions of race and gender when generating images.
arXiv Detail & Related papers (2024-02-01T20:32:14Z) - Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans [0.30723404270319693]
We investigate a new form of bias in large language models (LLMs).
We find that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans.
We argue that the tendency to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
arXiv Detail & Related papers (2024-01-16T16:52:00Z) - Bias Runs Deep: Implicit Reasoning Biases in Persona-Assigned LLMs [67.51906565969227]
We study the unintended side-effects of persona assignment on the ability of LLMs to perform basic reasoning tasks.
Our study covers 24 reasoning datasets, 4 LLMs, and 19 diverse personas (e.g. an Asian person) spanning 5 socio-demographic groups.
arXiv Detail & Related papers (2023-11-08T18:52:17Z) - Mitigating stereotypical biases in text to image generative systems [10.068823600548157]
We mitigate these biases by finetuning text-to-image models on synthetic data that varies in perceived skin tones and genders, constructed from diverse text prompts.
Our diversity finetuned (DFT) model improves the group fairness metric by 150% for perceived skin tone and 97.7% for perceived gender.
arXiv Detail & Related papers (2023-10-10T18:01:52Z) - Fairness in AI Systems: Mitigating gender bias from language-vision models [0.913755431537592]
We study the extent of the impact of gender bias in existing datasets.
We propose a methodology to mitigate its impact in caption-based language-vision models.
arXiv Detail & Related papers (2023-05-03T04:33:44Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z) - CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked Language Models [30.582132471411263]
We introduce the Crowdsourced Stereotype Pairs benchmark (CrowS-Pairs).
CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age.
We find that all three of the widely used masked language models we evaluate substantially favor sentences that express stereotypes in every category of CrowS-Pairs; see the sketch after this entry for the underlying sentence-pair scoring idea.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
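The CrowS-Pairs entry above reports that masked language models tend to prefer the more stereotypical sentence of each minimal pair. The following is a rough, hypothetical sketch of how such a preference can be scored with a pseudo-log-likelihood; the benchmark's own metric conditions only on the tokens shared by both sentences, so this simplified all-token variant and the example pair are assumptions for illustration only.

```python
# Hypothetical sketch: compare a more-stereotypical vs. less-stereotypical
# sentence under a masked language model using a simple pseudo-log-likelihood
# (mask each token in turn and sum the log-probability of the original token).
# CrowS-Pairs' actual metric scores only the tokens shared by both sentences;
# this all-token variant is a simplification for illustration.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased").eval()


def pseudo_log_likelihood(sentence: str) -> float:
    input_ids = tokenizer(sentence, return_tensors="pt")["input_ids"][0]
    total = 0.0
    for i in range(1, len(input_ids) - 1):  # skip [CLS] and [SEP]
        masked = input_ids.clone()
        masked[i] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(masked.unsqueeze(0)).logits[0, i]
        total += torch.log_softmax(logits, dim=-1)[input_ids[i]].item()
    return total


# Illustrative minimal pair differing only in the target group.
more_stereotypical = "The poor are really ignorant about how to handle money."
less_stereotypical = "The rich are really ignorant about how to handle money."
print("model prefers the stereotypical sentence:",
      pseudo_log_likelihood(more_stereotypical) > pseudo_log_likelihood(less_stereotypical))
```

Aggregating this comparison over many such pairs gives the kind of per-category preference rate the benchmark reports.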
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.