Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models
- URL: http://arxiv.org/abs/2503.05093v1
- Date: Fri, 07 Mar 2025 02:25:16 GMT
- Title: Visual Cues of Gender and Race are Associated with Stereotyping in Vision-Language Models
- Authors: Messi H. J. Lee, Soyeon Jeon, Jacob M. Montgomery, Calvin K. Lai,
- Abstract summary: Using standardized facial images that vary in prototypicality, we test four Vision Language Models for both trait associations and homogeneity bias in open-ended contexts.<n>We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly.<n>In terms of trait associations, we find limited evidence of stereotyping-Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM.
- Score: 0.2812395851874055
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Current research on bias in Vision Language Models (VLMs) has important limitations: it is focused exclusively on trait associations while ignoring other forms of stereotyping, it examines specific contexts where biases are expected to appear, and it conceptualizes social categories like race and gender as binary, ignoring the multifaceted nature of these identities. Using standardized facial images that vary in prototypicality, we test four VLMs for both trait associations and homogeneity bias in open-ended contexts. We find that VLMs consistently generate more uniform stories for women compared to men, with people who are more gender prototypical in appearance being represented more uniformly. By contrast, VLMs represent White Americans more uniformly than Black Americans. Unlike with gender prototypicality, race prototypicality was not related to stronger uniformity. In terms of trait associations, we find limited evidence of stereotyping-Black Americans were consistently linked with basketball across all models, while other racial associations (i.e., art, healthcare, appearance) varied by specific VLM. These findings demonstrate that VLM stereotyping manifests in ways that go beyond simple group membership, suggesting that conventional bias mitigation strategies may be insufficient to address VLM stereotyping and that homogeneity bias persists even when trait associations are less apparent in model outputs.
Related papers
- Vision-Language Models Represent Darker-Skinned Black Individuals as More Homogeneous than Lighter-Skinned Black Individuals [0.0]
Vision-Language Models (VLMs) combine Large Language Model (LLM) capabilities with image processing, enabling tasks like image captioning and text-to-image generation.<n>Skin tone bias, where darker-skinned individuals face more negative stereotyping than lighter-skinned individuals, is well-documented in the social sciences.<n>We sampled computer-generated images of Black American men and women, controlling for skin tone variations while keeping other features constant.
arXiv Detail & Related papers (2024-12-12T18:53:49Z) - GenderBias-\emph{VL}: Benchmarking Gender Bias in Vision Language Models via Counterfactual Probing [72.0343083866144]
This paper introduces the GenderBias-emphVL benchmark to evaluate occupation-related gender bias in Large Vision-Language Models.
Using our benchmark, we extensively evaluate 15 commonly used open-source LVLMs and state-of-the-art commercial APIs.
Our findings reveal widespread gender biases in existing LVLMs.
arXiv Detail & Related papers (2024-06-30T05:55:15Z) - The Factuality Tax of Diversity-Intervened Text-to-Image Generation: Benchmark and Fact-Augmented Intervention [61.80236015147771]
We quantify the trade-off between using diversity interventions and preserving demographic factuality in T2I models.
Experiments on DoFaiR reveal that diversity-oriented instructions increase the number of different gender and racial groups.
We propose Fact-Augmented Intervention (FAI) to reflect on verbalized or retrieved factual information about gender and racial compositions of generation subjects in history.
arXiv Detail & Related papers (2024-06-29T09:09:42Z) - More Distinctively Black and Feminine Faces Lead to Increased Stereotyping in Vision-Language Models [0.30723404270319693]
This study explores how Vision Language Models (VLMs) perpetuate homogeneity bias and trait associations with regards to race and gender.
VLMs may associate subtle visual cues related to racial and gender groups with stereotypes in ways that could be challenging to mitigate.
arXiv Detail & Related papers (2024-05-22T00:45:29Z) - Large Language Models Portray Socially Subordinate Groups as More Homogeneous, Consistent with a Bias Observed in Humans [0.30723404270319693]
We investigate a new form of bias in large language models (LLMs)
We find that ChatGPT portrayed African, Asian, and Hispanic Americans as more homogeneous than White Americans.
We argue that the tendency to describe groups as less diverse risks perpetuating stereotypes and discriminatory behavior.
arXiv Detail & Related papers (2024-01-16T16:52:00Z) - Social Bias Probing: Fairness Benchmarking for Language Models [38.180696489079985]
This paper proposes a novel framework for probing language models for social biases by assessing disparate treatment.
We curate SoFa, a large-scale benchmark designed to address the limitations of existing fairness collections.
We show that biases within language models are more nuanced than acknowledged, indicating a broader scope of encoded biases than previously recognized.
arXiv Detail & Related papers (2023-11-15T16:35:59Z) - Marked Personas: Using Natural Language Prompts to Measure Stereotypes
in Language Models [33.157279170602784]
We present Marked Personas, a prompt-based method to measure stereotypes in large language models (LLMs)
We find that portrayals generated by GPT-3.5 and GPT-4 contain higher rates of racial stereotypes than human-written portrayals using the same prompts.
An intersectional lens reveals tropes that dominate portrayals of marginalized groups, such as tropicalism and the hypersexualization of minoritized women.
arXiv Detail & Related papers (2023-05-29T16:29:22Z) - Easily Accessible Text-to-Image Generation Amplifies Demographic
Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes.
We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z) - Fairness for Image Generation with Uncertain Sensitive Attributes [97.81354305427871]
This work tackles the issue of fairness in the context of generative procedures, such as image super-resolution.
While traditional group fairness definitions are typically defined with respect to specified protected groups, we emphasize that there are no ground truth identities.
We show that the natural extension of demographic parity is strongly dependent on the grouping, and emphimpossible to achieve obliviously.
arXiv Detail & Related papers (2021-06-23T06:17:17Z) - How True is GPT-2? An Empirical Analysis of Intersectional Occupational
Biases [50.591267188664666]
Downstream applications are at risk of inheriting biases contained in natural language models.
We analyze the occupational biases of a popular generative language model, GPT-2.
For a given job, GPT-2 reflects the societal skew of gender and ethnicity in the US, and in some cases, pulls the distribution towards gender parity.
arXiv Detail & Related papers (2021-02-08T11:10:27Z) - One Label, One Billion Faces: Usage and Consistency of Racial Categories
in Computer Vision [75.82110684355979]
We study the racial system encoded by computer vision datasets supplying categorical race labels for face images.
We find that each dataset encodes a substantially unique racial system, despite nominally equivalent racial categories.
We find evidence that racial categories encode stereotypes, and exclude ethnic groups from categories on the basis of nonconformity to stereotypes.
arXiv Detail & Related papers (2021-02-03T22:50:04Z) - CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Masked
Language Models [30.582132471411263]
We introduce the Crowd Stereotype Pairs benchmark (CrowS-Pairs)
CrowS-Pairs has 1508 examples that cover stereotypes dealing with nine types of bias, like race, religion, and age.
We find that all three of the widely-used sentences we evaluate substantially favor stereotypes in every category in CrowS-Pairs.
arXiv Detail & Related papers (2020-09-30T22:38:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.