Related papers: Hidden Bias in the Machine: Stereotypes in Text-to-Image Models

Hidden Bias in the Machine: Stereotypes in Text-to-Image Models

URL: http://arxiv.org/abs/2506.13780v1
Date: Mon, 09 Jun 2025 23:06:04 GMT
Title: Hidden Bias in the Machine: Stereotypes in Text-to-Image Models
Authors: Sedat Porikli, Vedat Porikli,
Abstract summary: Text-to-Image (T2I) models have transformed visual content creation, producing highly realistic images from natural language prompts.<n>We curated a diverse set of prompts spanning thematic categories such as occupations, traits, actions, ideologies, emotions, family roles, place descriptions, spirituality, and life events.<n>For each of the 160 unique topics, we crafted multiple prompt variations to reflect a wide range of meanings and perspectives.<n>Our analysis reveals significant disparities in the representation of gender, race, age, somatotype, and other human-centric factors across generated images.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Text-to-Image (T2I) models have transformed visual content creation, producing highly realistic images from natural language prompts. However, concerns persist around their potential to replicate and magnify existing societal biases. To investigate these issues, we curated a diverse set of prompts spanning thematic categories such as occupations, traits, actions, ideologies, emotions, family roles, place descriptions, spirituality, and life events. For each of the 160 unique topics, we crafted multiple prompt variations to reflect a wide range of meanings and perspectives. Using Stable Diffusion 1.5 (UNet-based) and Flux-1 (DiT-based) models with original checkpoints, we generated over 16,000 images under consistent settings. Additionally, we collected 8,000 comparison images from Google Image Search. All outputs were filtered to exclude abstract, distorted, or nonsensical results. Our analysis reveals significant disparities in the representation of gender, race, age, somatotype, and other human-centric factors across generated images. These disparities often mirror and reinforce harmful stereotypes embedded in societal narratives. We discuss the implications of these findings and emphasize the need for more inclusive datasets and development practices to foster fairness in generative visual systems.

Related papers

When Cars Have Stereotypes: Auditing Demographic Bias in Objects from Text-to-Image Models [4.240144901142787]
We introduce SODA (Stereotyped Object Diagnostic Audit), a novel framework for measuring such biases.<n>Our approach compares visual attributes of objects generated with demographic cues to those from neutral prompts.<n>We uncover strong associations between specific demographic groups and visual attributes, such as recurring color patterns prompted by gender or ethnicity cues.
arXiv Detail & Related papers (2025-08-05T14:15:53Z)
A Large Scale Analysis of Gender Biases in Text-to-Image Generative Models [45.55471356313678]
This paper presents the first large-scale study on gender bias in text-to-image (T2I) models.<n>We create a dataset of 3,217 gender-neutral prompts and generate 200 images per prompt from five leading T2I models.<n>We automatically detect the perceived gender of people in the generated images and filter out images with no person or multiple people of different genders.
arXiv Detail & Related papers (2025-03-30T11:11:51Z)
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images [5.529078451095096]
understanding the semantics of visual scenes is a fundamental challenge in Computer Vision. Recent advancements in text-to-image frameworks have led to models that implicitly capture natural scene statistics. Our work presents StableSemantics, a dataset comprising 224 thousand human-curated prompts, processed natural language captions, over 2 million synthetic images, and 10 million attention maps corresponding to individual noun chunks.
arXiv Detail & Related papers (2024-06-19T17:59:40Z)
Evaluating Vision-Language Models on Bistable Images [34.492117496933915]
This study is the most extensive examination of vision-language models using bistable images to date. We manually gathered a dataset of 29 bistable images, along with their associated labels, and subjected them to 116 different manipulations in brightness, tint, and rotation. Our findings reveal that, with the exception of models from the Idefics family and LLaVA1.5-13b, there is a pronounced preference for one interpretation over another.
arXiv Detail & Related papers (2024-05-29T18:04:59Z)
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation [60.943159830780154]
We introduce Bounded Attention, a training-free method for bounding the information flow in the sampling process. We demonstrate that our method empowers the generation of multiple subjects that better align with given prompts and layouts.
arXiv Detail & Related papers (2024-03-25T17:52:07Z)
Stable Diffusion Exposed: Gender Bias from Prompt to Image [25.702257177921048]
This paper introduces an evaluation protocol that analyzes the impact of gender indicators at every step of the generation process on Stable Diffusion images. Our findings include the existence of differences in the depiction of objects, such as instruments tailored for specific genders, and shifts in overall layouts.
arXiv Detail & Related papers (2023-12-05T10:12:59Z)
Word-Level Explanations for Analyzing Bias in Text-to-Image Models [72.71184730702086]
Text-to-image (T2I) models can generate images that underrepresent minorities based on race and sex. This paper investigates which word in the input prompt is responsible for bias in generated images.
arXiv Detail & Related papers (2023-06-03T21:39:07Z)
Social Biases through the Text-to-Image Generation Lens [9.137275391251517]
Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software. We take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images. We present findings for two popular T2I models: DALLE-v2 and Stable Diffusion.
arXiv Detail & Related papers (2023-03-30T05:29:13Z)
Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems. Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts. We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z)
Discovering and Mitigating Visual Biases through Keyword Explanation [66.71792624377069]
We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords. B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C. B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
arXiv Detail & Related papers (2023-01-26T13:58:46Z)
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale [61.555788332182395]
We investigate the potential for machine learning models to amplify dangerous and complex stereotypes. We find a broad range of ordinary prompts produce stereotypes, including prompts simply mentioning traits, descriptors, occupations, or objects.
arXiv Detail & Related papers (2022-11-07T18:31:07Z)
How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect on the diversity of the generated images when adding ethical intervention. Preliminary studies indicate that a large change in the model predictions is triggered by certain phrases such as 'irrespective of gender'
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models. First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding. Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.