How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias
- URL: http://arxiv.org/abs/2501.09014v1
- Date: Wed, 15 Jan 2025 18:57:17 GMT
- Title: How Do Generative Models Draw a Software Engineer? A Case Study on Stable Diffusion Bias
- Authors: Tosin Fadahunsi, Giordano d'Aloisio, Antinisca Di Marco, Federica Sarro,
- Abstract summary: We evaluate the gender and ethnicity bias exposed by three versions of the Stable Diffusion model towards software engineering tasks.
Results show how all models are significantly biased towards male figures when representing software engineers.
However, all models significantly under-represent Black and Arab figures, regardless of the prompt style used.
- Score: 9.574645433491225
- License:
- Abstract: Generative models are nowadays widely used to generate graphical content used for multiple purposes, e.g. web, art, advertisement. However, it has been shown that the images generated by these models could reinforce societal biases already existing in specific contexts. In this paper, we focus on understanding if this is the case when one generates images related to various software engineering tasks. In fact, the Software Engineering (SE) community is not immune from gender and ethnicity disparities, which could be amplified by the use of these models. Hence, if used without consciousness, artificially generated images could reinforce these biases in the SE domain. Specifically, we perform an extensive empirical evaluation of the gender and ethnicity bias exposed by three versions of the Stable Diffusion (SD) model (a very popular open-source text-to-image model) - SD 2, SD XL, and SD 3 - towards SE tasks. We obtain 6,720 images by feeding each model with two sets of prompts describing different software-related tasks: one set includes the Software Engineer keyword, and one set does not include any specification of the person performing the task. Next, we evaluate the gender and ethnicity disparities in the generated images. Results show how all models are significantly biased towards male figures when representing software engineers. On the contrary, while SD 2 and SD XL are strongly biased towards White figures, SD 3 is slightly more biased towards Asian figures. Nevertheless, all models significantly under-represent Black and Arab figures, regardless of the prompt style used. The results of our analysis highlight severe concerns about adopting those models to generate content for SE tasks and open the field for future research on bias mitigation in this context.
Related papers
- Fact-or-Fair: A Checklist for Behavioral Testing of AI Models on Fairness-Related Queries [85.909363478929]
In this study, we focus on 19 real-world statistics collected from authoritative sources.
We develop a checklist comprising objective and subjective queries to analyze behavior of large language models.
We propose metrics to assess factuality and fairness, and formally prove the inherent trade-off between these two aspects.
arXiv Detail & Related papers (2025-02-09T10:54:11Z) - Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? [11.101062595569854]
Previous studies have shown that Text-to-Image (T2I) models can perpetuate or even amplify gender stereotypes when provided with neutral text prompts.
No existing work comprehensively compares the various detectors and understands how the gender bias detected by them deviates from the actual situation.
This study addresses this gap by validating previous gender bias detectors using a manually labeled dataset and comparing how the bias identified by various detectors deviates from the actual bias in T2I models.
arXiv Detail & Related papers (2025-01-27T04:47:19Z) - VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary [8.24274551090375]
We introduce VersusDebias, a novel and universal debiasing framework for biases in arbitrary Text-to-Image (T2I) models.
The self-adaptive module generates specialized attribute arrays to post-process hallucinations and debias multiple attributes simultaneously.
In both zero-shot and few-shot scenarios, VersusDebias outperforms existing methods, showcasing its exceptional utility.
arXiv Detail & Related papers (2024-07-28T16:24:07Z) - MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias [23.10522891268232]
We introduce a Mixture-of-Experts approach to mitigate gender bias in text-to-image models.
We show that our approach successfully mitigates gender bias while maintaining image quality.
arXiv Detail & Related papers (2024-06-25T14:59:31Z) - The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects [58.27353205269664]
We propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities.
PST queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities.
Using PST, we evaluate two aspects of gender biases -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power.
arXiv Detail & Related papers (2024-02-16T21:32:27Z) - New Job, New Gender? Measuring the Social Bias in Image Generation Models [85.26441602999014]
Image generation models are susceptible to generating content that perpetuates social stereotypes and biases.
We propose BiasPainter, a framework that can accurately, automatically and comprehensively trigger social bias in image generation models.
BiasPainter can achieve 90.8% accuracy on automatic bias detection, which is significantly higher than the results reported in previous work.
arXiv Detail & Related papers (2024-01-01T14:06:55Z) - Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z) - Social Biases through the Text-to-Image Generation Lens [9.137275391251517]
Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software.
We take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images.
We present findings for two popular T2I models: DALLE-v2 and Stable Diffusion.
arXiv Detail & Related papers (2023-03-30T05:29:13Z) - Stable Bias: Analyzing Societal Representations in Diffusion Models [72.27121528451528]
We propose a new method for exploring the social biases in Text-to-Image (TTI) systems.
Our approach relies on characterizing the variation in generated images triggered by enumerating gender and ethnicity markers in the prompts.
We leverage this method to analyze images generated by 3 popular TTI systems and find that while all of their outputs show correlations with US labor demographics, they also consistently under-represent marginalized identities to different extents.
arXiv Detail & Related papers (2023-03-20T19:32:49Z) - DALL-Eval: Probing the Reasoning Skills and Social Biases of
Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images.
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.