New Job, New Gender? Measuring the Social Bias in Image Generation
Models
- URL: http://arxiv.org/abs/2401.00763v1
- Date: Mon, 1 Jan 2024 14:06:55 GMT
- Title: New Job, New Gender? Measuring the Social Bias in Image Generation
Models
- Authors: Wenxuan Wang, Haonan Bai, Jen-tse Huang, Yuxuan Wan, Youliang Yuan,
Haoyi Qiu, Nanyun Peng, Michael R. Lyu
- Abstract summary: Image generation models can generate or edit images from a given text.
Recent advancements in image generation technology, exemplified by DALL-E and Midjourney, have been groundbreaking.
These advanced models are often trained on massive Internet datasets, making them susceptible to generating content that perpetuates social stereotypes and biases.
We propose BiasPainter, a novel testing framework that can accurately, automatically and comprehensively trigger social bias in image generation models.
- Score: 88.93677200602887
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Image generation models can generate or edit images from a given text. Recent
advancements in image generation technology, exemplified by DALL-E and
Midjourney, have been groundbreaking. These advanced models, despite their
impressive capabilities, are often trained on massive Internet datasets, making
them susceptible to generating content that perpetuates social stereotypes and
biases, which can lead to severe consequences. Prior research on assessing bias
within image generation models suffers from several shortcomings, including
limited accuracy, reliance on extensive human labor, and lack of comprehensive
analysis. In this paper, we propose BiasPainter, a novel metamorphic testing
framework that can accurately, automatically and comprehensively trigger social
bias in image generation models. BiasPainter uses a diverse range of seed
images of individuals and prompts the image generation models to edit these
images using gender-, race-, and age-neutral queries. These queries span 62
professions, 39 activities, 57 types of objects, and 70 personality traits. The
framework then compares the edited images to the original seed images, focusing
on any changes related to gender, race, and age. BiasPainter adopts a testing
oracle that these characteristics should not be modified when subjected to
neutral prompts. Built upon this design, BiasPainter can trigger the social
bias and evaluate the fairness of image generation models. To evaluate the
effectiveness of BiasPainter, we use it to test five widely used commercial
image generation tools and models, such as Stable Diffusion and Midjourney.
Experimental results show that 100% of the generated test cases successfully
trigger social bias in image generation models.
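The abstract outlines BiasPainter's metamorphic test design: edit a seed image with a gender-, race-, and age-neutral prompt, then check that none of those attributes changed. Below is a minimal sketch of that oracle in Python; `edit_image`, `classify_demographics`, and the sample prompts are illustrative placeholders, not the paper's released code or actual prompt lists.

```python
# Minimal sketch of BiasPainter's metamorphic testing oracle as described in
# the abstract. `edit_image`, `classify_demographics`, and NEUTRAL_PROMPTS are
# illustrative placeholders (assumptions), not the paper's released artifacts.
from dataclasses import dataclass


@dataclass(frozen=True)
class Demographics:
    gender: str  # e.g. "female"
    race: str    # e.g. "asian"
    age: str     # e.g. "30-45"


def edit_image(seed_image: str, prompt: str) -> str:
    """Ask the image generation model under test to edit the seed image.

    Returns the path of the edited image. In practice this would call
    Stable Diffusion, Midjourney, or another model/API under test.
    """
    raise NotImplementedError


def classify_demographics(image: str) -> Demographics:
    """Estimate the gender, race, and age of the person in the image,
    e.g. with an off-the-shelf face-analysis model."""
    raise NotImplementedError


# Neutral queries span professions, activities, objects, and personality
# traits; the concrete strings below are made-up examples.
NEUTRAL_PROMPTS = [
    "make this person a nurse",           # profession
    "show this person playing chess",     # activity
    "give this person a wrench",          # object
    "make this person look confident",    # personality trait
]


def triggers_bias(seed_image: str, prompt: str) -> bool:
    """Testing oracle: under a gender-, race-, and age-neutral prompt, none of
    those attributes should change. Returns True when the edit violates the
    oracle, i.e. the test case triggers a social bias."""
    before = classify_demographics(seed_image)
    after = classify_demographics(edit_image(seed_image, prompt))
    return (before.gender != after.gender
            or before.race != after.race
            or before.age != after.age)
```

A full run would iterate this oracle over every seed image and every neutral prompt, then report the fraction of violating test cases per model.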
Related papers
- Gender Bias Evaluation in Text-to-image Generation: A Survey [25.702257177921048]
We review recent work on gender bias evaluation in text-to-image generation.
We focus on the evaluation of recent popular models such as Stable Diffusion and DALL-E 2.
arXiv Detail & Related papers (2024-08-21T06:01:23Z)
- Would Deep Generative Models Amplify Bias in Future Models? [29.918422914275226]
We investigate the impact of deep generative models on potential social biases in upcoming computer vision models.
We conduct simulations by substituting original images in COCO and CC3M datasets with images generated through Stable Diffusion.
Contrary to expectations, our findings indicate that introducing generated images during training does not uniformly amplify bias.
arXiv Detail & Related papers (2024-04-04T06:58:39Z) - Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- ImagenHub: Standardizing the evaluation of conditional image generation models [48.51117156168]
This paper proposes ImagenHub, which is a one-stop library to standardize the inference and evaluation of all conditional image generation models.
We design two human evaluation scores, i.e. Semantic Consistency and Perceptual Quality, along with comprehensive guidelines to evaluate generated images.
Our human evaluation achieves high inter-worker agreement, with Krippendorff's alpha above 0.4 for 76% of the models.
arXiv Detail & Related papers (2023-10-02T19:41:42Z)
- Social Biases through the Text-to-Image Generation Lens [9.137275391251517]
Text-to-Image (T2I) generation is enabling new applications that support creators, designers, and general end users of productivity software.
We take a multi-dimensional approach to studying and quantifying common social biases as reflected in the generated images.
We present findings for two popular T2I models: DALLE-v2 and Stable Diffusion.
arXiv Detail & Related papers (2023-03-30T05:29:13Z)
- How well can Text-to-Image Generative Models understand Ethical Natural Language Interventions? [67.97752431429865]
We study the effect of adding ethical interventions on the diversity of the generated images.
Preliminary studies indicate that certain phrases, such as 'irrespective of gender', trigger a large change in the model predictions.
arXiv Detail & Related papers (2022-10-27T07:32:39Z)
- DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models [73.12069620086311]
We investigate the visual reasoning capabilities and social biases of text-to-image models.
First, we measure three visual reasoning skills: object recognition, object counting, and spatial relation understanding.
Second, we assess the gender and skin tone biases by measuring the gender/skin tone distribution of generated images (see the sketch after this list).
arXiv Detail & Related papers (2022-02-08T18:36:52Z)
- Unravelling the Effect of Image Distortions for Biased Prediction of Pre-trained Face Recognition Models [86.79402670904338]
We evaluate the performance of four state-of-the-art deep face recognition models in the presence of image distortions.
We observe that image distortions are related to the performance gap of the model across different subgroups.
arXiv Detail & Related papers (2021-08-14T16:49:05Z)
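Several of the related papers above, e.g. DALL-Eval and "Quantifying Bias in Text-to-Image Generative Models", measure bias as skew in the demographic distribution of images generated from a prompt. The sketch below shows one common way to score such skew via normalized entropy; the binary label set and the assumed upstream classifier are illustrative only, not those papers' actual setups.

```python
# Minimal sketch of a distribution-based bias score in the spirit of the
# gender/skin-tone distribution measurements cited above. The label set and
# the upstream classifier are assumptions for illustration only.
import math
from collections import Counter
from typing import Iterable, Sequence


def distribution_bias(predicted_labels: Iterable[str],
                      label_set: Sequence[str]) -> float:
    """Return 1 minus the normalized entropy of the label distribution.

    0.0  -> labels are uniformly distributed (no measured skew)
    ~1.0 -> generations collapse onto a single label (maximal skew)
    """
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no predictions given")
    probs = [counts[label] / total for label in label_set]
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(len(label_set))


# Example: 100 images generated for one prompt, labelled by an (assumed)
# gender classifier. A 92/8 split yields a score of roughly 0.6.
labels = ["male"] * 92 + ["female"] * 8
print(f"{distribution_bias(labels, ['male', 'female']):.2f}")  # ~0.60
```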