FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
- URL: http://arxiv.org/abs/2502.03826v1
- Date: Thu, 06 Feb 2025 07:22:57 GMT
- Title: FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
- Authors: Jinya Sakurai, Issei Sato
- Abstract summary: We introduce FairT2I, a novel framework that harnesses large language models to detect and mitigate social biases in T2I generation.
Our results show that FairT2I successfully mitigates social biases and enhances the diversity of sensitive attributes in generated images.
- Score: 32.01426831450348
- Abstract: The proliferation of Text-to-Image (T2I) models has revolutionized content creation, providing powerful tools for diverse applications ranging from artistic expression to educational material development and marketing. Despite these technological advancements, significant ethical concerns arise from these models' reliance on large-scale datasets that often contain inherent societal biases. These biases are further amplified when AI-generated content is included in training data, potentially reinforcing and perpetuating stereotypes in the generated outputs. In this paper, we introduce FairT2I, a novel framework that harnesses large language models to detect and mitigate social biases in T2I generation. Our framework comprises two key components: (1) an LLM-based bias detection module that identifies potential social biases in generated images based on text prompts, and (2) an attribute rebalancing module that fine-tunes sensitive attributes within the T2I model to mitigate identified biases. Our extensive experiments across various T2I models and datasets show that FairT2I can significantly reduce bias while maintaining high-quality image generation. We conducted both qualitative user studies and quantitative non-parametric analyses in the generated image feature space, building upon the occupational dataset introduced in the Stable Bias study. Our results show that FairT2I successfully mitigates social biases and enhances the diversity of sensitive attributes in generated images. We further demonstrate, using the P2 dataset, that our framework can detect subtle biases that are challenging for human observers to perceive, extending beyond occupation-related prompts. On the basis of these findings, we introduce a new benchmark dataset for evaluating bias in T2I models.
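The abstract describes a two-stage pipeline: an LLM-based module that detects which sensitive attributes a prompt leaves underspecified, followed by an attribute rebalancing step before image generation. Below is a minimal Python sketch of that idea under stated assumptions: the detection module is approximated by prompting a generic LLM for a JSON list of attributes and candidate values, and rebalancing is approximated by uniform resampling of attribute values at the prompt level. The function names (`detect_sensitive_attributes`, `rebalance_prompt`, `fair_generate`) and the `llm`/`t2i` callables are hypothetical placeholders, not the paper's actual implementation.

```python
import json
import random
from typing import Callable, Dict, List


def detect_sensitive_attributes(prompt: str, llm: Callable[[str], str]) -> Dict[str, List[str]]:
    """Ask an LLM which sensitive attributes the prompt leaves underspecified.

    `llm` is any text-completion callable; the JSON instruction below is an
    illustrative assumption, not the paper's exact detection prompt.
    """
    query = (
        "List the sensitive attributes (e.g. gender, ethnicity, age) that a "
        "text-to-image model is likely to resolve in a stereotyped way for the "
        "prompt below, with candidate values for each, as a JSON object.\n"
        f"Prompt: {prompt}"
    )
    return json.loads(llm(query))


def rebalance_prompt(prompt: str, attributes: Dict[str, List[str]], rng: random.Random) -> str:
    """Sample one value per detected attribute uniformly and append it to the
    prompt, so repeated generations cover the attribute values evenly."""
    picks = [rng.choice(values) for values in attributes.values()]
    return f"{prompt}, {', '.join(picks)}"


def fair_generate(prompt: str, llm: Callable[[str], str], t2i: Callable[[str], object],
                  n_images: int = 4, seed: int = 0) -> List[object]:
    """End-to-end sketch: detect biased attributes, rebalance, then generate."""
    rng = random.Random(seed)
    attributes = detect_sensitive_attributes(prompt, llm)
    return [t2i(rebalance_prompt(prompt, attributes, rng)) for _ in range(n_images)]
```

For example, given the prompt "a photo of a nurse", the detection step might return candidate values for gender and age, and each generated image then conditions on a uniformly sampled combination rather than leaving the T2I model to fill in a stereotyped default.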
Related papers
- Text-to-Image Synthesis: A Decade Survey [7.250878248686215]
Text-to-image synthesis (T2I) focuses on generating high-quality images from textual descriptions.
In this survey, we review over 440 recent works on T2I.
arXiv Detail & Related papers (2024-11-25T07:40:32Z)
- Image Regeneration: Evaluating Text-to-Image Model via Generating Identical Image with Multimodal Large Language Models [54.052963634384945]
We introduce the Image Regeneration task to assess text-to-image models.
We use GPT4V to bridge the gap between the reference image and the text input for the T2I model.
We also present ImageRepainter framework to enhance the quality of generated images.
arXiv Detail & Related papers (2024-11-14T13:52:43Z)
- Evaluating the Generation of Spatial Relations in Text and Image Generative Models [4.281091463408283]
Spatial relations are naturally understood in a visuo-spatial manner.
We develop an approach to convert LLM outputs into an image, thereby allowing us to evaluate both T2I models and LLMs.
Surprisingly, we found that T2I models only achieve subpar performance despite their impressive general image-generation abilities.
arXiv Detail & Related papers (2024-11-12T09:30:02Z)
- Minority-Focused Text-to-Image Generation via Prompt Optimization [57.319845580050924]
We investigate the generation of minority samples using pretrained text-to-image (T2I) latent diffusion models.
We develop an online prompt optimization framework that can encourage the emergence of desired properties.
We then tailor this generic prompt into a specialized solver that promotes the generation of minority features.
arXiv Detail & Related papers (2024-10-10T11:56:09Z)
- BIGbench: A Unified Benchmark for Social Bias in Text-to-Image Generative Models Based on Multi-modal LLM [8.24274551090375]
We introduce BIGbench, a unified benchmark for Biases of Image Generation.
Unlike existing benchmarks, BIGbench classifies and evaluates biases across four dimensions.
Our study also reveals new research directions about biases, such as the effects of distillation and irrelevant protected attributes.
arXiv Detail & Related papers (2024-07-21T18:09:40Z)
- PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [50.33699462106502]
Text-to-image (T2I) models frequently fail to produce images consistent with physical commonsense.
Current T2I evaluation benchmarks focus on metrics such as accuracy, bias, and safety, neglecting the evaluation of models' internal knowledge.
We introduce PhyBench, a comprehensive T2I evaluation dataset comprising 700 prompts across 4 primary categories: mechanics, optics, thermodynamics, and material properties.
arXiv Detail & Related papers (2024-06-17T17:49:01Z)
- FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models [7.30796695035169]
FAIntbench is a holistic and precise benchmark for biases in Text-to-Image (T2I) models.
We applied FAIntbench to evaluate seven recent large-scale T2I models and conducted human evaluation.
Results demonstrated the effectiveness of FAIntbench in identifying various biases.
arXiv Detail & Related papers (2024-05-28T04:18:00Z)
- Survey of Bias In Text-to-Image Generation: Definition, Evaluation, and Mitigation [47.770531682802314]
Even simple prompts can cause T2I models to exhibit conspicuous social bias in generated images.
We present the first extensive survey on bias in T2I generative models.
We discuss how these works define, evaluate, and mitigate different aspects of bias.
arXiv Detail & Related papers (2024-04-01T10:19:05Z)
- SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data [73.23388142296535]
SELMA improves the faithfulness of T2I models by fine-tuning models on automatically generated, multi-skill image-text datasets.
We show that SELMA significantly improves the semantic alignment and text faithfulness of state-of-the-art T2I diffusion models on multiple benchmarks.
We also show that fine-tuning with image-text pairs auto-collected via SELMA shows comparable performance to fine-tuning with ground truth data.
arXiv Detail & Related papers (2024-03-11T17:35:33Z)
- Benchmarking Spatial Relationships in Text-to-Image Generation [102.62422723894232]
We investigate the ability of text-to-image models to generate correct spatial relationships among objects.
We present VISOR, an evaluation metric that captures how accurately the spatial relationship described in text is generated in the image.
Our experiments reveal a surprising finding that, although state-of-the-art T2I models exhibit high image quality, they are severely limited in their ability to generate multiple objects or the specified spatial relations between them.
arXiv Detail & Related papers (2022-12-20T06:03:51Z)