Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
- URL: http://arxiv.org/abs/2507.20973v1
- Date: Mon, 28 Jul 2025 16:36:13 GMT
- Title: Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder
- Authors: Chao Wu, Zhenyi Wang, Kangxian Xie, Naresh Kumar Devulapally, Vishnu Suresh Lokhande, Mingchen Gao
- Abstract summary: Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a model-agnostic framework for mitigating such bias in T2I generation. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models.
- Score: 14.164976259534143
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a lightweight and model-agnostic framework for mitigating such bias in T2I generation. Unlike prior approaches that rely on CLIP-based filtering or prompt engineering, which often require model-specific adjustments and offer limited control, SAE Debias operates directly within the feature space without retraining or architectural modifications. By leveraging a k-sparse autoencoder pre-trained on a gender bias dataset, the method identifies gender-relevant directions within the sparse latent space, capturing professional stereotypes. Specifically, a biased direction per profession is constructed from sparse latents and suppressed during inference to steer generations toward more gender-balanced outputs. Trained only once, the sparse autoencoder provides a reusable debiasing direction, offering effective control and interpretable insight into biased subspaces. Extensive evaluations across multiple T2I models, including Stable Diffusion 1.4, 1.5, 2.1, and SDXL, demonstrate that SAE Debias substantially reduces gender bias while preserving generation quality. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models. These findings contribute toward building socially responsible generative AI, providing an interpretable and model-agnostic tool to support fairness in text-to-image generation.
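To make the mechanism concrete, the following is a minimal PyTorch sketch of the idea under stated assumptions: a k-sparse autoencoder encodes intermediate features, a per-profession biased direction is estimated from sparse codes, and that direction is suppressed at inference. Class and function names, shapes, and the direction construction are illustrative, not the authors' implementation, which also specifies the training data and the exact feature space it hooks into.

```python
# Hypothetical sketch of an SAE-Debias-style intervention; not the paper's API.
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    """Autoencoder whose latent code keeps only its top-k activations."""
    def __init__(self, d_model: int, d_latent: int, k: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_latent)
        self.dec = nn.Linear(d_latent, d_model)
        self.k = k

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        z = torch.relu(self.enc(x))
        # Keep only the k largest activations per sample (k-sparsity).
        topk = torch.topk(z, self.k, dim=-1)
        mask = torch.zeros_like(z).scatter_(-1, topk.indices, 1.0)
        return z * mask

    def decode(self, z: torch.Tensor) -> torch.Tensor:
        return self.dec(z)

def bias_direction(sae, feats_a, feats_b):
    # Assumed construction: difference of mean sparse codes for gendered
    # variants of the same profession prompt, unit-normalized.
    d = sae.encode(feats_a).mean(0) - sae.encode(feats_b).mean(0)
    return d / d.norm()

def suppress(sae, x, direction, strength=1.0):
    # Project the biased direction out of the sparse code, then decode.
    z = sae.encode(x)
    z = z - strength * (z @ direction).unsqueeze(-1) * direction
    return sae.decode(z)

# Toy usage with random vectors standing in for T2I internal features.
torch.manual_seed(0)
sae = KSparseAutoencoder(d_model=768, d_latent=4096, k=32)
d_prof = bias_direction(sae, torch.randn(16, 768), torch.randn(16, 768))
x_debiased = suppress(sae, torch.randn(1, 768), d_prof)
```

Because the autoencoder is trained once and reused, the same `d_prof`-style direction can, in this framing, be applied across different T2I backbones that share the feature space.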
Related papers
- AutoDebias: Automated Framework for Debiasing Text-to-Image Models [6.581606189725493]
Text-to-Image (T2I) models generate high-quality images from text prompts but often exhibit unintended social biases. We propose AutoDebias, a framework that automatically identifies and mitigates harmful biases in T2I models without prior knowledge of specific bias types. We evaluate the framework on a benchmark covering over 25 bias scenarios, including challenging cases where multiple biases occur simultaneously.
arXiv Detail & Related papers (2025-08-01T09:05:45Z)
- FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance [19.65226469682089]
Text-to-image diffusion models often exhibit biases toward specific demographic groups. In this paper, we tackle the challenge of mitigating generation bias towards any target attribute value. We propose FairGen, an adaptive latent guidance mechanism which controls the generation distribution during inference.
arXiv Detail & Related papers (2025-02-25T23:47:22Z)
- Do Existing Testing Tools Really Uncover Gender Bias in Text-to-Image Models? [11.101062595569854]
Previous studies have shown that Text-to-Image (T2I) models can perpetuate or even amplify gender stereotypes when provided with neutral text prompts. No existing work comprehensively compares the various detectors or examines how the gender bias they detect deviates from the actual situation. This study addresses this gap by validating previous gender bias detectors using a manually labeled dataset and comparing how the bias identified by various detectors deviates from the actual bias in T2I models.
arXiv Detail & Related papers (2025-01-27T04:47:19Z)
- The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models [91.86718720024825]
We center transgender, nonbinary, and other gender-diverse identities to investigate how alignment procedures interact with pre-existing gender-diverse bias. Our findings reveal that DPO-aligned models are particularly sensitive to supervised finetuning. We conclude with recommendations tailored to DPO and broader alignment practices.
arXiv Detail & Related papers (2024-11-06T06:50:50Z)
- MoESD: Mixture of Experts Stable Diffusion to Mitigate Gender Bias [23.10522891268232]
We introduce a Mixture-of-Experts approach to mitigate gender bias in text-to-image models.
We show that our approach successfully mitigates gender bias while maintaining image quality.
arXiv Detail & Related papers (2024-06-25T14:59:31Z)
- Manipulating and Mitigating Generative Model Biases without Retraining [49.60774626839712]
We propose a dynamic and computationally efficient manipulation of T2I model biases by exploiting their rich language embedding spaces without model retraining.
We show that leveraging foundational vector algebra allows for a convenient control over language model embeddings to shift T2I model outputs.
As a by-product, this control serves as a form of precise prompt engineering to generate images that are generally implausible using regular text prompts (a minimal sketch of the embedding shift follows this entry).
arXiv Detail & Related papers (2024-04-03T07:33:30Z)
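Below is a minimal, hypothetical sketch of the embedding-arithmetic idea from the entry above: an attribute direction is built by plain vector algebra over text-encoder embeddings, and the prompt embedding is shifted along it before conditioning the T2I model. The function names, the toy embedder, and the `alpha` parameter are illustrative assumptions, not the paper's API.

```python
# Hedged sketch: bias manipulation via vector algebra on text embeddings.
import torch

def attribute_direction(embed, word_a: str, word_b: str) -> torch.Tensor:
    # Unit direction from attribute A to attribute B in embedding space,
    # e.g. "man" -> "woman".
    d = embed(word_b) - embed(word_a)
    return d / d.norm()

def shift_embedding(e: torch.Tensor, direction: torch.Tensor, alpha: float):
    # Move the prompt embedding by alpha along the attribute direction;
    # alpha interpolates (and can extrapolate) between attribute poles.
    return e + alpha * direction

# Toy embedder standing in for a real text encoder such as CLIP's.
vocab = {"man": 0, "woman": 1, "a photo of a doctor": 2}
table = torch.nn.Embedding(len(vocab), 512)
embed = lambda s: table(torch.tensor(vocab[s])).detach()

d_gender = attribute_direction(embed, "man", "woman")
e_shifted = shift_embedding(embed("a photo of a doctor"), d_gender, alpha=0.5)
```

In a real pipeline the shifted embedding would replace the original text conditioning, which is what makes the approach retraining-free.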
- The Male CEO and the Female Assistant: Evaluation and Mitigation of Gender Biases in Text-To-Image Generation of Dual Subjects [58.27353205269664]
We propose the Paired Stereotype Test (PST) framework, which queries T2I models to depict two individuals assigned with male-stereotyped and female-stereotyped social identities. Using PST, we evaluate two aspects of gender biases -- the well-known bias in gendered occupation and a novel aspect: bias in organizational power.
arXiv Detail & Related papers (2024-02-16T21:32:27Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models (a projection sketch follows this entry).
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
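The following sketch illustrates the projection idea from the entry above using the standard orthogonal-projection formula; the paper's calibrated projection matrix involves a calibration step not reproduced here, and all names are illustrative.

```python
# Sketch: remove biased directions from a text embedding by projection.
import torch

def projection_matrix(B: torch.Tensor) -> torch.Tensor:
    # P = I - B (B^T B)^{-1} B^T removes the span of the columns of B,
    # where B has shape (d, k): k biased directions of dimension d.
    inv = torch.linalg.inv(B.T @ B)
    return torch.eye(B.shape[0]) - B @ inv @ B.T

# Toy usage: two random "biased prompt" directions in a 512-d CLIP-like space.
torch.manual_seed(0)
B = torch.randn(512, 2)
P = projection_matrix(B)
e = torch.randn(512)   # a text embedding
e_fair = P @ e         # embedding with biased components removed
# Sanity check: the debiased embedding is orthogonal to each biased direction.
assert torch.allclose(B.T @ e_fair, torch.zeros(2), atol=1e-3)
```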
- Self-Conditioned Generative Adversarial Networks for Image Editing [61.50205580051405]
Generative Adversarial Networks (GANs) are susceptible to bias, learned either from unbalanced data or through mode collapse.
We argue that this bias is responsible not only for fairness concerns but also plays a key role in the collapse of latent-traversal editing methods when deviating from the distribution's core.
arXiv Detail & Related papers (2022-02-08T18:08:24Z)
- Mitigating Gender Bias in Captioning Systems [56.25457065032423]
Most captioning models learn gender bias, leading to high gender prediction errors, especially for women.
We propose a new Guided Attention Image Captioning model (GAIC) which provides self-guidance on visual attention to encourage the model to capture correct gender visual evidence.
arXiv Detail & Related papers (2020-06-15T12:16:19Z)