AutoDebias: Automated Framework for Debiasing Text-to-Image Models
- URL: http://arxiv.org/abs/2508.00445v1
- Date: Fri, 01 Aug 2025 09:05:45 GMT
- Title: AutoDebias: Automated Framework for Debiasing Text-to-Image Models
- Authors: Hongyi Cai, Mohammad Mahdinur Rahman, Mingkang Dong, Jie Li, Muxin Pu, Zhili Fang, Yinan Peng, Hanjun Luo, Yang Liu
- Abstract summary: Text-to-Image (T2I) models generate high-quality images from text prompts but often exhibit unintended social biases. We propose AutoDebias, a framework that automatically identifies and mitigates harmful biases in T2I models without prior knowledge of specific bias types. We evaluate the framework on a benchmark covering over 25 bias scenarios, including challenging cases where multiple biases occur simultaneously.
- Score: 6.581606189725493
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Text-to-Image (T2I) models generate high-quality images from text prompts but often exhibit unintended social biases, such as gender or racial stereotypes, even when these attributes are not mentioned. Existing debiasing methods work well for simple or well-known cases but struggle with subtle or overlapping biases. We propose AutoDebias, a framework that automatically identifies and mitigates harmful biases in T2I models without prior knowledge of specific bias types. Specifically, AutoDebias leverages vision-language models to detect biased visual patterns and constructs fairness guides by generating inclusive alternative prompts that reflect balanced representations. These guides drive a CLIP-guided training process that promotes fairer outputs while preserving the original model's image quality and diversity. Unlike existing methods, AutoDebias effectively addresses both subtle stereotypes and multiple interacting biases. We evaluate the framework on a benchmark covering over 25 bias scenarios, including challenging cases where multiple biases occur simultaneously. AutoDebias detects harmful patterns with 91.6% accuracy and reduces biased outputs from 90% to negligible levels, while preserving the visual fidelity of the original model.
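To make the CLIP-guided training step concrete, the snippet below sketches one plausible form of the fairness objective: generated images are softly assigned to a set of balanced guide prompts, and the batch-level assignment is pushed toward uniform. The function name, temperature, and exact loss form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def fairness_guide_loss(image_embeds: torch.Tensor,
                        guide_text_embeds: torch.Tensor) -> torch.Tensor:
    """Push a batch of generated images to align evenly with K balanced
    'fairness guide' prompts (e.g. 'a photo of a female doctor',
    'a photo of a male doctor', ...).

    image_embeds:      (B, D) CLIP image embeddings of generated samples
    guide_text_embeds: (K, D) CLIP text embeddings of the guide prompts
    """
    image_embeds = F.normalize(image_embeds, dim=-1)
    guide_text_embeds = F.normalize(guide_text_embeds, dim=-1)
    # Soft assignment of each generated image to each guide prompt.
    probs = (100.0 * image_embeds @ guide_text_embeds.t()).softmax(dim=-1)  # (B, K)
    # The batch-average assignment should be uniform over the K guides.
    marginal = probs.mean(dim=0)                                            # (K,)
    uniform = torch.full_like(marginal, 1.0 / marginal.numel())
    return F.kl_div(marginal.log(), uniform, reduction="batchmean")
```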
Related papers
- Model-Agnostic Gender Bias Control for Text-to-Image Generation via Sparse Autoencoder [14.164976259534143]
Text-to-image (T2I) diffusion models often exhibit gender bias, particularly by generating stereotypical associations between professions and gendered subjects. This paper presents SAE Debias, a model-agnostic framework for mitigating such bias in T2I generation. To the best of our knowledge, this is the first work to apply sparse autoencoders for identifying and intervening in gender bias within T2I models.
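For intuition, the sketch below shows the core operation such an approach relies on: encode a representation with a sparse autoencoder, zero out latents found to correlate with gender, and decode. The SAE layout and the choice of latent indices are assumptions, not details from SAE Debias.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Toy SAE over a text-encoder representation (layout is an assumption)."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)
        self.dec = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.enc(x))      # sparse latent code
        return self.dec(z), z

def suppress_latents(sae: SparseAutoencoder, x: torch.Tensor,
                     latent_ids: list[int]) -> torch.Tensor:
    """Reconstruct x with selected (e.g. gender-correlated) latents zeroed."""
    z = torch.relu(sae.enc(x))
    z[:, latent_ids] = 0.0               # intervene on the chosen latents
    return sae.dec(z)
```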
arXiv Detail & Related papers (2025-07-28T16:36:13Z)
- From Lists to Emojis: How Format Bias Affects Model Alignment [67.08430328350327]
We study format biases in reinforcement learning from human feedback. Many widely used preference models, including human evaluators, exhibit strong biases towards specific format patterns. We show that with a small amount of biased data, we can inject significant bias into the reward model.
arXiv Detail & Related papers (2024-09-18T05:13:18Z)
- GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [75.04426753720553]
We propose a framework to identify, quantify, and explain biases in an open-set setting.
This pipeline leverages a Large Language Model (LLM) to propose biases starting from a set of captions.
We show two variations of this framework: OpenBias and GradBias.
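A minimal propose-then-measure skeleton in this open-set spirit might look as follows; `ask_llm`, `generate_images`, and `classify_attribute` are placeholder callables, not APIs from OpenBias or GradBias.

```python
from collections import Counter

def discover_and_measure(captions, generate_images, ask_llm, classify_attribute):
    """Propose candidate bias axes with an LLM, then measure how skewed the
    generations are along each axis."""
    proposals = ask_llm(
        "List social attributes (e.g. gender, age) along which images generated "
        f"for these captions could be biased: {captions}"
    )  # expected to return something like ["gender", "age"]
    report = {}
    for attribute in proposals:
        counts = Counter()
        for caption in captions:
            for image in generate_images(caption):
                counts[classify_attribute(image, attribute)] += 1
        report[attribute] = counts  # heavily skewed counts indicate bias
    return report
```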
arXiv Detail & Related papers (2024-08-29T16:51:07Z)
- VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary [8.24274551090375]
We introduce VersusDebias, a universal debiasing framework applicable to arbitrary Text-to-Image (T2I) models.
The self-adaptive module generates specialized attribute arrays to post-process hallucinations and debias multiple attributes simultaneously.
In both zero-shot and few-shot scenarios, VersusDebias outperforms existing methods, showcasing its exceptional utility.
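The attribute-array idea can be approximated with generic prompt post-processing: sample one value per attribute from a balanced array and append it to the prompt, so generations balance out over many calls. This is a simplified sketch, not the paper's self-adaptive module.

```python
import random

def debias_prompt(prompt: str, attribute_arrays: dict, rng=random) -> str:
    """Append one value sampled from each attribute array so that, over many
    calls, generations are balanced across attributes."""
    extras = [rng.choice(values) for values in attribute_arrays.values()]
    return prompt + ", " + ", ".join(extras)

arrays = {"gender": ["a woman", "a man"], "age": ["young", "elderly"]}
print(debias_prompt("a portrait of a doctor", arrays))
# e.g. "a portrait of a doctor, a man, elderly"
```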
arXiv Detail & Related papers (2024-07-28T16:24:07Z)
- Quantifying Bias in Text-to-Image Generative Models [49.60774626839712]
Bias in text-to-image (T2I) models can propagate unfair social representations and may be used to aggressively market ideas or push controversial agendas.
Existing T2I model bias evaluation methods only focus on social biases.
We propose an evaluation methodology to quantify general biases in T2I generative models, without any preconceived notions.
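One simple way to turn attribute counts from generated images into a scalar bias score is a normalized entropy deficit, as sketched below; the paper's actual metric may be defined differently.

```python
import math

def bias_score(counts: dict) -> float:
    """Normalized entropy deficit: 0.0 = perfectly balanced attribute counts,
    1.0 = all generations collapse onto a single attribute value."""
    total = sum(counts.values())
    probs = [c / total for c in counts.values() if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(len(counts))
    return 1.0 - entropy / max_entropy if max_entropy > 0 else 0.0

print(bias_score({"female": 9, "male": 91}))  # ~0.56, strongly skewed
```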
arXiv Detail & Related papers (2023-12-20T14:26:54Z)
- TIBET: Identifying and Evaluating Biases in Text-to-Image Generative Models [22.076898042211305]
We propose a general approach to study and quantify a broad spectrum of biases, for any TTI model and for any prompt.
Our approach automatically identifies potential biases that might be relevant to the given prompt, and measures those biases.
We show that our method is uniquely capable of explaining complex multi-dimensional biases through semantic concepts.
arXiv Detail & Related papers (2023-12-03T02:31:37Z)
- Balancing the Picture: Debiasing Vision-Language Datasets with Synthetic Contrast Sets [52.77024349608834]
Vision-language models can perpetuate and amplify societal biases learned during pre-training on uncurated image-text pairs from the internet.
COCO Captions is the most commonly used dataset for evaluating bias between background context and the gender of people in-situ.
We propose a novel dataset debiasing pipeline to augment the COCO dataset with synthetic, gender-balanced contrast sets.
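The textual half of such a contrast set can be produced with a simple gender-word swap over captions, as in the sketch below; the synthetic image for the swapped caption would then be generated or inpainted (omitted here), and the swap table is only illustrative.

```python
SWAPS = {"man": "woman", "woman": "man", "he": "she", "she": "he",
         "his": "her", "her": "his", "boy": "girl", "girl": "boy"}

def contrast_caption(caption: str) -> str:
    """Swap gendered words to obtain the caption for the contrast image."""
    return " ".join(SWAPS.get(word.lower(), word) for word in caption.split())

print(contrast_caption("a man riding his bike on the street"))
# -> "a woman riding her bike on the street"
```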
arXiv Detail & Related papers (2023-05-24T17:59:18Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
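The core operation, removing a bias subspace from text embeddings with a projection, can be sketched as follows; the paper's calibrated projection matrix is more refined than this plain orthogonal projector.

```python
import torch

def debias_projection(text_embeds: torch.Tensor,
                      bias_directions: torch.Tensor) -> torch.Tensor:
    """Project the span of `bias_directions` (k, d) out of `text_embeds` (n, d)."""
    q, _ = torch.linalg.qr(bias_directions.t())        # (d, k) orthonormal basis
    projector = torch.eye(q.shape[0]) - q @ q.t()      # projector onto the complement
    debiased = text_embeds @ projector
    return torch.nn.functional.normalize(debiased, dim=-1)

# A bias direction can be estimated from paired prompts, e.g.
# encode("a photo of a man") - encode("a photo of a woman").
```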
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Discovering and Mitigating Visual Biases through Keyword Explanation [66.71792624377069]
We propose the Bias-to-Text (B2T) framework, which interprets visual biases as keywords.
B2T can identify known biases, such as gender bias in CelebA, background bias in Waterbirds, and distribution shifts in ImageNet-R/C.
B2T uncovers novel biases in larger datasets, such as Dollar Street and ImageNet.
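A stripped-down version of the keyword idea: caption the mispredicted images and rank words that appear much more often there than in the correctly classified set. `caption_image` is a placeholder captioner, and the frequency-ratio score is a simplification of B2T's actual scoring.

```python
from collections import Counter

def bias_keywords(wrong_images, right_images, caption_image, top_k=10):
    """Rank words that show up in captions of mispredicted images far more
    often than in captions of correctly classified ones."""
    def word_counts(images):
        counts = Counter()
        for img in images:
            counts.update(caption_image(img).lower().split())
        return counts

    wrong, right = word_counts(wrong_images), word_counts(right_images)
    scores = {word: c / (right.get(word, 0) + 1) for word, c in wrong.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```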
arXiv Detail & Related papers (2023-01-26T13:58:46Z)
- Learning Debiased Models with Dynamic Gradient Alignment and Bias-conflicting Sample Mining [39.00256193731365]
Deep neural networks notoriously suffer from dataset biases, which are detrimental to model robustness, generalization, and fairness.
We propose a two-stage debiasing scheme to combat intractable unknown biases.
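One common way to realize bias-conflicting sample mining is to upweight examples that an intentionally biased auxiliary model misclassifies; the reweighting rule below is an illustrative stand-in for the paper's scheme.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(main_logits, biased_logits, targets):
    """Upweight samples that the intentionally biased model gets wrong
    (bias-conflicting samples) when training the debiased model."""
    with torch.no_grad():
        bias_ce = F.cross_entropy(biased_logits, targets, reduction="none")
        weights = bias_ce / (bias_ce.mean() + 1e-8)   # higher weight = bias-conflicting
    main_ce = F.cross_entropy(main_logits, targets, reduction="none")
    return (weights * main_ce).mean()
```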
arXiv Detail & Related papers (2021-11-25T14:50:10Z)
- BiaSwap: Removing dataset bias with bias-tailored swapping augmentation [20.149645246997668]
Deep neural networks often make decisions based on the spurious correlations inherent in the dataset, failing to generalize to an unbiased data distribution.
This paper proposes a novel bias-tailored augmentation-based approach, BiaSwap, for learning debiased representation without requiring supervision on the bias type.
arXiv Detail & Related papers (2021-08-23T08:35:26Z)
- Towards Robustifying NLI Models Against Lexical Dataset Biases [94.79704960296108]
This paper explores both data-level and model-level debiasing methods to robustify models against lexical dataset biases.
First, we debias the dataset through data augmentation and enhancement, but show that the model bias cannot be fully removed via this method.
The second approach employs a bag-of-words sub-model to capture the features that are likely to exploit the bias and prevents the original model from learning these biased features.
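The bag-of-words sub-model idea is often realized in a product-of-experts style: the main model is trained through logits combined with a weak lexical model, so lexical shortcuts are absorbed by the weak model and only the main model is used at test time. The sketch below assumes that setup; details such as detaching the sub-model are assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F

class BoWBiasModel(nn.Module):
    """Weak bag-of-words classifier meant to soak up lexical shortcuts."""
    def __init__(self, vocab_size: int, num_classes: int):
        super().__init__()
        self.embed = nn.EmbeddingBag(vocab_size, num_classes)  # mean-pooled logits

    def forward(self, token_ids):          # (B, T) token ids
        return self.embed(token_ids)       # (B, num_classes)

def debiased_loss(main_logits, bow_logits, targets):
    # Train through the combined logits; only main_logits are used at test time,
    # so lexically predictable (biased) signal is left to the BoW sub-model.
    return F.cross_entropy(main_logits + bow_logits.detach(), targets)
```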
arXiv Detail & Related papers (2020-05-10T17:56:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.