Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
- URL: http://arxiv.org/abs/2512.08724v1
- Date: Tue, 09 Dec 2025 15:39:04 GMT
- Title: Exposing Hidden Biases in Text-to-Image Models via Automated Prompt Search
- Authors: Manos Plitsis, Giorgos Bouritsas, Vassilis Katsouros, Yannis Panagakis
- Abstract summary: Bias-Guided Prompt Search (BGPS) is a framework that automatically generates prompts that aim to maximize the presence of biases in images. BGPS comprises two components: (1) an LLM instructed to produce attribute-neutral prompts and (2) attribute classifiers acting on the TTI's internal representations. We conduct extensive experiments on Stable Diffusion 1.5 and a state-of-the-art debiased model and discover an array of subtle and previously undocumented biases.
- Score: 18.386455479798574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-image (TTI) diffusion models have achieved remarkable visual quality, yet they have been repeatedly shown to exhibit social biases across sensitive attributes such as gender, race and age. To mitigate these biases, existing approaches frequently depend on curated prompt datasets - either manually constructed or generated with large language models (LLMs) - as part of their training and/or evaluation procedures. Besides the curation cost, this also risks overlooking unanticipated, less obvious prompts that trigger biased generation, even in models that have undergone debiasing. In this work, we introduce Bias-Guided Prompt Search (BGPS), a framework that automatically generates prompts that aim to maximize the presence of biases in the resulting images. BGPS comprises two components: (1) an LLM instructed to produce attribute-neutral prompts and (2) attribute classifiers acting on the TTI's internal representations that steer the decoding process of the LLM toward regions of the prompt space that amplify the image attributes of interest. We conduct extensive experiments on Stable Diffusion 1.5 and a state-of-the-art debiased model and discover an array of subtle and previously undocumented biases that severely deteriorate fairness metrics. Crucially, the discovered prompts are interpretable, i.e., they may be entered by a typical user, quantitatively improving the perplexity metric compared to a prominent hard prompt optimization counterpart. Our findings uncover TTI vulnerabilities, while BGPS expands the bias search space and can act as a new evaluation tool for bias mitigation.
Related papers
- Defect-aware Hybrid Prompt Optimization via Progressive Tuning for Zero-Shot Multi-type Anomaly Detection and Segmentation [12.030059666003972]
We introduce DAPO, a novel approach for Defect-aware Prompt Optimization based on progressive tuning for zero-shot multi-type and binary anomaly detection and segmentation under distribution shifts. Our approach aligns anomaly-relevant image features with their corresponding text semantics by learning hybrid defect-aware prompts with both fixed textual anchors and learnable token embeddings.
arXiv Detail & Related papers (2025-12-10T09:19:17Z) - SAGE: Spuriousness-Aware Guided Prompt Exploration for Mitigating Multimodal Bias [43.06550296269926]
Large vision-language models, such as CLIP, have shown strong zero-shot classification performance by aligning images and text in a shared embedding space. However, CLIP models often develop multimodal spurious biases, i.e., an undesirable tendency to rely on spurious features. We propose Spuriousness-Aware Guided Exploration (SAGE), a simple and effective method that mitigates spurious bias through guided prompt selection.
arXiv Detail & Related papers (2025-11-17T05:52:32Z) - Beyond Frequency: Scoring-Driven Debiasing for Object Detection via Blueprint-Prompted Image Synthesis [97.37770785712475]
We present a generation-based debiasing framework for object detection. Our method significantly narrows the performance gap for underrepresented object groups.
arXiv Detail & Related papers (2025-10-21T02:19:12Z) - DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models [55.30555646945055]
Text-to-Image (T2I) models are vulnerable to semantic leakage. We introduce DeLeaker, a lightweight approach that mitigates leakage by directly intervening on the model's attention maps. SLIM is the first dataset dedicated to semantic leakage.
arXiv Detail & Related papers (2025-10-16T17:39:21Z) - T2I-Eval-R1: Reinforcement Learning-Driven Reasoning for Interpretable Text-to-Image Evaluation [60.620408007636016]
We propose T2I-Eval-R1, a novel reinforcement learning framework that trains open-source MLLMs using only coarse-grained quality scores. Our approach integrates Group Relative Policy Optimization into the instruction-tuning process, enabling models to generate both scalar scores and interpretable reasoning chains.
arXiv Detail & Related papers (2025-05-23T13:44:59Z) - Bayesian Prompt Flow Learning for Zero-Shot Anomaly Detection [17.590853105242864]
Vision-language models (e.g., CLIP) have demonstrated remarkable performance in zero-shot anomaly detection (ZSAD). Bayes-PFL is designed to learn both image-specific and image-agnostic distributions, which are jointly utilized to regularize the text prompt space and improve the model's generalization on unseen categories. Experiments on 15 industrial and medical datasets demonstrate our method's superior performance.
arXiv Detail & Related papers (2025-03-13T06:05:35Z) - VersusDebias: Universal Zero-Shot Debiasing for Text-to-Image Models via SLM-Based Prompt Engineering and Generative Adversary [8.24274551090375]
We introduce VersusDebias, a novel and universal framework for debiasing arbitrary Text-to-Image (T2I) models.
The self-adaptive module generates specialized attribute arrays to post-process hallucinations and debias multiple attributes simultaneously.
In both zero-shot and few-shot scenarios, VersusDebias outperforms existing methods, showcasing its exceptional utility.
arXiv Detail & Related papers (2024-07-28T16:24:07Z) - Take Care of Your Prompt Bias! Investigating and Mitigating Prompt Bias in Factual Knowledge Extraction [56.17020601803071]
Recent research shows that pre-trained language models (PLMs) suffer from "prompt bias" in factual knowledge extraction.
This paper aims to improve the reliability of existing benchmarks by thoroughly investigating and mitigating prompt bias.
arXiv Detail & Related papers (2024-03-15T02:04:35Z) - Debiasing Multimodal Large Language Models via Penalization of Language Priors [38.97645845493758]
Multimodal Large Language Models (MLLMs) have become indispensable tools in computer vision and natural language processing. Despite their advancements, our investigation reveals a noteworthy bias: the generated content is often driven more by the inherent priors of the underlying Large Language Models (LLMs) than by the input image. We propose two simple, training-free strategies to rectify these biases and redirect the model's focus toward visual information.
arXiv Detail & Related papers (2024-03-08T12:35:07Z) - Fairness-guided Few-shot Prompting for Large Language Models [93.05624064699965]
In-context learning can suffer from high instability due to variations in training examples, example order, and prompt formats.
We introduce a metric to evaluate the predictive bias of a fixed prompt against labels or given attributes.
We propose a novel search strategy based on greedy search to identify the near-optimal prompt for improving the performance of in-context learning.
arXiv Detail & Related papers (2023-03-23T12:28:25Z) - Delving into Identify-Emphasize Paradigm for Combating Unknown Bias [52.76758938921129]
We propose an effective bias-conflicting scoring method (ECS) to boost the identification accuracy.
We also propose gradient alignment (GA) to balance the contributions of the mined bias-aligned and bias-conflicting samples.
Experiments are conducted on multiple datasets in various settings, demonstrating that the proposed solution can mitigate the impact of unknown biases.
arXiv Detail & Related papers (2023-02-22T14:50:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.