Related papers: Severity Controlled Text-to-Image Generative Model Bias Manipulation

URL: http://arxiv.org/abs/2404.02530v1
Date: Wed, 3 Apr 2024 07:33:30 GMT
Title: Severity Controlled Text-to-Image Generative Model Bias Manipulation
Authors: Jordan Vice, Naveed Akhtar, Richard Hartley, Ajmal Mian
Abstract summary: Text-to-image (T2I) generative models are gaining wide popularity, especially in public domains. We first expose the new possibility of a dynamic and computationally efficient exploitation of model bias by targeting embedded language models. We present interesting qualitative and quantitative results to expose potential manipulation possibilities for T2I models.
Score: 49.60774626839712
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Text-to-image (T2I) generative models are gaining wide popularity, especially in public domains. However, their intrinsic bias and potential malicious manipulations remain under-explored. Charting the susceptibility of T2I models to such manipulation, we first expose the new possibility of a dynamic and computationally efficient exploitation of model bias by targeting the embedded language models. By leveraging mathematical foundations of vector algebra, our technique enables a scalable and convenient control over the severity of output manipulation through model bias. As a by-product, this control also allows a form of precise prompt engineering to generate images which are generally implausible with regular text prompts. We also demonstrate a constructive application of our manipulation for balancing the frequency of generated classes - as in model debiasing. Our technique does not require training and is also framed as a backdoor attack with severity control using semantically-null text triggers in the prompts. With extensive analysis, we present interesting qualitative and quantitative results to expose potential manipulation possibilities for T2I models. Key-words: Text-to-Image Models, Generative Models, Backdoor Attacks, Prompt Engineering, Bias
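The abstract states only that severity is controlled through vector algebra on the embedded language model; it does not give the formulation. As a rough illustration of the general idea, and not the authors' method, the sketch below shifts a prompt embedding along the offset between two reference embeddings, scaled by a scalar. The function name `shift_embedding`, the `severity` parameter, and the random 8-dimensional vectors standing in for text-encoder outputs are all assumptions made for this example.

```python
import numpy as np


def shift_embedding(prompt_emb, source_emb, target_emb, severity):
    """Hypothetical helper: move prompt_emb along (target_emb - source_emb).

    severity = 0.0 returns the prompt embedding unchanged; severity = 1.0
    applies the full source-to-target offset. Values in between interpolate
    linearly. This is an assumed formulation, not taken from the paper.
    """
    prompt_emb = np.asarray(prompt_emb, dtype=float)
    direction = np.asarray(target_emb, dtype=float) - np.asarray(source_emb, dtype=float)
    return prompt_emb + severity * direction


# Random vectors stand in for text-encoder outputs (assumption for the demo).
rng = np.random.default_rng(0)
source = rng.normal(size=8)   # embedding of the original concept
target = rng.normal(size=8)   # embedding of the concept being pushed toward
prompt = source.copy()        # here the prompt encodes exactly the source concept

for severity in (0.0, 0.5, 1.0):
    shifted = shift_embedding(prompt, source, target, severity)
    distance = float(np.linalg.norm(shifted - target))
    print(f"severity={severity:.1f}  distance to target={distance:.4f}")
```

In this toy setting the distance to the target shrinks linearly as the scalar grows, which is one way to picture a single knob controlling how strongly the output is pushed. A real text encoder produces per-token embeddings, so an actual implementation would need to decide which tokens to shift.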

Related papers

AutoDebias: Automated Framework for Debiasing Text-to-Image Models [6.581606189725493]
Text-to-Image (T2I) models generate high-quality images from text prompts but often exhibit unintended social biases. We propose AutoDebias, a framework that automatically identifies and mitigates harmful biases in T2I models without prior knowledge of specific bias types. We evaluate the framework on a benchmark covering over 25 bias scenarios, including challenging cases where multiple biases occur simultaneously.
arXiv Detail & Related papers (2025-08-01T09:05:45Z)
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models [17.131167390657243]
Biased T2I models can generate content with specific tendencies, potentially influencing people's perceptions. This paper introduces a novel form of implicit bias that lacks explicit visual features but can manifest in diverse ways. We propose an implicit bias injection attack framework (IBI-Attacks) against T2I diffusion models.
arXiv Detail & Related papers (2025-04-02T15:24:12Z)
Steering Without Side Effects: Improving Post-Deployment Control of Language Models [61.99293520621248]
Language models (LMs) have been shown to behave unexpectedly post-deployment. We present KL-then-steer (KTS), a technique that decreases the side effects of steering while retaining its benefits. Our best method prevents 44% of jailbreak attacks compared to the original Llama-2-chat-7B model.
arXiv Detail & Related papers (2024-06-21T01:37:39Z)
Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement [3.0820287240219795]
We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning. Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples. We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods.
arXiv Detail & Related papers (2024-04-18T00:41:32Z)
Self-Debiasing Large Language Models: Zero-Shot Recognition and Reduction of Stereotypes [73.12947922129261]
We leverage the zero-shot capabilities of large language models to reduce stereotyping. We show that self-debiasing can significantly reduce the degree of stereotyping across nine different social groups. We hope this work opens inquiry into other zero-shot techniques for bias mitigation.
arXiv Detail & Related papers (2024-02-03T01:40:11Z)
BAGM: A Backdoor Attack for Manipulating Text-to-Image Generative Models [54.19289900203071]
The rise in popularity of text-to-image generative artificial intelligence has attracted widespread public interest. We demonstrate that this technology can be attacked to generate content that subtly manipulates its users. We propose a Backdoor Attack on text-to-image Generative Models (BAGM). Our attack is the first to target three popular text-to-image generative models across three stages of the generative process.
arXiv Detail & Related papers (2023-07-31T08:34:24Z)
Fair Diffusion: Instructing Text-to-Image Generation Models on Fairness [15.059419033330126]
We present a novel strategy, called Fair Diffusion, to attenuate biases after the deployment of generative text-to-image models. Specifically, we demonstrate shifting a bias, based on human instructions, in any direction, yielding arbitrarily new proportions for, e.g., identity groups. This introduced control enables instructing generative image models on fairness, with no data filtering or additional training required.
arXiv Detail & Related papers (2023-02-07T18:25:28Z)
Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding. We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
Better sampling in explanation methods can prevent dieselgate-like deception [0.0]
Interpretability of prediction models is necessary to determine their biases and causes of errors. Popular techniques, such as IME, LIME, and SHAP, use perturbation of instance features to explain individual predictions. We show that the improved sampling increases the robustness of LIME and SHAP, while the previously untested method IME is already the most robust of all.
arXiv Detail & Related papers (2021-01-26T13:41:37Z)
Learning from others' mistakes: Avoiding dataset biases without modeling them [111.17078939377313]
State-of-the-art natural language processing (NLP) models often learn to model dataset biases and surface form correlations instead of features that target the intended task. Previous work has demonstrated effective methods to circumvent these issues when knowledge of the bias is available. We show a method for training models that learn to ignore these problematic correlations.
arXiv Detail & Related papers (2020-12-02T16:10:54Z)
Backdoor Attacks against Transfer Learning with Pre-trained Deep Learning Models [23.48763375455514]
Transfer learning provides an effective solution for feasibly and quickly customizing accurate Student models. Many pre-trained Teacher models are publicly available and maintained by public platforms, increasing their vulnerability to backdoor attacks. We demonstrate a backdoor threat to transfer learning tasks on both image and time-series data, leveraging the knowledge of publicly accessible Teacher models.
arXiv Detail & Related papers (2020-01-10T01:31:09Z)

This list is automatically generated from the titles and abstracts of the papers on this site.