Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback
- URL: http://arxiv.org/abs/2407.09551v1
- Date: Thu, 27 Jun 2024 17:18:58 GMT
- Title: Diminishing Stereotype Bias in Image Generation Model using Reinforcement Learning Feedback
- Authors: Xin Chen, Virgile Foussereau
- Abstract summary: This study addresses gender bias in image generation models using Reinforcement Learning from Artificial Intelligence Feedback (RLAIF).
By employing a pretrained stable diffusion model and a highly accurate gender classification Transformer, the research introduces two reward functions: Rshift for shifting gender imbalances, and Rbalance for achieving and maintaining gender balance.
Experiments demonstrate the effectiveness of this approach in mitigating bias without compromising image quality or requiring additional data or prompt modifications.
- Score: 3.406797377411835
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: This study addresses gender bias in image generation models using Reinforcement Learning from Artificial Intelligence Feedback (RLAIF) with a novel Denoising Diffusion Policy Optimization (DDPO) pipeline. By employing a pretrained stable diffusion model and a highly accurate gender classification Transformer, the research introduces two reward functions: Rshift for shifting gender imbalances, and Rbalance for achieving and maintaining gender balance. Experiments demonstrate the effectiveness of this approach in mitigating bias without compromising image quality or requiring additional data or prompt modifications. While focusing on gender bias, this work establishes a foundation for addressing various forms of bias in AI systems, emphasizing the need for responsible AI development. Future research directions include extending the methodology to other bias types, enhancing the RLAIF pipeline's robustness, and exploring multi-prompt fine-tuning to further advance fairness and inclusivity in AI.
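The abstract does not reproduce the reward definitions, so the Python sketch below is only an illustration of how Rshift and Rbalance could be computed from the gender classifier's outputs on a batch of generated images; the formulas and function names are assumptions, not the authors' exact formulation.

```python
import numpy as np

def r_shift(p_female: np.ndarray) -> float:
    # Hypothetical R_shift: reward grows as the batch is classified
    # toward the under-represented gender (assumed "female" here),
    # pushing the generator away from the majority class.
    return float(np.mean(p_female))

def r_balance(p_female: np.ndarray) -> float:
    # Hypothetical R_balance: reward peaks when the batch-level gender
    # ratio is 0.5 and decays linearly toward either extreme, so the
    # model is rewarded for reaching and then holding a 50/50 split.
    ratio = float(np.mean(p_female))
    return 1.0 - 2.0 * abs(ratio - 0.5)

# Example: classifier probabilities P(female) for an 8-image batch.
p = np.array([0.9, 0.1, 0.2, 0.8, 0.7, 0.1, 0.3, 0.9])
print(r_shift(p), r_balance(p))  # 0.5, 1.0
```

Either reward could then be fed to the DDPO update in place of an aesthetic or preference score.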
Related papers
- A Meaningful Perturbation Metric for Evaluating Explainability Methods [55.09730499143998]
We introduce a novel approach, which harnesses image generation models to perform targeted perturbation.
Specifically, we focus on inpainting only the high-relevance pixels of an input image to modify the model's predictions while preserving image fidelity.
This is in contrast to existing approaches, which often produce out-of-distribution modifications, leading to unreliable results.
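As a rough sketch of the masking step, assuming the relevance map comes from an attribution method and the threshold fraction is a free parameter (neither is specified here):

```python
import numpy as np

def high_relevance_mask(relevance: np.ndarray, top_frac: float = 0.1) -> np.ndarray:
    # Mark only the top `top_frac` most relevant pixels for inpainting;
    # everything else is left untouched, keeping the perturbed image
    # close to the data distribution.
    thresh = np.quantile(relevance, 1.0 - top_frac)
    return (relevance >= thresh).astype(np.uint8)
```

The resulting binary mask would then be handed to an off-the-shelf inpainting model together with the original image.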
arXiv Detail & Related papers (2025-04-09T11:46:41Z)
- Uncovering Bias in Foundation Models: Impact, Testing, Harm, and Mitigation [26.713973033726464]
Bias in Foundation Models (FMs) poses significant challenges for fairness and equity across fields such as healthcare, education, and finance.
These biases, rooted in the overrepresentation of stereotypes and societal inequalities in training data, exacerbate real-world discrimination, reinforce harmful stereotypes, and erode trust in AI systems.
We introduce Trident Probe Testing (TriProTesting), a systematic testing method that detects explicit and implicit biases using semantically designed probes.
arXiv Detail & Related papers (2025-01-14T19:06:37Z)
- Gender Bias Evaluation in Text-to-image Generation: A Survey [25.702257177921048]
We review recent work on gender bias evaluation in text-to-image generation.
We focus on the evaluation of recent popular models such as Stable Diffusion and DALL-E 2.
arXiv Detail & Related papers (2024-08-21T06:01:23Z)
- Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation [87.50120181861362]
VisionPrefer is a high-quality and fine-grained preference dataset that captures multiple preference aspects.
We train a reward model, VP-Score, over VisionPrefer to guide the training of text-to-image generative models; VP-Score's preference prediction accuracy is comparable to that of human annotators.
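The summary does not state VP-Score's training objective; a common choice for reward models trained on pairwise preference data is a Bradley-Terry-style loss, sketched here as an assumption:

```python
import torch
import torch.nn.functional as F

def preference_loss(score_chosen: torch.Tensor,
                    score_rejected: torch.Tensor) -> torch.Tensor:
    # The reward model should score the preferred image above the
    # rejected one for the same prompt; minimizing this objective
    # widens the margin between the two scores.
    return -F.logsigmoid(score_chosen - score_rejected).mean()
```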
arXiv Detail & Related papers (2024-04-23T14:53:15Z)
- Utilizing Adversarial Examples for Bias Mitigation and Accuracy Enhancement [3.0820287240219795]
We propose a novel approach to mitigate biases in computer vision models by utilizing counterfactual generation and fine-tuning.
Our approach leverages a curriculum learning framework combined with a fine-grained adversarial loss to fine-tune the model using adversarial examples.
We validate our approach through both qualitative and quantitative assessments, demonstrating improved bias mitigation and accuracy compared to existing methods.
arXiv Detail & Related papers (2024-04-18T00:41:32Z)
- PiRD: Physics-informed Residual Diffusion for Flow Field Reconstruction [5.06136344261226]
CNN-based methods for data fidelity enhancement rely on low-fidelity data patterns and distributions during the training phase.
Our proposed model, Physics-informed Residual Diffusion, demonstrates the capability to elevate the quality of data from standard low-fidelity inputs.
Experimental results have shown that our approach can effectively reconstruct high-quality outcomes for two-dimensional turbulent flows without requiring retraining.
arXiv Detail & Related papers (2024-04-12T11:45:51Z)
- Diffusion Model Based Visual Compensation Guidance and Visual Difference Analysis for No-Reference Image Quality Assessment [82.13830107682232]
We propose a novel class of state-of-the-art (SOTA) generative models, which exhibit the capability to model intricate relationships.
We devise a new diffusion restoration network that leverages the produced enhanced image and noise-containing images.
Two visual evaluation branches are designed to comprehensively analyze the obtained high-level feature information.
arXiv Detail & Related papers (2024-02-22T09:39:46Z)
- Detecting and Mitigating Algorithmic Bias in Binary Classification using Causal Modeling [0.0]
We show that gender bias in the prediction model is statistically significant at the 0.05 level.
We demonstrate the effectiveness of the causal model in mitigating gender bias by cross-validation.
Our novel approach is intuitive, easy to use, and can be implemented using existing statistical software tools such as "lavaan" in R.
arXiv Detail & Related papers (2023-10-19T02:21:04Z)
- Training Diffusion Models with Reinforcement Learning [82.29328477109826]
Diffusion models are trained with an approximation to the log-likelihood objective.
In this paper, we investigate reinforcement learning methods for directly optimizing diffusion models for downstream objectives.
We describe how posing denoising as a multi-step decision-making problem enables a class of policy gradient algorithms.
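Schematically, once each denoising transition is treated as an action, the final image's reward can be propagated with a REINFORCE-style estimator; the sketch below assumes per-step log-probabilities are available and is not the paper's exact algorithm:

```python
import torch

def policy_gradient_loss(step_log_probs: torch.Tensor,
                         rewards: torch.Tensor) -> torch.Tensor:
    # step_log_probs: (batch, T) log-probs of each denoising step taken
    # rewards:        (batch,)   scalar reward for each final image
    # Normalizing rewards into advantages reduces gradient variance.
    adv = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
    return -(step_log_probs.sum(dim=1) * adv).mean()
```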
arXiv Detail & Related papers (2023-05-22T17:57:41Z)
- Debiasing Vision-Language Models via Biased Prompts [79.04467131711775]
We propose a general approach for debiasing vision-language foundation models by projecting out biased directions in the text embedding.
We show that debiasing only the text embedding with a calibrated projection matrix suffices to yield robust classifiers and fair generative models.
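A minimal sketch of the projection idea, assuming the bias directions have already been estimated (e.g. from embeddings of gendered prompt pairs); the paper's calibrated projection matrix is a refinement of this:

```python
import numpy as np

def project_out_bias(text_emb: np.ndarray, bias_dirs: np.ndarray) -> np.ndarray:
    # text_emb:  (n, d) text embeddings; bias_dirs: (k, d) bias directions.
    # Remove the component of each embedding lying in the bias subspace.
    Q, _ = np.linalg.qr(bias_dirs.T)            # orthonormal basis, (d, k)
    debiased = text_emb - (text_emb @ Q) @ Q.T  # project onto complement
    norms = np.linalg.norm(debiased, axis=1, keepdims=True)
    return debiased / np.maximum(norms, 1e-12)  # re-normalize, CLIP-style
```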
arXiv Detail & Related papers (2023-01-31T20:09:33Z)
- Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search [8.730027941735804]
We study a unique gender bias in image search.
The retrieved images are often gender-imbalanced for gender-neutral natural language queries.
We introduce two novel debiasing approaches.
arXiv Detail & Related papers (2021-09-12T04:47:33Z)
- Uncertainty-Aware Blind Image Quality Assessment in the Laboratory and Wild [98.48284827503409]
We develop a unified BIQA model and an approach to training it for both synthetic and realistic distortions.
We employ the fidelity loss to optimize a deep neural network for BIQA over a large number of image pairs.
Experiments on six IQA databases show the promise of the learned method in blindly assessing image quality in the laboratory and wild.
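The fidelity loss mentioned above compares a predicted pairwise-preference probability against the ground truth; a sketch under its standard definition:

```python
import torch

def fidelity_loss(p_pred: torch.Tensor, p_true: torch.Tensor,
                  eps: float = 1e-8) -> torch.Tensor:
    # p_pred: model's probability that image A beats image B in quality;
    # p_true: (possibly soft) ground-truth probability for the pair.
    # The loss vanishes when the two match and is smooth everywhere,
    # which suits pairwise learning-to-rank for BIQA.
    return (1.0
            - torch.sqrt(p_pred * p_true + eps)
            - torch.sqrt((1.0 - p_pred) * (1.0 - p_true) + eps)).mean()
```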
arXiv Detail & Related papers (2020-05-28T13:35:23Z)
- When Relation Networks meet GANs: Relation GANs with Triplet Loss [110.7572918636599]
Training stability is still a lingering concern of generative adversarial networks (GANs).
In this paper, we explore a relation network architecture for the discriminator and design a triplet loss which performs better generalization and stability.
Experiments on benchmark datasets show that the proposed relation discriminator and new loss provide significant improvement on various vision tasks.
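For reference, a standard triplet loss of the kind described, with the pairing of real and generated samples assumed rather than taken from the paper:

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor: torch.Tensor, positive: torch.Tensor,
                 negative: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    # Pull the anchor embedding toward the positive (e.g. real) sample
    # and push it away from the negative (e.g. generated) sample by at
    # least `margin` in Euclidean distance.
    d_pos = F.pairwise_distance(anchor, positive)
    d_neg = F.pairwise_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin).mean()
```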
arXiv Detail & Related papers (2020-02-24T11:35:28Z)