Debiasing Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2402.14577v1
- Date: Thu, 22 Feb 2024 14:33:23 GMT
- Title: Debiasing Text-to-Image Diffusion Models
- Authors: Ruifei He, Chuhui Xue, Haoru Tan, Wenqing Zhang, Yingchen Yu, Song
Bai, and Xiaojuan Qi
- Abstract summary: Learning-based Text-to-Image (TTI) models have revolutionized the way visual content is generated in various domains.
Recent research has shown that nonnegligible social bias exists in current state-of-the-art TTI systems.
- Score: 84.46750441518697
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning-based Text-to-Image (TTI) models like Stable Diffusion have
revolutionized the way visual content is generated in various domains. However,
recent research has shown that nonnegligible social bias exists in current
state-of-the-art TTI systems, which raises important concerns. In this work, we
aim to resolve the social bias in TTI diffusion models. We begin by
formalizing the problem setting and use the text descriptions of bias groups to
establish an unsafe direction for guiding the diffusion process. Next, we
reduce the problem to a weight optimization problem and apply a reinforcement
learning solver, policy gradient, which yields sub-optimal performance and slow
convergence. To overcome these limitations, we propose an iterative
distribution alignment (IDA) method. Despite its simplicity, IDA resolves the
social bias in TTI diffusion models efficiently and converges quickly. Our code
will be released.
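The abstract's idea of deriving an "unsafe direction" from text descriptions of bias groups, and steering the diffusion process away from it, can be sketched as a projection in embedding space. This is a minimal illustration, not the paper's actual method: the function names, the mean-difference construction, and the `strength` parameter are all assumptions for the sketch.

```python
import numpy as np

def unsafe_direction(group_a, group_b):
    """Hypothetical bias ('unsafe') direction: the unit vector between the
    mean text embeddings of two bias groups (each array is n_prompts x dim)."""
    d = group_a.mean(axis=0) - group_b.mean(axis=0)
    return d / np.linalg.norm(d)

def remove_unsafe_component(score, d, strength=1.0):
    """Subtract the component of a guidance score along the unsafe
    direction, steering generation away from the bias axis."""
    return score - strength * (score @ d) * d
```

With `strength=1.0` the returned score is exactly orthogonal to the unsafe direction; smaller values only attenuate the biased component.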
Related papers
- LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? [10.72249123249003]
We revisit diffusion models, highlighting their capacity for holistic context modeling and parallel decoding.
We introduce a novel architecture, LaDiC, which utilizes a split BERT to create a dedicated latent space for captions.
LaDiC achieves state-of-the-art performance for diffusion-based methods on the MS COCO dataset with 38.2 BLEU@4 and 126.2 CIDEr.
arXiv Detail & Related papers (2024-04-16T17:47:16Z) - Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z) - Prompt-tuning latent diffusion models for inverse problems [72.13952857287794]
We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors.
Our method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
arXiv Detail & Related papers (2023-10-02T11:31:48Z) - Phasic Content Fusing Diffusion Model with Directional Distribution Consistency for Few-Shot Model Adaption [73.98706049140098]
We propose a novel phasic content fusing few-shot diffusion model with directional distribution consistency loss.
Specifically, we design a phasic training strategy with phasic content fusion to help our model learn content and style information when the timestep t is large.
Finally, we propose a cross-domain structure guidance strategy that enhances structure consistency during domain adaptation.
arXiv Detail & Related papers (2023-09-07T14:14:11Z) - Eliminating Lipschitz Singularities in Diffusion Models [51.806899946775076]
We show that diffusion models frequently exhibit infinite Lipschitz constants near the zero point of the timestep schedule.
This poses a threat to the stability and accuracy of the diffusion process, which relies on integral operations.
We propose a novel approach, dubbed E-TSDM, which eliminates the Lipschitz singularity of the diffusion model near zero.
arXiv Detail & Related papers (2023-06-20T03:05:28Z) - Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy, which characterizes the pairwise neuron relationships on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
arXiv Detail & Related papers (2023-06-12T17:59:23Z)
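The OFT entry above describes finetuning by an orthogonal transform that preserves pairwise neuron relationships. A standard way to parametrize an orthogonal matrix is the Cayley transform of a skew-symmetric matrix; the sketch below uses that form, but the exact parametrization, block structure, and hyperparameters of OFT are not given in the summary, so treat this as an illustrative assumption.

```python
import numpy as np

def cayley_orthogonal(q):
    """Build an orthogonal matrix R from an arbitrary square parameter
    matrix q: skew-symmetrize q, then apply the Cayley transform
    R = (I + S)^{-1}(I - S), which is orthogonal for skew-symmetric S."""
    s = 0.5 * (q - q.T)                   # skew-symmetric part of q
    i = np.eye(q.shape[0])
    return np.linalg.solve(i + s, i - s)  # solves (I+S) R = (I-S)

def oft_style_update(weight, q):
    """Apply the learned orthogonal transform to a frozen pretrained
    weight matrix (an OFT-style update in this sketch)."""
    return cayley_orthogonal(q) @ weight
```

Because R is orthogonal, the transformed weight keeps all inner products between its columns, so the angles between neurons (and hence quantities like hyperspherical energy) are unchanged while only `q` is trained.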
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information and is not responsible for any consequences of its use.