Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
- URL: http://arxiv.org/abs/2210.00939v6
- Date: Thu, 24 Aug 2023 16:26:54 GMT
- Title: Improving Sample Quality of Diffusion Models Using Self-Attention Guidance
- Authors: Susung Hong, Gyuseong Lee, Wooseok Jang, Seungryong Kim
- Abstract summary: Self-Attention Guidance (SAG) improves the performance of various diffusion models.
SAG adversarially blurs only the regions that diffusion models attend to at each iteration and guides them accordingly.
Results show that SAG improves the performance of various diffusion models, including ADM, IDDPM, Stable Diffusion, and DiT.
- Score: 36.42984435784378
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Denoising diffusion models (DDMs) have attracted attention for their
exceptional generation quality and diversity. This success is largely
attributed to the use of class- or text-conditional diffusion guidance methods,
such as classifier and classifier-free guidance. In this paper, we present a
more comprehensive perspective that goes beyond the traditional guidance
methods. From this generalized perspective, we introduce novel condition- and
training-free strategies to enhance the quality of generated images. As a
simple solution, blur guidance improves the suitability of intermediate samples
for their fine-scale information and structures, enabling diffusion models to
generate higher quality samples with a moderate guidance scale. Improving upon
this, Self-Attention Guidance (SAG) uses the intermediate self-attention maps
of diffusion models to enhance their stability and efficacy. Specifically, SAG
adversarially blurs only the regions that diffusion models attend to at each
iteration and guides them accordingly. Our experimental results show that our
SAG improves the performance of various diffusion models, including ADM, IDDPM,
Stable Diffusion, and DiT. Moreover, combining SAG with conventional guidance
methods leads to further improvement.
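The mechanism behind blur guidance and SAG can be summarized in a few lines. Below is a minimal PyTorch sketch of one SAG-style guided prediction; it is an illustration under stated assumptions rather than the authors' implementation. The `eps_model` interface (returning a noise prediction together with an aggregated self-attention map), the above-average mask threshold, and the blur parameters are all hypothetical, and the sketch blurs the intermediate sample x_t directly, whereas the paper degrades the predicted x_0 and re-noises it.

```python
import torch
import torch.nn.functional as F


def gaussian_blur(x: torch.Tensor, kernel_size: int = 9, sigma: float = 1.0) -> torch.Tensor:
    """Separable depthwise Gaussian blur; kernel size and sigma are illustrative."""
    coords = torch.arange(kernel_size, dtype=x.dtype, device=x.device) - kernel_size // 2
    g = torch.exp(-(coords ** 2) / (2 * sigma ** 2))
    g = g / g.sum()
    c = x.shape[1]
    kx = g.view(1, 1, 1, -1).repeat(c, 1, 1, 1)  # horizontal-pass kernel
    ky = g.view(1, 1, -1, 1).repeat(c, 1, 1, 1)  # vertical-pass kernel
    x = F.conv2d(x, kx, padding=(0, kernel_size // 2), groups=c)
    return F.conv2d(x, ky, padding=(kernel_size // 2, 0), groups=c)


def sag_guided_eps(eps_model, x_t: torch.Tensor, t: torch.Tensor,
                   sag_scale: float = 0.75) -> torch.Tensor:
    """One SAG-style guided noise prediction.

    Assumed interface: eps_model(x, t) -> (eps, attn), where attn is an
    aggregated self-attention map of shape [B, 1, h, w].
    """
    eps, attn = eps_model(x_t, t)

    # Binarize the attention map (illustrative above-average threshold) and
    # upsample it to the spatial size of the sample.
    mask = (attn > attn.mean(dim=(-2, -1), keepdim=True)).float()
    mask = F.interpolate(mask, size=x_t.shape[-2:], mode="nearest")

    # Adversarially blur only the attended regions of the intermediate sample.
    # (Simplification: the paper blurs the predicted x_0 and re-noises it.)
    x_hat = mask * gaussian_blur(x_t) + (1.0 - mask) * x_t
    eps_hat, _ = eps_model(x_hat, t)

    # Guide away from the degraded prediction, mirroring the form of
    # classifier-free guidance: eps_tilde = eps + s * (eps - eps_hat).
    return eps + sag_scale * (eps - eps_hat)
```

Setting the mask to all ones recovers plain blur guidance; and because the degraded prediction plays the role that the unconditional branch plays in classifier-free guidance, no label, prompt, or extra training is required.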
Related papers
- Efficient Diversity-Preserving Diffusion Alignment via Gradient-Informed GFlowNets [65.42834731617226]
We propose a reinforcement learning method for diffusion model finetuning, dubbed Nabla-GFlowNet.
We show that our proposed method achieves fast yet diversity- and prior-preserving finetuning of Stable Diffusion, a large-scale text-conditioned image diffusion model.
arXiv Detail & Related papers (2024-12-10T18:59:58Z)
- SNOOPI: Supercharged One-step Diffusion Distillation with Proper Guidance [12.973835034100428]
This paper presents SNOOPI, a novel framework designed to enhance the guidance in one-step diffusion models during both training and inference.
By varying the guidance scale of both teacher models, we broaden their output distributions, yielding a more robust variational score distillation (VSD) loss that enables the student to perform effectively across diverse backbones while maintaining competitive performance.
Second, we propose a training-free method called Negative-Away Steer Attention (NASA), which integrates negative prompts into one-step diffusion models via cross-attention to suppress undesired elements in generated images.
arXiv Detail & Related papers (2024-12-03T18:56:32Z)
- Unlocking the Capabilities of Masked Generative Models for Image Synthesis via Self-Guidance [25.41734642338575]
Masked generative models (MGMs) have shown impressive generative ability while requiring an order of magnitude fewer sampling steps.
We propose a self-guidance sampling method, which leads to better generation quality.
arXiv Detail & Related papers (2024-10-17T01:48:05Z)
- Few-Shot Image Generation by Conditional Relaxing Diffusion Inversion [37.18537753482751]
Conditional Relaxing Diffusion Inversion (CRDI) is designed to enhance distribution diversity in synthetic image generation.
CRDI does not rely on fine-tuning based on only a few samples.
It focuses on reconstructing each target image instance and expanding diversity through few-shot learning.
arXiv Detail & Related papers (2024-07-09T21:58:26Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning diffusion models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z)
- DifAugGAN: A Practical Diffusion-style Data Augmentation for GAN-based Single Image Super-resolution [88.13972071356422]
We propose a diffusion-style data augmentation scheme, dubbed DifAugGAN, for GAN-based image super-resolution (SR) methods.
It involves adapting the diffusion process in generative diffusion models for improving the calibration of the discriminator during training.
Our DifAugGAN can be a Plug-and-Play strategy for current GAN-based SISR methods to improve the calibration of the discriminator and thus improve SR performance.
arXiv Detail & Related papers (2023-11-30T12:37:53Z)
- Manifold Preserving Guided Diffusion [121.97907811212123]
Conditional image generation still faces challenges of cost, generalizability, and the need for task-specific training.
We propose Manifold Preserving Guided Diffusion (MPGD), a training-free conditional generation framework.
arXiv Detail & Related papers (2023-11-28T02:08:06Z)
- Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct consistently improves the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z)
- Enhanced Controllability of Diffusion Models via Feature Disentanglement and Realism-Enhanced Sampling Methods [27.014858633903867]
We present a training framework for feature disentanglement of diffusion models (FDiff).
We also propose two sampling methods that boost the realism of our diffusion models and enhance controllability.
arXiv Detail & Related papers (2023-02-28T07:43:00Z)
- A Survey on Generative Diffusion Model [75.93774014861978]
Diffusion models are an emerging class of deep generative models.
They have certain limitations, including a time-consuming iterative generation process and confinement to high-dimensional Euclidean space.
This survey presents a plethora of advanced techniques aimed at enhancing diffusion models.
arXiv Detail & Related papers (2022-09-06T16:56:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.