End-to-End Diffusion Latent Optimization Improves Classifier Guidance
- URL: http://arxiv.org/abs/2303.13703v2
- Date: Wed, 31 May 2023 19:40:57 GMT
- Title: End-to-End Diffusion Latent Optimization Improves Classifier Guidance
- Authors: Bram Wallace, Akash Gokul, Stefano Ermon, Nikhil Naik
- Abstract summary: Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
- Score: 81.27364542975235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Classifier guidance -- using the gradients of an image classifier to steer
the generations of a diffusion model -- has the potential to dramatically
expand the creative control over image generation and editing. However,
currently classifier guidance requires either training new noise-aware models
to obtain accurate gradients or using a one-step denoising approximation of the
final generation, which leads to misaligned gradients and sub-optimal control.
We highlight this approximation's shortcomings and propose a novel guidance
method: Direct Optimization of Diffusion Latents (DOODL), which enables
plug-and-play guidance by optimizing diffusion latents w.r.t. the gradients of
a pre-trained classifier on the true generated pixels, using an invertible
diffusion process to achieve memory-efficient backpropagation. Showcasing the
potential of more precise guidance, DOODL outperforms one-step classifier
guidance on computational and human evaluation metrics across different forms
of guidance: using CLIP guidance to improve generations of complex prompts from
DrawBench, using fine-grained visual classifiers to expand the vocabulary of
Stable Diffusion, enabling image-conditioned generation with a CLIP visual
encoder, and improving image aesthetics using an aesthetic scoring network.
Code at https://github.com/salesforce/DOODL.
Related papers
- Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis [0.7428236410246183]
We investigate optimized convolutional neural networks (CNNs) developed for automatic modulation classification (AMC) of wireless signals.
We propose optimized models with the combinations of these techniques to fuse the complementary optimization benefits.
The experimental results show that the proposed individual and combined optimization techniques are highly effective for developing models with significantly less complexity.
arXiv Detail & Related papers (2024-04-11T06:08:23Z) - Active Generation for Image Classification [50.18107721267218]
We propose to address the efficiency of image generation by focusing on the specific needs and characteristics of the model.
With a central tenet of active learning, our method, named ActGen, takes a training-aware approach to image generation.
arXiv Detail & Related papers (2024-03-11T08:45:31Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improves the sample quality in conditional image generation and zero-shot text synthesis-to-speech.
Notably, we are first to apply flow models for plan generation in the offline reinforcement learning setting ax speedup in compared to diffusion models.
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Elucidating The Design Space of Classifier-Guided Diffusion Generation [17.704873767509557]
We show that it is possible to achieve significant performance improvements over existing guidance schemes by leveraging off-the-shelf classifiers in a training-free fashion.
Our proposed approach has great potential and can be readily scaled up to text-to-image generation tasks.
arXiv Detail & Related papers (2023-10-17T14:34:58Z) - Aligning Text-to-Image Diffusion Models with Reward Backpropagation [62.45086888512723]
We propose AlignProp, a method that aligns diffusion models to downstream reward functions using end-to-end backpropagation of the reward gradient.
We show AlignProp achieves higher rewards in fewer training steps than alternatives, while being conceptually simpler.
arXiv Detail & Related papers (2023-10-05T17:59:18Z) - Exploring Compositional Visual Generation with Latent Classifier
Guidance [19.48538300223431]
We train latent diffusion models and auxiliary latent classifiers to facilitate non-linear navigation of latent representation generation.
We show that such conditional generation achieved by latent classifier guidance provably maximizes a lower bound of the conditional log probability during training.
We show that this paradigm based on latent classifier guidance is agnostic to pre-trained generative models, and present competitive results for both image generation and sequential manipulation of real and synthetic images.
arXiv Detail & Related papers (2023-04-25T03:02:58Z) - Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs)
GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z) - Denoising Diffusion Autoencoders are Unified Self-supervised Learners [58.194184241363175]
This paper shows that the networks in diffusion models, namely denoising diffusion autoencoders (DDAE), are unified self-supervised learners.
DDAE has already learned strongly linear-separable representations within its intermediate layers without auxiliary encoders.
Our diffusion-based approach achieves 95.9% and 50.0% linear evaluation accuracies on CIFAR-10 and Tiny-ImageNet.
arXiv Detail & Related papers (2023-03-17T04:20:47Z) - Enhancing Diffusion-Based Image Synthesis with Robust Classifier
Guidance [17.929524924008962]
In order to obtain class-conditional generation, it was suggested to guide the diffusion process by gradients from a time-dependent classifier.
While the idea is theoretically sound, deep learning-based classifiers are infamously susceptible to gradient-based adversarial attacks.
We utilize this observation by defining and training a time-dependent adversarially robust classifier and use it as guidance for a generative diffusion model.
arXiv Detail & Related papers (2022-08-18T06:51:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.