D2C: Diffusion-Decoding Models for Few-shot Conditional Generation
- URL: http://arxiv.org/abs/2106.06819v1
- Date: Sat, 12 Jun 2021 16:32:30 GMT
- Title: D2C: Diffusion-Decoding Models for Few-shot Conditional Generation
- Authors: Abhishek Sinha, Jiaming Song, Chenlin Meng, Stefano Ermon
- Abstract summary: This paper describes Diffusion-Decoding models with Contrastive representations (D2C)
D2C uses a learned diffusion-based prior over latent representations to improve generation and contrastive self-supervised learning to improve representation quality.
On conditional image manipulation, D2C generations are two orders of magnitude faster to produce than StyleGAN2 ones and are preferred by 50-60% of the human evaluators in a double-blind study.
- Score: 109.68228014811443
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Conditional generative models of high-dimensional images have many
applications, but supervision signals from conditions to images can be
expensive to acquire. This paper describes Diffusion-Decoding models with
Contrastive representations (D2C), a paradigm for training unconditional
variational autoencoders (VAEs) for few-shot conditional image generation. D2C
uses a learned diffusion-based prior over the latent representations to improve
generation and contrastive self-supervised learning to improve representation
quality. D2C can adapt to novel generation tasks conditioned on labels or
manipulation constraints, by learning from as few as 100 labeled examples. On
conditional generation from new labels, D2C achieves superior performance over
state-of-the-art VAEs and diffusion models. On conditional image manipulation,
D2C generations are two orders of magnitude faster to produce than StyleGAN2
ones and are preferred by 50-60% of the human evaluators in a double-blind
study.
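The abstract above combines three pieces: a VAE-style encoder-decoder, a diffusion model learned as the prior over the latent code, and a contrastive self-supervised loss on the representations. The following PyTorch-style sketch shows one way such a joint objective can be assembled; the interfaces (encoder, decoder, latent_diffusion.denoising_loss) and the simplified InfoNCE contrastive term are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of a D2C-style training step. Assumed interfaces:
#   encoder(x) -> latent z, decoder(z) -> reconstruction,
#   latent_diffusion.denoising_loss(z) -> diffusion (noise-prediction) loss on z.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """Simplified contrastive loss between two augmented views of the same batch."""
    z1 = F.normalize(z1.flatten(1), dim=1)
    z2 = F.normalize(z2.flatten(1), dim=1)
    logits = z1 @ z2.t() / temperature             # pairwise cosine similarities
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)         # matching views sit on the diagonal

def d2c_step(encoder, decoder, latent_diffusion, x_view1, x_view2,
             w_prior=1.0, w_contrast=1.0):
    z1 = encoder(x_view1)                          # latent representation
    z2 = encoder(x_view2)                          # second augmented view
    loss_recon = F.mse_loss(decoder(z1), x_view1)  # reconstruction term
    loss_prior = latent_diffusion.denoising_loss(z1)  # diffusion prior over latents
    loss_contrast = info_nce(z1, z2)               # representation-quality term
    return loss_recon + w_prior * loss_prior + w_contrast * loss_contrast
```

Few-shot conditioning then only needs a lightweight discriminative model (for example, a classifier fit on the latents of the ~100 labeled examples) to steer sampling from the latent diffusion prior; the decoder maps the resulting latents back to images.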
Related papers
- Reinforcement Learning from Diffusion Feedback: Q* for Image Search [2.5835347022640254]
We present two models for image generation using model-agnostic learning.
RLDF is a singular approach for visual imitation through prior-preserving reward function guidance.
It generates high-quality images over varied domains showcasing class-consistency and strong visual diversity.
arXiv Detail & Related papers (2023-11-27T09:20:12Z)
- Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [62.07413805483241]
Steered Diffusion is a framework for zero-shot conditional image generation using a diffusion model trained for unconditional generation.
We present experiments using steered diffusion on several tasks including inpainting, colorization, text-guided semantic editing, and image super-resolution.
arXiv Detail & Related papers (2023-09-30T02:03:22Z)
- DiffDis: Empowering Generative Diffusion Model with Cross-Modal Discrimination Capability [75.9781362556431]
We propose DiffDis to unify the cross-modal generative and discriminative pretraining into one single framework under the diffusion process.
We show that DiffDis outperforms single-task models on both the image generation and the image-text discriminative tasks.
arXiv Detail & Related papers (2023-08-18T05:03:48Z)
- Conditional Generation from Unconditional Diffusion Models using Denoiser Representations [94.04631421741986]
We propose adapting pre-trained unconditional diffusion models to new conditions using the learned internal representations of the denoiser network.
We show that augmenting the Tiny ImageNet training set with synthetic images generated by our approach improves the classification accuracy of ResNet baselines by up to 8%.
arXiv Detail & Related papers (2023-06-02T20:09:57Z)
- 3D-aware Image Generation using 2D Diffusion Models [23.150456832947427]
We formulate the 3D-aware image generation task as multiview 2D image set generation, and further as a sequential unconditional-conditional multiview image generation process.
We utilize 2D diffusion models to boost the generative modeling power of the method.
We train our method on a large-scale dataset, ImageNet, which previous methods have not addressed.
arXiv Detail & Related papers (2023-03-31T09:03:18Z)
- Visual Chain-of-Thought Diffusion Models [15.547439887203613]
We propose to close the gap between conditional and unconditional models using a two-stage sampling procedure.
Doing so lets us leverage the power of conditional diffusion models on the unconditional generation task, which we show improves FID by 25-50% compared to standard unconditional generation.
arXiv Detail & Related papers (2023-03-28T17:53:06Z)
- DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration [66.01846902242355]
Blind face restoration usually synthesizes degraded low-quality data with a pre-defined degradation model for training.
It is expensive and infeasible to include every type of degradation to cover real-world cases in the training data.
We propose Robust Degradation Remover (DR2) to first transform the degraded image to a coarse but degradation-invariant prediction, then employ an enhancement module to restore the coarse prediction to a high-quality image.
arXiv Detail & Related papers (2023-03-13T06:05:18Z)
- ADIR: Adaptive Diffusion for Image Reconstruction [46.838084286784195]
We propose a conditional sampling scheme that exploits the prior learned by diffusion models.
We then combine it with a novel approach for adapting pretrained diffusion denoising networks to their input.
We show that our proposed 'adaptive diffusion for image reconstruction' approach achieves a significant improvement on super-resolution, deblurring, and text-based editing tasks.
arXiv Detail & Related papers (2022-12-06T18:39:58Z)
- On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from (a sketch of the guidance rule being distilled appears after this list).
For standard diffusion models trained in pixel space, our approach generates images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach generates high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
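For the last entry above (On Distillation of Guided Diffusion Models), the object being distilled is the classifier-free-guided noise prediction, which mixes a conditional and an unconditional prediction under a guidance weight w. A minimal sketch of that standard combination rule follows; the function name and argument shapes are illustrative assumptions.

```python
import torch

def guided_epsilon(eps_cond: torch.Tensor, eps_uncond: torch.Tensor, w: float) -> torch.Tensor:
    """Classifier-free guidance: blend conditional and unconditional noise
    predictions with guidance weight w (w = 0 recovers the conditional model)."""
    return (1.0 + w) * eps_cond - w * eps_uncond
```

Distilling this combination into a single student network removes the second forward pass per step, and combining it with step distillation is what makes the 1 to 4 denoising steps cited in that entry feasible.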