ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
- URL: http://arxiv.org/abs/2311.14542v1
- Date: Fri, 24 Nov 2023 15:20:01 GMT
- Title: ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
- Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord,
Patrick Perez, Mohamed Elhoseiny
- Abstract summary: ToddlerDiffusion is an interpretable 2D diffusion image-synthesis framework inspired by the human generation system.
Our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image.
- Score: 68.16230122583634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion-based generative models excel in perceptually impressive synthesis
but face challenges in interpretability. This paper introduces
ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework
inspired by the human generation system. Unlike traditional diffusion models
with opaque denoising steps, our approach decomposes the generation process
into simpler, interpretable stages: generating contours, a palette, and a
detailed colored image. This not only enhances overall performance but also
enables robust editing and interaction capabilities. Each stage is meticulously
formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM).
Extensive experiments on datasets like LSUN-Churches and COCO validate our
approach, consistently outperforming existing methods. ToddlerDiffusion
achieves notable efficiency, matching LDM performance on LSUN-Churches while
operating three times faster with a 3.76 times smaller architecture. Our source
code is provided in the supplementary material and will be publicly accessible.
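The staged decomposition described in the abstract (contours, then a palette, then a detailed image) can be pictured as a cascade in which each stage is conditioned on the output of the previous one. The following is only a minimal sketch under that assumption: the function names `generate_contours`, `generate_palette`, `generate_detail`, and `run_pipeline` are hypothetical, and each stage is a toy stand-in built on Python's `random` module, not the paper's diffusion models.

```python
import random

def generate_contours(rng, size):
    """Stage 1 (toy stand-in): a sparse binary edge map."""
    return [[1 if rng.random() > 0.8 else 0 for _ in range(size)]
            for _ in range(size)]

def generate_palette(contours, rng):
    """Stage 2 (toy stand-in): a flat base color, darkened on contour pixels."""
    base = (rng.random(), rng.random(), rng.random())
    return [[tuple(c * (0.5 if edge else 1.0) for c in base) for edge in row]
            for row in contours]

def generate_detail(palette, rng):
    """Stage 3 (toy stand-in): add small per-pixel variation to the palette."""
    def jitter(c):
        return min(1.0, max(0.0, c + rng.gauss(0.0, 0.05)))
    return [[tuple(jitter(c) for c in px) for px in row] for row in palette]

def run_pipeline(seed=0, size=8, contours=None):
    """Run the three stages in order; pass `contours` to reuse an edited edge map."""
    rng = random.Random(seed)
    if contours is None:
        contours = generate_contours(rng, size)
    palette = generate_palette(contours, rng)
    image = generate_detail(palette, rng)
    # Every intermediate result is returned so it can be inspected or edited.
    return {"contours": contours, "palette": palette, "image": image}

if __name__ == "__main__":
    out = run_pipeline(seed=0, size=4)
    # Edit the intermediate contour map, then rerun the downstream stages.
    edited = [[1 - v for v in row] for row in out["contours"]]
    out_edited = run_pipeline(seed=0, size=4, contours=edited)
    print(len(out_edited["image"]), len(out_edited["image"][0]))
```

Because the intermediate outputs are explicit, an edit to the contour map propagates through the later stages, which is one way to read the editing and interaction capabilities the abstract mentions.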
Related papers
- Distilling Diffusion Models into Conditional GANs [90.76040478677609]
We distill a complex multistep diffusion model into a single-step conditional GAN student model.
For efficient regression loss, we propose E-LatentLPIPS, a perceptual loss operating directly in the diffusion model's latent space.
We demonstrate that our one-step generator outperforms cutting-edge one-step diffusion distillation models.
arXiv Detail & Related papers (2024-05-09T17:59:40Z)
- Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation [24.236841051249243]
Distillation methods aim to shift the model from many-shot to single-step inference.
We introduce Latent Adversarial Diffusion Distillation (LADD), a novel distillation approach overcoming the limitations of ADD.
In contrast to pixel-based ADD, LADD utilizes generative features from pretrained latent diffusion models.
arXiv Detail & Related papers (2024-03-18T17:51:43Z)
- Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation [59.184980778643464]
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI).
In this paper, we introduce an innovative technique called self-play fine-tuning for diffusion models (SPIN-Diffusion).
Our approach offers an alternative to conventional supervised fine-tuning and RL strategies, significantly improving both model performance and alignment.
arXiv Detail & Related papers (2024-02-15T18:59:18Z)
- Towards Detailed Text-to-Motion Synthesis via Basic-to-Advanced Hierarchical Diffusion Model [60.27825196999742]
We propose a novel Basic-to-Advanced Hierarchical Diffusion Model, named B2A-HDM, to collaboratively exploit low-dimensional and high-dimensional diffusion models for detailed motion synthesis.
Specifically, the basic diffusion model in low-dimensional latent space provides the intermediate denoising result that is consistent with the textual description.
The advanced diffusion model in high-dimensional latent space focuses on the following detail-enhancing denoising process.
arXiv Detail & Related papers (2023-12-18T06:30:39Z)
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet suffer from slow inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Are Diffusion Models Vision-And-Language Reasoners? [30.579483430697803]
We transform diffusion-based models for any image-text matching (ITM) task using a novel method called DiffusionITM.
We introduce the Generative-Discriminative Evaluation Benchmark (GDBench) with 7 complex vision-and-language tasks, bias evaluation and detailed analysis.
We find that Stable Diffusion + DiffusionITM is competitive on many tasks and outperforms CLIP on compositional tasks like CLEVR and Winoground.
arXiv Detail & Related papers (2023-05-25T18:02:22Z)
- MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models [20.62953292593076]
We propose a simple yet effective method called Saliency-aware Noise Blending (SNB) that can empower the fused text-guided diffusion models to achieve more controllable generation.
SNB is training-free and can be completed within a DDIM sampling process. Additionally, it can automatically align the semantics of two noise spaces without requiring additional annotations such as masks.
arXiv Detail & Related papers (2023-03-23T09:30:39Z)
- SALAD: Part-Level Latent Diffusion for 3D Shape Generation and Manipulation [11.828311976126301]
We present a cascaded diffusion model based on a part-level implicit 3D representation.
Our model achieves state-of-the-art generation quality and also enables part-level shape editing and manipulation without any additional training in a conditional setup.
arXiv Detail & Related papers (2023-03-21T23:43:58Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.