ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
- URL: http://arxiv.org/abs/2311.14542v1
- Date: Fri, 24 Nov 2023 15:20:01 GMT
- Title: ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model
- Authors: Eslam Mohamed Bakr, Liangbing Zhao, Vincent Tao Hu, Matthieu Cord,
Patrick Perez, Mohamed Elhoseiny
- Abstract summary: ToddlerDiffusion is an interpretable 2D diffusion image-synthesis framework inspired by the human generation system.
Our approach decomposes the generation process into simpler, interpretable stages: generating contours, a palette, and a detailed colored image.
- Score: 68.16230122583634
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion-based generative models excel in perceptually impressive synthesis
but face challenges in interpretability. This paper introduces
ToddlerDiffusion, an interpretable 2D diffusion image-synthesis framework
inspired by the human generation system. Unlike traditional diffusion models
with opaque denoising steps, our approach decomposes the generation process
into simpler, interpretable stages: generating contours, a palette, and a
detailed colored image. This not only enhances overall performance but also
enables robust editing and interaction capabilities. Each stage is meticulously
formulated for efficiency and accuracy, surpassing Stable-Diffusion (LDM).
Extensive experiments on datasets like LSUN-Churches and COCO validate our
approach, consistently outperforming existing methods. ToddlerDiffusion
achieves notable efficiency, matching LDM performance on LSUN-Churches while
operating three times faster with a 3.76 times smaller architecture. Our source
code is provided in the supplementary material and will be publicly accessible.
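The staged design described in the abstract is concrete enough to sketch. Below is a minimal, illustrative PyTorch sketch of a contour → palette → image cascade in the spirit of the paper; the `Stage` and `ToddlerPipeline` names, the tiny conv nets, and the channel layout are assumptions for illustration, not the authors' actual architecture.

```python
# Minimal sketch of a three-stage cascade in the spirit of ToddlerDiffusion:
# stage 1 produces a contour map, stage 2 adds a coarse palette, stage 3
# produces the final RGB image conditioned on both. All names and shapes
# are illustrative assumptions, not the authors' actual architecture.
import torch
import torch.nn as nn

class Stage(nn.Module):
    """A stand-in denoiser: a tiny conv net mapping its input to an output map."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class ToddlerPipeline(nn.Module):
    def __init__(self):
        super().__init__()
        self.contour_stage = Stage(in_ch=1, out_ch=1)  # noise -> 1-channel contour map
        self.palette_stage = Stage(in_ch=1, out_ch=3)  # contour -> coarse 3-channel palette
        self.image_stage   = Stage(in_ch=4, out_ch=3)  # contour + palette -> detailed RGB

    def forward(self, noise: torch.Tensor) -> dict:
        contour = self.contour_stage(noise)
        palette = self.palette_stage(contour)
        image = self.image_stage(torch.cat([contour, palette], dim=1))
        # Exposing the intermediates is what makes the pipeline interpretable
        # and editable: a user can modify `contour` or `palette` and rerun
        # only the later stages.
        return {"contour": contour, "palette": palette, "image": image}

pipe = ToddlerPipeline()
out = pipe(torch.randn(1, 1, 64, 64))
print({k: tuple(v.shape) for k, v in out.items()})
```

The point of the sketch is the interface, not the networks: each stage's output is a human-meaningful artifact, which is what enables the editing and interaction capabilities the abstract claims.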
Related papers
- Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media.
On ImageNet-1k 512px, DFM achieves a 35.2% improvement in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z)
- Consistency Diffusion Bridge Models [25.213664260896103]
Denoising diffusion bridge models (DDBMs) build processes between fixed data endpoints based on a reference diffusion process.
DDBMs' sampling process typically requires hundreds of network evaluations to achieve decent performance.
We propose two paradigms, consistency bridge distillation and consistency bridge training, both of which can be flexibly applied to DDBMs with broad design choices.
arXiv Detail & Related papers (2024-10-30T02:04:23Z)
- Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential to effectively accelerate advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image super-resolution (the basic binarization operation is sketched after this entry).
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
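The basic ultra-compression operation that binarized models build on can be made concrete with a minimal sketch. The following generic sign-with-scale weight binarization (XNOR-Net style) with a straight-through estimator is an illustration only; it is not BI-DiffSR's actual scheme, and `binarize_weights` is a hypothetical helper.

```python
# Minimal sketch of sign-with-scale weight binarization (XNOR-Net style),
# the generic operation underlying binarized diffusion models.
# Illustrative only; not BI-DiffSR's actual scheme.
import torch

def binarize_weights(w: torch.Tensor) -> torch.Tensor:
    """Approximate w by alpha * sign(w), with one scale per output channel."""
    # Per-output-channel scale: mean absolute value over the remaining dims.
    alpha = w.abs().mean(dim=tuple(range(1, w.dim())), keepdim=True)
    w_bin = torch.sign(w) * alpha
    # Straight-through estimator: the forward pass uses w_bin, while the
    # backward pass routes gradients to the latent full-precision weights.
    return w + (w_bin - w).detach()

w = torch.randn(8, 4, 3, 3, requires_grad=True)  # e.g. a conv weight
wb = binarize_weights(w)
wb.sum().backward()
print(w.grad.shape)  # gradients flow to the full-precision weights
```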
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method tailored for fine-tuning LCMs on customized image datasets (few-step consistency sampling is sketched after this entry).
arXiv Detail & Related papers (2023-10-06T17:11:58Z)
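As a minimal illustration of why consistency-style models need so few steps, here is a generic few-step sampling loop. It assumes a consistency function f(x, sigma) that maps any noisy latent directly to a clean estimate and then re-noises it to progressively lower levels. This is a sketch, not the LCM authors' sampler; `consistency_sample` and the toy `f` are assumptions.

```python
# Generic few-step consistency-style sampling loop (illustrative, not the
# LCM implementation): f(x, sigma) maps a noisy latent at noise level sigma
# directly to a clean estimate; the sampler alternates "denoise to x0"
# with "re-noise to the next, lower level".
import torch

def consistency_sample(f, shape, noise_levels, generator=None):
    """noise_levels: decreasing sigmas, e.g. [80.0, 10.0, 2.0, 0.5]."""
    x = torch.randn(shape, generator=generator) * noise_levels[0]
    x0 = f(x, noise_levels[0])            # one evaluation yields a clean estimate
    for sigma in noise_levels[1:]:
        eps = torch.randn(shape, generator=generator)
        x = x0 + sigma * eps              # re-noise to the next level
        x0 = f(x, sigma)                  # refine with one more evaluation
    return x0

# Toy stand-in for a trained consistency function: shrinks toward zero.
f = lambda x, sigma: x / (1.0 + sigma)
sample = consistency_sample(f, (1, 4, 8, 8), [80.0, 10.0, 2.0, 0.5])
print(sample.shape)  # 4 network evaluations total, versus hundreds for DDPM
```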
- Mutual Information-driven Triple Interaction Network for Efficient Image Dehazing [54.168567276280505]
We propose a novel Mutual Information-driven Triple interaction Network (MITNet) for image dehazing.
The first stage, named amplitude-guided haze removal, recovers the amplitude spectrum of hazy images to remove haze.
The second stage, named phase-guided structure refinement, learns to transform and refine the phase spectrum (the underlying amplitude/phase split is sketched after this entry).
arXiv Detail & Related papers (2023-08-14T08:23:58Z)
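The amplitude/phase decomposition that the two stages above operate on is standard Fourier analysis and easy to sketch. Below is a minimal illustration (not MITNet's networks); `split_spectrum` and `merge_spectrum` are hypothetical helpers.

```python
# Minimal sketch of the amplitude/phase split underlying two-stage
# Fourier dehazing (illustrative; MITNet's networks are not shown).
# Haze degradation mostly perturbs the amplitude spectrum, while image
# structure lives largely in the phase spectrum.
import torch

def split_spectrum(img: torch.Tensor):
    """img: (B, C, H, W) real tensor -> (amplitude, phase) of its 2D FFT."""
    spec = torch.fft.fft2(img)
    return spec.abs(), spec.angle()

def merge_spectrum(amplitude: torch.Tensor, phase: torch.Tensor) -> torch.Tensor:
    spec = torch.polar(amplitude, phase)   # amplitude * exp(i * phase)
    return torch.fft.ifft2(spec).real

hazy = torch.rand(1, 3, 64, 64)
amp, pha = split_spectrum(hazy)
# Stage 1 would predict a restored amplitude; stage 2 would refine the
# phase. Here we just verify that the round trip reconstructs the input.
recon = merge_spectrum(amp, pha)
print(torch.allclose(recon, hazy, atol=1e-5))  # True
```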
- DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing [94.24479528298252]
DragGAN is an interactive point-based image editing framework that achieves impressive editing results with pixel-level precision.
By harnessing large-scale pretrained diffusion models, we greatly enhance the applicability of interactive point-based editing on both real and diffusion-generated images.
We present a challenging benchmark dataset called DragBench to evaluate the performance of interactive point-based image editing methods.
arXiv Detail & Related papers (2023-06-26T06:04:09Z)
- Hierarchical Integration Diffusion Model for Realistic Image Deblurring [71.76410266003917]
Diffusion models (DMs) have been introduced to image deblurring and have exhibited promising performance.
We propose the Hierarchical Integration Diffusion Model (HI-Diff) for realistic image deblurring.
Experiments on synthetic and real-world blur datasets demonstrate that our HI-Diff outperforms state-of-the-art methods.
arXiv Detail & Related papers (2023-05-22T12:18:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.