Clockwork Diffusion: Efficient Generation With Model-Step Distillation
- URL: http://arxiv.org/abs/2312.08128v2
- Date: Tue, 20 Feb 2024 14:50:23 GMT
- Title: Clockwork Diffusion: Efficient Generation With Model-Step Distillation
- Authors: Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere,
Risheek Garrepalli, Fatih Porikli, Jens Petersen
- Abstract summary: Clockwork Diffusion is a method that periodically reuses computation from preceding denoising steps to approximate low-res feature maps at one or more subsequent steps.
For both text-to-image generation and image editing, we demonstrate that Clockwork leads to comparable or improved perceptual scores with drastically reduced computational complexity.
- Score: 42.01130983628078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work aims to improve the efficiency of text-to-image diffusion models.
While diffusion models use computationally expensive UNet-based denoising
operations in every generation step, we identify that not all operations are
equally relevant for the final output quality. In particular, we observe that
UNet layers operating on high-res feature maps are relatively sensitive to
small perturbations. In contrast, low-res feature maps influence the semantic
layout of the final image and can often be perturbed with no noticeable change
in the output. Based on this observation, we propose Clockwork Diffusion, a
method that periodically reuses computation from preceding denoising steps to
approximate low-res feature maps at one or more subsequent steps. For multiple
baselines, and for both text-to-image generation and image editing, we
demonstrate that Clockwork leads to comparable or improved perceptual scores
with drastically reduced computational complexity. As an example, for Stable
Diffusion v1.5 with 8 DPM++ steps we save 32% of FLOPs with negligible FID and
CLIP change.
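The mechanism described in the abstract can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the function names (`high_res_path`, `low_res_path`, `cheap_adaptor`) and the additive combination of the two paths are hypothetical stand-ins; in the paper, the reused computation is approximated by a lightweight distilled adaptor inside a real UNet.

```python
import numpy as np

def high_res_path(x):
    # Stand-in for the perturbation-sensitive high-resolution UNet layers.
    return 0.9 * x

def low_res_path(h):
    # Stand-in for the expensive low-resolution UNet core.
    return np.tanh(h)

def cheap_adaptor(cached):
    # Stand-in for a light adaptor that approximates low-res features
    # from the preceding step's cached computation.
    return 0.95 * cached

def clockwork_denoise(x, num_steps=8, clock=2):
    """Run the full low-res computation only every `clock` steps;
    on the remaining steps, approximate it from the cached features."""
    cache = None
    for step in range(num_steps):
        h = high_res_path(x)                # always computed in full
        if cache is None or step % clock == 0:
            cache = low_res_path(h)         # full (expensive) computation
            low = cache
        else:
            low = cheap_adaptor(cache)      # reuse the preceding step's features
        x = h + low                         # toy combination of both paths
    return x

out = clockwork_denoise(np.ones(4))
```

With `clock=2`, half of the low-res evaluations are replaced by the cheap approximation, which is where the reported FLOP savings come from; the high-res path is never approximated because, as the abstract notes, it is sensitive to small perturbations.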
Related papers
- Ditto: Accelerating Diffusion Model via Temporal Value Similarity [4.5280087047319535]
We propose a difference processing algorithm that leverages temporal similarity with quantization to enhance the efficiency of diffusion models.
We also design the Ditto hardware, a specialized hardware accelerator, which achieves up to 1.5x speedup and 17.74% energy saving.
arXiv Detail & Related papers (2025-01-20T01:03:50Z) - SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time [7.532695984765271]
We present a novel approach to generate high-resolution images with generative models.
Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next.
Our method offers several key benefits, including improved computational efficiency and faster inference times.
arXiv Detail & Related papers (2024-07-22T09:44:35Z) - WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion [23.729378821117123]
The Denoising Diffusion Probabilistic Model (DDPM) holds promise for low-light image enhancement in the medical field.
DDPMs are computationally demanding and slow, limiting their practical medical applications.
We propose a lightweight DDPM, dubbed LighTDiff, to capture global structural information using low-resolution images.
arXiv Detail & Related papers (2024-05-17T05:31:19Z) - Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z) - Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise.
We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small.
We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
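The caching idea in this entry can be illustrated with a minimal sketch. All names here are hypothetical: `block` stands in for an expensive network block whose output changes little between steps, and `recompute_at` stands in for an automatically determined caching schedule.

```python
def block(x, t):
    # Stand-in for an expensive network block; its output drifts
    # only slightly from one timestep to the next.
    return x + 0.01 * t

def run_with_cache(x, num_steps=10, recompute_at=(0, 5)):
    """Recompute the block only at scheduled timesteps; between them,
    reuse the cached output, exploiting its smooth change over time."""
    cache = None
    for t in range(num_steps):
        if cache is None or t in recompute_at:
            cache = block(x, t)   # full recompute at scheduled steps
        x = cache                 # cached value reused on the other steps
    return x
```

The schedule trades accuracy for compute: the smoother a block's output over timesteps (observation 1 in the summary), the fewer entries `recompute_at` needs.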
arXiv Detail & Related papers (2023-12-06T00:51:38Z) - Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise [15.702941058218196]
This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes.
The first contribution involves re-parameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise.
The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network.
arXiv Detail & Related papers (2023-10-26T05:43:07Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep
Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation.
Our method represents the forward image-to-noise mapping as a simultaneous image-to-zero mapping and zero-to-noise mapping.
We have conducted experiments on unconditioned image generation, e.g., CIFAR-10 and CelebA-HQ-256, and image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.