Clockwork Diffusion: Efficient Generation With Model-Step Distillation
- URL: http://arxiv.org/abs/2312.08128v2
- Date: Tue, 20 Feb 2024 14:50:23 GMT
- Title: Clockwork Diffusion: Efficient Generation With Model-Step Distillation
- Authors: Amirhossein Habibian, Amir Ghodrati, Noor Fathima, Guillaume Sautiere,
Risheek Garrepalli, Fatih Porikli, Jens Petersen
- Abstract summary: Clockwork Diffusion is a method that periodically reuses computation from preceding denoising steps to approximate low-res feature maps at one or more subsequent steps.
For both text-to-image generation and image editing, we demonstrate that Clockwork leads to comparable or improved perceptual scores with drastically reduced computational complexity.
- Score: 42.01130983628078
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work aims to improve the efficiency of text-to-image diffusion models.
While diffusion models use computationally expensive UNet-based denoising
operations in every generation step, we identify that not all operations are
equally relevant for the final output quality. In particular, we observe that
UNet layers operating on high-res feature maps are relatively sensitive to
small perturbations. In contrast, low-res feature maps influence the semantic
layout of the final image and can often be perturbed with no noticeable change
in the output. Based on this observation, we propose Clockwork Diffusion, a
method that periodically reuses computation from preceding denoising steps to
approximate low-res feature maps at one or more subsequent steps. For multiple
baselines, and for both text-to-image generation and image editing, we
demonstrate that Clockwork leads to comparable or improved perceptual scores
with drastically reduced computational complexity. As an example, for Stable
Diffusion v1.5 with 8 DPM++ steps we save 32% of FLOPs with negligible FID and
CLIP change.
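The mechanism described in the abstract can be sketched in a few lines. This is a toy illustration only, not the authors' implementation: the function names (`high_res_path`, `low_res_path`, `cheap_adaptor`) and the additive combination of the two paths are hypothetical stand-ins; in the paper, the reused computation is approximated by a lightweight distilled adaptor inside a real UNet.

```python
import numpy as np

def high_res_path(x):
    # Stand-in for the perturbation-sensitive high-resolution UNet layers.
    return 0.9 * x

def low_res_path(h):
    # Stand-in for the expensive low-resolution UNet core.
    return np.tanh(h)

def cheap_adaptor(cached):
    # Stand-in for a light adaptor that approximates low-res features
    # from the preceding step's cached computation.
    return 0.95 * cached

def clockwork_denoise(x, num_steps=8, clock=2):
    """Run the full low-res computation only every `clock` steps;
    on the remaining steps, approximate it from the cached features."""
    cache = None
    for step in range(num_steps):
        h = high_res_path(x)                # always computed in full
        if cache is None or step % clock == 0:
            cache = low_res_path(h)         # full (expensive) computation
            low = cache
        else:
            low = cheap_adaptor(cache)      # reuse the preceding step's features
        x = h + low                         # toy combination of both paths
    return x

out = clockwork_denoise(np.ones(4))
```

With `clock=2`, half of the low-res evaluations are replaced by the cheap approximation, which is where the reported FLOP savings come from; the high-res path is never approximated because, as the abstract notes, it is sensitive to small perturbations.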
Related papers
- Ditto: Accelerating Diffusion Model via Temporal Value Similarity [4.5280087047319535]
We propose a difference processing algorithm that leverages temporal similarity with quantization to enhance the efficiency of diffusion models.
We also design the Ditto hardware, a specialized hardware accelerator, which achieves up to 1.5x speedup and 17.74% energy saving.
arXiv Detail & Related papers (2025-01-20T01:03:50Z) - SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time [7.532695984765271]
We present a novel approach to generate high-resolution images with generative models.
Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next.
Our method offers several key benefits, including improved computational efficiency and faster inference times.
arXiv Detail & Related papers (2024-07-22T09:44:35Z) - WiNet: Wavelet-based Incremental Learning for Efficient Medical Image Registration [68.25711405944239]
Deep image registration has demonstrated exceptional accuracy and fast inference.
Recent advances have adopted either multiple cascades or pyramid architectures to estimate dense deformation fields in a coarse-to-fine manner.
We introduce a model-driven WiNet that incrementally estimates scale-wise wavelet coefficients for the displacement/velocity field across various scales.
arXiv Detail & Related papers (2024-07-18T11:51:01Z) - LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-Diffusion [23.729378821117123]
The Denoising Diffusion Probabilistic Model (DDPM) holds promise for low-light image enhancement in the medical field.
DDPMs are computationally demanding and slow, limiting their practical medical applications.
We propose a lightweight DDPM, dubbed LighTDiff, to capture global structural information using low-resolution images.
arXiv Detail & Related papers (2024-05-17T05:31:19Z) - Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference [95.42299246592756]
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z) - Cache Me if You Can: Accelerating Diffusion Models through Block Caching [67.54820800003375]
A large image-to-image network has to be applied many times to iteratively refine an image from random noise.
We investigate the behavior of the layers within the network and find that 1) the layers' output changes smoothly over time, 2) the layers show distinct patterns of change, and 3) the change from step to step is often very small.
We propose a technique to automatically determine caching schedules based on each block's changes over timesteps.
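The caching idea in this entry can be illustrated with a minimal sketch. All names here are hypothetical: `block` stands in for an expensive network block whose output changes little between steps, and `recompute_at` stands in for an automatically determined caching schedule.

```python
def block(x, t):
    # Stand-in for an expensive network block; its output drifts
    # only slightly from one timestep to the next.
    return x + 0.01 * t

def run_with_cache(x, num_steps=10, recompute_at=(0, 5)):
    """Recompute the block only at scheduled timesteps; between them,
    reuse the cached output, exploiting its smooth change over time."""
    cache = None
    for t in range(num_steps):
        if cache is None or t in recompute_at:
            cache = block(x, t)   # full recompute at scheduled steps
        x = cache                 # cached value reused on the other steps
    return x
```

The schedule trades accuracy for compute: the smoother a block's output over timesteps (observation 1 in the summary), the fewer entries `recompute_at` needs.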
arXiv Detail & Related papers (2023-12-06T00:51:38Z) - Improving Denoising Diffusion Models via Simultaneous Estimation of Image and Noise [15.702941058218196]
This paper introduces two key contributions aimed at improving the speed and quality of images generated through inverse diffusion processes.
The first contribution involves re-parameterizing the diffusion process in terms of the angle on a quarter-circular arc between the image and noise.
The second contribution is to directly estimate both the image ($\mathbf{x}_0$) and noise ($\mathbf{\epsilon}$) using our network.
arXiv Detail & Related papers (2023-10-26T05:43:07Z) - Towards More Accurate Diffusion Model Acceleration with A Timestep
Aligner [84.97253871387028]
A diffusion model, which is formulated to produce an image using thousands of denoising steps, usually suffers from a slow inference speed.
We propose a timestep aligner that helps find a more accurate integral direction for a particular interval at the minimum cost.
Experiments show that our plug-in design can be trained efficiently and boost the inference performance of various state-of-the-art acceleration methods.
arXiv Detail & Related papers (2023-10-14T02:19:07Z) - Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation.
Our method represents the forward image-to-noise mapping as a simultaneous image-to-zero mapping and zero-to-noise mapping.
We have conducted experiments on unconditioned image generation, e.g., CIFAR-10 and CelebA-HQ-256, and image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.