Related papers: Accelerate High-Quality Diffusion Models with Inner Loop Feedback

Accelerate High-Quality Diffusion Models with Inner Loop Feedback

URL: http://arxiv.org/abs/2501.13107v2
Date: Thu, 23 Jan 2025 18:13:35 GMT
Title: Accelerate High-Quality Diffusion Models with Inner Loop Feedback
Authors: Matthew Gwilliam, Han Cai, Di Wu, Abhinav Shrivastava, Zhiyu Cheng,
Abstract summary: Inner Loop Feedback (ILF) is a novel approach to accelerate diffusion models' inference.<n>ILF trains a lightweight module to predict future features in the denoising process.<n>ILF achieves strong performance for both class-to-image generation with diffusion transformer (DiT) and text-to-image generation with DiT-based PixArt-alpha and PixArt-sigma.
Score: 50.00066451431194
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We propose Inner Loop Feedback (ILF), a novel approach to accelerate diffusion models' inference. ILF trains a lightweight module to predict future features in the denoising process by leveraging the outputs from a chosen diffusion backbone block at a given time step. This approach exploits two key intuitions; (1) the outputs of a given block at adjacent time steps are similar, and (2) performing partial computations for a step imposes a lower burden on the model than skipping the step entirely. Our method is highly flexible, since we find that the feedback module itself can simply be a block from the diffusion backbone, with all settings copied. Its influence on the diffusion forward can be tempered with a learnable scaling factor from zero initialization. We train this module using distillation losses; however, unlike some prior work where a full diffusion backbone serves as the student, our model freezes the backbone, training only the feedback module. While many efforts to optimize diffusion models focus on achieving acceptable image quality in extremely few steps (1-4 steps), our emphasis is on matching best case results (typically achieved in 20 steps) while significantly reducing runtime. ILF achieves this balance effectively, demonstrating strong performance for both class-to-image generation with diffusion transformer (DiT) and text-to-image generation with DiT-based PixArt-alpha and PixArt-sigma. The quality of ILF's 1.7x-1.8x speedups are confirmed by FID, CLIP score, CLIP Image Quality Assessment, ImageReward, and qualitative comparisons. Project information is available at https://mgwillia.github.io/ilf.

Related papers

One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z)
ProReflow: Progressive Reflow with Decomposed Velocity [52.249464542399636]
Flow matching aims to reflow the diffusion process of diffusion models into a straight line for a few-step and even one-step generation. We introduce progressive reflow, which progressively reflows the diffusion models in local timesteps until the whole diffusion progresses. We also introduce aligned v-prediction, which highlights the importance of direction matching in flow matching over magnitude matching.
arXiv Detail & Related papers (2025-03-05T04:50:53Z)
One Diffusion Step to Real-World Super-Resolution via Flow Trajectory Distillation [60.54811860967658]
FluxSR is a novel one-step diffusion Real-ISR based on flow matching models. First, we introduce Flow Trajectory Distillation (FTD) to distill a multi-step flow matching model into a one-step Real-ISR. Second, to improve image realism and address high-frequency artifact issues in generated images, we propose TV-LPIPS as a perceptual loss.
arXiv Detail & Related papers (2025-02-04T04:11:29Z)
FAM Diffusion: Frequency and Attention Modulation for High-Resolution Image Generation with Stable Diffusion [63.609399000712905]
Inference at a scaled resolution leads to repetitive patterns and structural distortions.<n>We propose two simple modules that combine to solve these issues.<n>Our method, coined Fam diffusion, can seamlessly integrate into any latent diffusion model and requires no additional training.
arXiv Detail & Related papers (2024-11-27T17:51:44Z)
David and Goliath: Small One-step Model Beats Large Diffusion with Score Post-training [8.352666876052616]
We propose Diff-Instruct* (DI*), a data-efficient post-training approach for one-step text-to-image generative models.<n>Our method frames alignment as online reinforcement learning from human feedback.<n>Our 2.6B emphDI*-SDXL-1step model outperforms the 50-step 12B FLUX-dev model.
arXiv Detail & Related papers (2024-10-28T10:26:19Z)
Fast constrained sampling in pre-trained diffusion models [77.21486516041391]
We propose an algorithm that enables fast and high-quality generation under arbitrary constraints. During inference, we can interchange between gradient updates computed on the noisy image and updates computed on the final, clean image. Our approach produces results that rival or surpass the state-of-the-art training-free inference approaches.
arXiv Detail & Related papers (2024-10-24T14:52:38Z)
Tuning Timestep-Distilled Diffusion Model Using Pairwise Sample Optimization [97.35427957922714]
We present an algorithm named pairwise sample optimization (PSO), which enables the direct fine-tuning of an arbitrary timestep-distilled diffusion model. PSO introduces additional reference images sampled from the current time-step distilled model, and increases the relative likelihood margin between the training images and reference images. We show that PSO can directly adapt distilled models to human-preferred generation with both offline and online-generated pairwise preference image data.
arXiv Detail & Related papers (2024-10-04T07:05:16Z)
Identifying and Solving Conditional Image Leakage in Image-to-Video Diffusion Model [31.70050311326183]
Diffusion models tend to generate videos with less motion than expected. We address this issue from both inference and training aspects. Our methods outperform baselines by producing higher motion scores with lower errors.
arXiv Detail & Related papers (2024-06-22T04:56:16Z)
Plug-and-Play Diffusion Distillation [14.359953671470242]
We propose a new distillation approach for guided diffusion models. An external lightweight guide model is trained while the original text-to-image model remains frozen. We show that our method reduces the inference of classifier-free guided latent-space diffusion models by almost half.
arXiv Detail & Related papers (2024-06-04T04:22:47Z)
Blind Image Restoration via Fast Diffusion Inversion [17.139433082780037]
Blind Image Restoration via fast Diffusion (BIRD) is a blind IR method that jointly optimize for the degradation model parameters and the restored image. A key idea in our method is not to modify the reverse sampling, i.e., not to alter all the intermediate latents, once an initial noise is sampled. We experimentally validate BIRD on several image restoration tasks and show that it achieves state of the art performance on all of them.
arXiv Detail & Related papers (2024-05-29T23:38:12Z)
You Only Sample Once: Taming One-Step Text-to-Image Synthesis by Self-Cooperative Diffusion GANs [13.133574069588896]
YOSO is a novel generative model designed for rapid, scalable, and high-fidelity one-step image synthesis with high training stability and mode coverage. We show that our method can serve as a one-step generation model training from scratch with competitive performance. In particular, we show that the YOSO-PixArt-$alpha$ can generate images in one step trained on 512 resolution, with the capability of adapting to 1024 resolution without extra explicit training, requiring only 10 A800 days for fine-tuning.
arXiv Detail & Related papers (2024-03-19T17:34:27Z)
One-Step Image Translation with Text-to-Image Models [35.0987002313882]
We introduce a general method for adapting a single-step diffusion model to new tasks and domains through adversarial learning objectives. We consolidate various modules of the vanilla latent diffusion model into a single end-to-end generator network with small trainable weights. Our model CycleGAN-Turbo outperforms existing GAN-based and diffusion-based methods for various scene translation tasks.
arXiv Detail & Related papers (2024-03-18T17:59:40Z)
Prompt-tuning latent diffusion models for inverse problems [72.13952857287794]
We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors. Our method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
arXiv Detail & Related papers (2023-10-02T11:31:48Z)
Simultaneous Image-to-Zero and Zero-to-Noise: Diffusion Models with Analytical Image Attenuation [53.04220377034574]
We propose incorporating an analytical image attenuation process into the forward diffusion process for high-quality (un)conditioned image generation.<n>Our method represents the forward image-to-noise mapping as simultaneous textitimage-to-zero mapping and textitzero-to-noise mapping.<n>We have conducted experiments on unconditioned image generation, textite.g., CIFAR-10 and CelebA-HQ-256, and image-conditioned downstream tasks such as super-resolution, saliency detection, edge detection, and image inpainting.
arXiv Detail & Related papers (2023-06-23T18:08:00Z)
SnapFusion: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds [88.06788636008051]
Text-to-image diffusion models can create stunning images from natural language descriptions that rival the work of professional artists and photographers. These models are large, with complex network architectures and tens of denoising iterations, making them computationally expensive and slow to run. We present a generic approach that unlocks running text-to-image diffusion models on mobile devices in less than $2$ seconds.
arXiv Detail & Related papers (2023-06-01T17:59:25Z)

This list is automatically generated from the titles and abstracts of the papers in this site.