Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model
- URL: http://arxiv.org/abs/2510.02390v1
- Date: Tue, 30 Sep 2025 23:27:09 GMT
- Title: Hyperparameters are all you need: Using five-step inference for an original diffusion model to generate images comparable to the latest distillation model
- Authors: Zilai Li
- Abstract summary: The diffusion model is a state-of-the-art generative model that generates an image by applying a neural network iteratively. Based on an analysis of the truncation error of the diffusion ODE and SDE, our study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The diffusion model is a state-of-the-art generative model that generates an image by applying a neural network iteratively, and this generation process can be regarded as solving an ordinary differential equation (ODE) or a stochastic differential equation (SDE). Based on an analysis of the truncation error of the diffusion ODE and SDE, our study proposes a training-free algorithm that generates high-quality 512 x 512 and 1024 x 1024 images in eight steps, with flexible guidance scales. To the best of our knowledge, our algorithm is the first to sample a 1024 x 1024 image in 8 steps with FID performance comparable to that of the latest distillation models, without additional training. Our algorithm can also generate a 512 x 512 image in 8 steps with better FID than the state-of-the-art ODE solver DPM++ 2m using 20 steps. We validate our eight-step image generation algorithm on the COCO 2014, COCO 2017, and LAION datasets, where our best FID scores are 15.7, 22.35, and 17.52, versus 17.3, 23.75, and 17.33 for DPM++ 2m. It also outperforms the state-of-the-art AMED-plugin solver, whose FID scores are 19.07, 25.50, and 18.06. We further apply the algorithm to five-step inference without additional training, achieving best FID scores of 19.18, 23.24, and 19.61 on the same datasets, comparable to the state-of-the-art AMED-plugin solver in eight steps, SDXL-Turbo in four steps, and the state-of-the-art diffusion distillation model Flash Diffusion in five steps. We also validate our algorithm on synthesizing 1024 x 1024 images within 6 steps, where its FID is close to that of the latest distillation algorithms. The code is in repo: https://github.com/TheLovesOfLadyPurple/Hyperparameters-are-all-you-need
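The abstract frames diffusion sampling as numerically solving an ODE over a decreasing noise schedule. The sketch below is not the paper's proposed solver; it is a minimal generic Euler discretization of the probability-flow ODE, with a toy `denoiser` stand-in, just to illustrate how few-step sampling reduces to a handful of solver steps over a noise schedule. All names (`euler_ode_sampler`, `toy_denoiser`, the schedule values) are illustrative assumptions.

```python
import numpy as np

def euler_ode_sampler(denoiser, x_T, sigmas):
    """Generic few-step probability-flow ODE sampler (Euler discretization).

    denoiser(x, sigma) -> estimate of the clean sample x0 at noise level sigma.
    sigmas: a decreasing noise schedule; len(sigmas) - 1 solver steps are taken.
    """
    x = x_T
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoiser(x, sigma)) / sigma   # ODE drift dx/dsigma
        x = x + (sigma_next - sigma) * d       # one Euler step toward sigma_next
    return x

# Toy denoiser: pretend the clean data is all zeros, so x0_hat = 0 everywhere.
def toy_denoiser(x, sigma):
    return np.zeros_like(x)

# Five noise levels -> four solver steps, mirroring a few-step inference budget.
x_T = 80.0 * np.ones((4, 4))                   # "pure noise" start at sigma = 80
sample = euler_ode_sampler(toy_denoiser, x_T, [80.0, 20.0, 5.0, 1.0, 0.0])
```

With this toy denoiser each Euler step rescales `x` by `sigma_next / sigma`, so the schedule telescopes and `sample` lands exactly on the all-zeros "data". In practice the step count and schedule spacing (the hyperparameters the title refers to) govern the truncation error of such a solver.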
Related papers
- LowDiff: Efficient Diffusion Sampling with Low-Resolution Condition [12.702798486507225]
LowDiff is a novel and efficient diffusion framework based on a cascaded approach. LowDiff employs a unified model to progressively refine images from low resolution to the desired resolution.
arXiv Detail & Related papers (2025-09-18T18:31:56Z) - Distilling Parallel Gradients for Fast ODE Solvers of Diffusion Models [53.087070073434845]
Diffusion models (DMs) have achieved state-of-the-art generative performance but suffer from high sampling latency due to their sequential denoising nature. Existing solver-based acceleration methods often face image quality degradation under a low-latency budget. We propose the Ensemble Parallel Direction solver (dubbed as ours), a novel ODE solver that mitigates truncation errors by incorporating multiple parallel gradient evaluations in each ODE step.
arXiv Detail & Related papers (2025-07-20T03:08:06Z) - Autoregressive Distillation of Diffusion Transformers [18.19070958829772]
We propose AutoRegressive Distillation (ARD), a novel approach that leverages the historical trajectory of the ODE to predict future steps. ARD offers two key benefits: 1) it mitigates exposure bias by utilizing a predicted historical trajectory that is less susceptible to accumulated errors, and 2) it leverages the previous history of the ODE trajectory as a more effective source of coarse-grained information. Our model achieves a $5\times$ reduction in FID degradation compared to the baseline methods while requiring only 1.1% extra FLOPs on ImageNet-256.
arXiv Detail & Related papers (2025-04-15T15:33:49Z) - Adversarial Score identity Distillation: Rapidly Surpassing the Teacher in One Step [64.53013367995325]
We introduce SiDA (SiD with Adversarial Loss), which improves generation quality and distillation efficiency. SiDA incorporates real images and adversarial loss, allowing it to distinguish between real images and those generated by SiD. SiDA converges significantly faster than its predecessor when distilled from scratch.
arXiv Detail & Related papers (2024-10-19T00:33:51Z) - An Image is Worth 32 Tokens for Reconstruction and Generation [54.24414696392026]
Transformer-based 1-Dimensional Tokenizer (TiTok) is an innovative approach that tokenizes images into 1D latent sequences.
TiTok achieves competitive performance to state-of-the-art approaches.
Our best-performing variant significantly surpasses DiT-XL/2 (gFID 2.13 vs. 3.04) while still generating high-quality samples 74x faster.
arXiv Detail & Related papers (2024-06-11T17:59:56Z) - Diffusion Models Are Innate One-Step Generators [2.3359837623080613]
Diffusion Models (DMs) can generate remarkable high-quality results.
DMs' layers are differentially activated at different time steps, leading to an inherent capability to generate images in a single step.
Our method achieves the SOTA results on CIFAR-10, AFHQv2 64x64 (FID 1.23), FFHQ 64x64 (FID 0.85) and ImageNet 64x64 (FID 1.16) with great efficiency.
arXiv Detail & Related papers (2024-05-31T11:14:12Z) - One-step Diffusion with Distribution Matching Distillation [54.723565605974294]
We introduce Distribution Matching Distillation (DMD), a procedure to transform a diffusion model into a one-step image generator.
We enforce the one-step image generator to match the diffusion model at the distribution level by minimizing an approximate KL divergence.
Our method outperforms all published few-step diffusion approaches, reaching 2.62 FID on ImageNet 64x64 and 11.49 FID on zero-shot COCO-30k.
arXiv Detail & Related papers (2023-11-30T18:59:20Z) - ACT-Diffusion: Efficient Adversarial Consistency Training for One-step Diffusion Models [59.90959789767886]
We show that optimizing consistency training loss minimizes the Wasserstein distance between target and generated distributions.
By incorporating a discriminator into the consistency training framework, our method achieves improved FID scores on the CIFAR10, ImageNet 64$\times$64, and LSUN Cat 256$\times$256 datasets.
arXiv Detail & Related papers (2023-11-23T16:49:06Z) - AutoDiffusion: Training-Free Optimization of Time Steps and
Architectures for Automated Diffusion Model Acceleration [57.846038404893626]
We propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training.
Experimental results show that our method achieves excellent performance using only a few time steps, e.g. a 17.86 FID score on ImageNet 64$\times$64 with only four steps, compared to 138.66 with DDIM.
arXiv Detail & Related papers (2023-09-19T08:57:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.