Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
- URL: http://arxiv.org/abs/2505.20755v2
- Date: Thu, 17 Jul 2025 13:16:31 GMT
- Title: Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction
- Authors: Yifei Wang, Weimin Bai, Colin Zhang, Debing Zhang, Weijian Luo, He Sun,
- Abstract summary: Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of 1.46 for unconditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of 1.02, which outperforms its 79-step teacher diffusion with a significant improvement margin of 1.33.
- Score: 16.855296683335308
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we unify more than 10 existing one-step diffusion distillation approaches, such as Diff-Instruct, DMD, SIM, SiD, $f$-distill, etc., inside a theory-driven framework which we name \textbf{\emph{Uni-Instruct}}. Uni-Instruct is motivated by our proposed diffusion expansion theory of the $f$-divergence family. We then introduce key theories that overcome the intractability issue of the original expanded $f$-divergence, resulting in an equivalent yet tractable loss that effectively trains one-step diffusion models by minimizing the expanded $f$-divergence family. The novel unification introduced by Uni-Instruct not only offers new theoretical contributions that help understand existing approaches from a high-level perspective but also leads to state-of-the-art one-step diffusion generation performance. On the CIFAR10 generation benchmark, Uni-Instruct achieves record-breaking Fréchet Inception Distance (FID) values of \textbf{\emph{1.46}} for unconditional generation and \textbf{\emph{1.38}} for conditional generation. On the ImageNet-$64\times 64$ generation benchmark, Uni-Instruct achieves a new SoTA one-step generation FID of \textbf{\emph{1.02}}, which outperforms its 79-step teacher diffusion by a significant margin of 1.33 (1.02 vs 2.35). We also apply Uni-Instruct to broader tasks like text-to-3D generation. For text-to-3D generation, Uni-Instruct yields decent results, slightly outperforming previous methods, such as SDS and VSD, in terms of both generation quality and diversity. Both the solid theoretical and empirical contributions of Uni-Instruct will potentially help future studies on one-step diffusion distillation and knowledge transfer in diffusion models.
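The score-difference gradient that makes divergences of this kind tractable can be illustrated on a toy 1-D problem. The sketch below is an assumption-laden illustration, not the paper's actual implementation: it uses the plain KL instance of the $f$-divergence family, a Gaussian teacher with an analytic score in place of a pretrained diffusion teacher, and a two-parameter affine one-step generator; all function and variable names are hypothetical.

```python
import random

def train_one_step_generator(steps=1500, batch=512, lr=0.1, seed=0):
    """Toy score-based distillation: fit a one-step generator
    g(z) = mu + sigma * z to a Gaussian teacher N(2, 1) by descending
    the score-difference gradient of the KL divergence (the simplest
    member of the f-divergence family)."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 0.5  # student parameters, deliberately mis-initialized
    for _ in range(steps):
        g_mu, g_sigma = 0.0, 0.0
        for _ in range(batch):
            z = rng.gauss(0.0, 1.0)
            x = mu + sigma * z              # one-step generation
            s_student = -z / sigma          # score of N(mu, sigma^2) at x
            s_teacher = -(x - 2.0)          # score of the teacher N(2, 1) at x
            diff = s_student - s_teacher    # score difference drives the update
            g_mu += diff                    # chain rule: dx/dmu = 1
            g_sigma += diff * z             # chain rule: dx/dsigma = z
        mu -= lr * g_mu / batch
        sigma -= lr * g_sigma / batch
    return mu, sigma

mu, sigma = train_one_step_generator()
print(f"student after training: mu={mu:.3f}, sigma={sigma:.3f}")  # approaches (2, 1)
```

The key design point, shared by the distillation methods listed above, is that the generator update never needs the intractable density ratio itself: only the two score functions evaluated at generated samples, which in the real setting come from a pretrained teacher diffusion and an auxiliary student score network.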
Related papers
- Ultra-Fast Language Generation via Discrete Diffusion Divergence Instruct [24.431216450821463]
DiDi-Instruct is a training-based method that distills a few-step student for fast generation. On OpenWebText, DiDi-Instruct achieves perplexity from 62.2 (8 NFEs) to 18.4 (128 NFEs). These gains come with a negligible entropy loss (around $1\%$) and reduce additional training wall-clock time by more than $20\times$ compared to competing dLLM distillation methods.
arXiv Detail & Related papers (2025-09-29T16:55:44Z) - DDAE++: Enhancing Diffusion Models Towards Unified Generative and Discriminative Learning [53.27049077100897]
Generative pre-training has been shown to yield discriminative representations, paving the way towards unified visual generation and understanding. This work introduces self-conditioning, a mechanism that internally leverages the rich semantics inherent in the denoising network to guide its own decoding layers. The results are compelling: our method boosts both generation FID and recognition accuracy with 1% computational overhead and generalizes across diverse diffusion architectures.
arXiv Detail & Related papers (2025-05-16T08:47:16Z) - Uni$\textbf{F}^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models [8.150431616220772]
We propose Uni$\textbf{F}^2$ace, the first UMM tailored specifically for fine-grained face understanding and generation. In general, we train Uni$\textbf{F}^2$ace on a self-constructed, specialized dataset. Experiments on Uni$\textbf{F}^2$ace-130K demonstrate that Uni$\textbf{F}^2$ace outperforms existing UMMs and generative models.
arXiv Detail & Related papers (2025-03-11T07:34:59Z) - Fast Direct: Query-Efficient Online Black-box Guidance for Diffusion-model Target Generation [27.773614349764234]
Existing guided diffusion models either rely on training the guidance model with pre-collected datasets or require the objective functions to be differentiable. In this work, we propose a novel and simple algorithm, \textbf{Fast Direct}, for query-efficient online black-box target generation. Our Fast Direct builds a pseudo-target on the data manifold to update the noise sequence of the diffusion model with a universal direction.
arXiv Detail & Related papers (2025-02-02T17:21:10Z) - Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models [8.352666876052616]
We introduce Diff-Instruct* (DI*), an image data-free approach for building one-step text-to-image generative models. We frame human preference alignment as online reinforcement learning using human feedback. Unlike traditional RLHF approaches, which rely on the KL divergence for regularization, we introduce a novel score-based divergence regularization.
arXiv Detail & Related papers (2024-10-28T10:26:19Z) - Unleashing the Power of One-Step Diffusion based Image Super-Resolution via a Large-Scale Diffusion Discriminator [81.81748032199813]
Diffusion models have demonstrated excellent performance for real-world image super-resolution (Real-ISR). We propose a new One-Step Diffusion model with a larger-scale Discriminator for SR. Our discriminator is able to distill noisy features from any time step of diffusion models in the latent space.
arXiv Detail & Related papers (2024-10-05T16:41:36Z) - Score Forgetting Distillation: A Swift, Data-Free Method for Machine Unlearning in Diffusion Models [63.43422118066493]
Machine unlearning (MU) is a crucial foundation for developing safe, secure, and trustworthy GenAI models. Traditional MU methods often rely on stringent assumptions and require access to real data. This paper introduces Score Forgetting Distillation (SFD), an innovative MU approach that promotes the forgetting of undesirable information in diffusion models.
arXiv Detail & Related papers (2024-09-17T14:12:50Z) - Guided Diffusion from Self-Supervised Diffusion Features [49.78673164423208]
Guidance serves as a key concept in diffusion models, yet its effectiveness is often limited by the need for extra data annotation or pretraining.
We propose a framework to extract guidance from, and specifically for, diffusion models.
arXiv Detail & Related papers (2023-12-14T11:19:11Z) - Towards Faster Non-Asymptotic Convergence for Diffusion-Based Generative Models [49.81937966106691]
We develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models.
In contrast to prior works, our theory is developed based on an elementary yet versatile non-asymptotic approach.
arXiv Detail & Related papers (2023-06-15T16:30:08Z) - Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models [77.83923746319498]
We propose a framework called Diff-Instruct to instruct the training of arbitrary generative models.
We show that Diff-Instruct results in state-of-the-art single-step diffusion-based models.
Experiments on refining GAN models show that Diff-Instruct can consistently improve the pre-trained generators of GAN models.
arXiv Detail & Related papers (2023-05-29T04:22:57Z) - TESS: Text-to-Text Self-Conditioned Simplex Diffusion [56.881170312435444]
Text-to-text Self-conditioned Simplex Diffusion employs a new form of self-conditioning, and applies the diffusion process on the logit simplex space rather than the learned embedding space.
We demonstrate that TESS outperforms state-of-the-art non-autoregressive models, requires fewer diffusion steps with minimal drop in performance, and is competitive with pretrained autoregressive sequence-to-sequence models.
arXiv Detail & Related papers (2023-05-15T06:33:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the listed information and is not responsible for any consequences of its use.