From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
- URL: http://arxiv.org/abs/2511.08930v1
- Date: Thu, 13 Nov 2025 01:18:51 GMT
- Title: From Structure to Detail: Hierarchical Distillation for Efficient Diffusion Model
- Authors: Hanbo Cheng, Peng Wang, Kaixiang Lei, Qi Li, Zhen Zou, Pengfei Hu, Jun Du,
- Abstract summary: Trajectory-based and distribution-based step distillation methods offer solutions. Trajectory-based methods preserve global structure but act as a "lossy compressor". We recast them into synergistic components within our novel Hierarchical Distillation framework.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The inference latency of diffusion models remains a critical barrier to their real-time application. While trajectory-based and distribution-based step distillation methods offer solutions, they present a fundamental trade-off. Trajectory-based methods preserve global structure but act as a "lossy compressor", sacrificing high-frequency details. Conversely, distribution-based methods can achieve higher fidelity but often suffer from mode collapse and unstable training. This paper recasts them from independent paradigms into synergistic components within our novel Hierarchical Distillation (HD) framework. We leverage trajectory distillation not as a final generator, but to establish a structural "sketch", providing a near-optimal initialization for the subsequent distribution-based refinement stage. This strategy yields an ideal initial distribution that enhances the ceiling of overall performance. To further improve quality, we introduce and refine the adversarial training process. We find standard discriminator structures are ineffective at refining an already high-quality generator. To overcome this, we introduce the Adaptive Weighted Discriminator (AWD), tailored for the HD pipeline. By dynamically allocating token weights, AWD focuses on local imperfections, enabling efficient detail refinement. Our approach demonstrates state-of-the-art performance across diverse tasks. On ImageNet $256\times256$, our single-step model achieves an FID of 2.26, rivaling its 250-step teacher. It also achieves promising results on the high-resolution text-to-image MJHQ benchmark, proving its generalizability. Our method establishes a robust new paradigm for high-fidelity, single-step diffusion models.
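The abstract describes AWD as dynamically allocating token weights so the discriminator focuses on local imperfections. The paper's exact formulation is not given here; the following is a minimal hypothetical sketch of one way such adaptive weighting could work, where per-token discriminator logits are turned into softmax weights that emphasize the tokens judged most "fake" (all names and the temperature parameter are assumptions, not the paper's API):

```python
import numpy as np

def adaptive_token_weights(token_logits, temperature=1.0):
    """Softmax weights over per-token discriminator logits.

    Tokens the discriminator scores as least realistic (lowest logit)
    receive the highest weight, concentrating the adversarial signal
    on local imperfections. Hypothetical simplification of AWD.
    """
    scores = -np.asarray(token_logits, dtype=float) / temperature
    w = np.exp(scores - scores.max())  # numerically stable softmax
    return w / w.sum()

def weighted_adversarial_loss(token_logits):
    """Non-saturating generator loss per token, reweighted adaptively."""
    logits = np.asarray(token_logits, dtype=float)
    per_token = np.log1p(np.exp(-logits))  # softplus(-logit)
    return float((adaptive_token_weights(logits) * per_token).sum())
```

Under this sketch, a token the discriminator already accepts (high logit) contributes little, while a locally flawed token dominates the loss, matching the stated goal of efficient detail refinement.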
Related papers
- Dual-End Consistency Model [41.982957134224904]
Slow iterative sampling is a major bottleneck for the practical deployment of diffusion and flow-based generative models. We propose a Dual-End Consistency Model (DE-CM) that selects vital sub-trajectory clusters to achieve stable and effective training. Our method achieves a state-of-the-art FID score of 1.70 in one-step generation on the ImageNet 256x256 dataset, outperforming existing CM-based one-step approaches.
arXiv Detail & Related papers (2026-02-11T11:51:01Z) - Deep Leakage with Generative Flow Matching Denoiser [54.05993847488204]
We introduce a new deep leakage (DL) attack that integrates a generative Flow Matching (FM) prior into the reconstruction process. Our approach consistently outperforms state-of-the-art attacks across pixel-level, perceptual, and feature-based similarity metrics.
arXiv Detail & Related papers (2026-01-21T14:51:01Z) - Decoupled DMD: CFG Augmentation as the Spear, Distribution Matching as the Shield [54.328202401611264]
Diffusion model distillation has emerged as a powerful technique for creating efficient few-step and single-step generators. We show that the primary driver of few-step distillation is not distribution matching, but a previously overlooked component we identify as CFG Augmentation (CA). We propose principled modifications to the distillation process, such as decoupling the noise schedules for the engine and the regularizer, leading to further performance gains.
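The CFG Augmentation this summary refers to builds on standard classifier-free guidance, which combines conditional and unconditional noise predictions. As background, the standard CFG combination (a well-known formula, not this paper's specific contribution) can be sketched as:

```python
import numpy as np

def cfg_predict(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward the conditional one.

    eps = eps_uncond + w * (eps_cond - eps_uncond)
    """
    eps_uncond = np.asarray(eps_uncond, dtype=float)
    eps_cond = np.asarray(eps_cond, dtype=float)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

With `guidance_scale = 1` this reduces to the plain conditional prediction; scales above 1 amplify the conditioning signal, which is the effect the distillation teacher bakes into the student.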
arXiv Detail & Related papers (2025-11-27T18:24:28Z) - Improving Progressive Generation with Decomposable Flow Matching [50.63174319509629]
Decomposable Flow Matching (DFM) is a simple and effective framework for the progressive generation of visual media. On ImageNet-1k 512px, DFM achieves a 35.2% improvement in FDD scores over the base architecture and 26.4% over the best-performing baseline.
arXiv Detail & Related papers (2025-06-24T17:58:02Z) - One-Step Offline Distillation of Diffusion-based Models via Koopman Modeling [26.913398550088477]
We introduce the Koopman Distillation Model (KDM), a novel offline distillation approach grounded in Koopman theory. KDM encodes noisy inputs into an embedded space where a learned linear operator propagates them forward, followed by a decoder that reconstructs clean samples. KDM achieves highly competitive performance across standard offline distillation benchmarks.
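The KDM summary describes a three-part pipeline: encode into an embedded space, propagate with a learned linear (Koopman) operator, then decode. A toy sketch of that structure, with fixed random linear maps standing in for the learned encoder/decoder (all matrices and names here are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the learned components (names hypothetical):
E = rng.standard_normal((8, 4))   # "encoder": 4-d data -> 8-d embedding
K = 0.9 * np.eye(8)               # linear Koopman propagation operator
D = np.linalg.pinv(E)             # "decoder": embedding -> 4-d data

def koopman_denoise(x_noisy):
    z = E @ x_noisy   # encode the noisy input into the embedded space
    z_next = K @ z    # one linear forward-propagation step
    return D @ z_next # decode back to a clean-sample estimate
```

The point of the sketch is the single linear step in the embedded space: once the dynamics are linear there, the entire denoising trajectory collapses into one matrix multiplication, which is what makes one-step offline distillation tractable in this framing.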
arXiv Detail & Related papers (2025-05-19T16:59:47Z) - Efficient Generative Model Training via Embedded Representation Warmup [12.485320863366411]
Generative models face a fundamental challenge: they must simultaneously learn high-level semantic concepts and low-level synthesis details. We propose Embedded Representation Warmup, a principled two-phase training framework. Our framework achieves an 11.5$\times$ speedup in 350 epochs to reach FID=1.41 compared to single-phase methods like REPA.
arXiv Detail & Related papers (2025-04-14T12:43:17Z) - Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation [82.39763984380625]
We introduce denoising score distillation (DSD), a surprisingly effective and novel approach for training high-quality generative models from low-quality data. DSD pretrains a diffusion model exclusively on noisy, corrupted samples and then distills it into a one-step generator capable of producing refined, clean outputs.
arXiv Detail & Related papers (2025-03-10T17:44:46Z) - One-Step Diffusion Model for Image Motion-Deblurring [85.76149042561507]
We propose a one-step diffusion model for deblurring (OSDD), a novel framework that reduces the denoising process to a single step. To tackle fidelity loss in diffusion models, we introduce an enhanced variational autoencoder (eVAE), which improves structural restoration. Our method achieves strong performance on both full- and no-reference metrics.
arXiv Detail & Related papers (2025-03-09T09:39:57Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)