Related papers: Efficient Distillation of Classifier-Free Guidance using Adapters

Efficient Distillation of Classifier-Free Guidance using Adapters

URL: http://arxiv.org/abs/2503.07274v1
Date: Mon, 10 Mar 2025 12:55:08 GMT
Title: Efficient Distillation of Classifier-Free Guidance using Adapters
Authors: Cristian Perez Jensen, Seyedmorteza Sadat
Abstract summary: Adapter guidance distillation (AGD) is a novel approach that simulates CFG in a single forward pass. AGD keeps the base model frozen and only trains minimal additional parameters. We show that AGD achieves comparable or superior FID to CFG across multiple architectures.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: While classifier-free guidance (CFG) is essential for conditional diffusion models, it doubles the number of neural function evaluations (NFEs) per inference step. To mitigate this inefficiency, we introduce adapter guidance distillation (AGD), a novel approach that simulates CFG in a single forward pass. AGD leverages lightweight adapters to approximate CFG, effectively doubling the sampling speed while maintaining or even improving sample quality. Unlike prior guidance distillation methods that tune the entire model, AGD keeps the base model frozen and only trains minimal additional parameters (~2%) to significantly reduce the resource requirement of the distillation phase. Additionally, this approach preserves the original model weights and enables the adapters to be seamlessly combined with other checkpoints derived from the same base model. We also address a key mismatch between training and inference in existing guidance distillation methods by training on CFG-guided trajectories instead of standard diffusion trajectories. Through extensive experiments, we show that AGD achieves comparable or superior FID to CFG across multiple architectures with only half the NFEs. Notably, our method enables the distillation of large models (~2.6B parameters) on a single consumer GPU with 24 GB of VRAM, making it more accessible than previous approaches that require multiple high-end GPUs. We will publicly release the implementation of our method.

Related papers

Improving Classifier-Free Guidance of Flow Matching via Manifold Projection [3.6087998976768128]
We provide a principled interpretation of CFG through the lens of optimization. We reformulate CFG sampling as a homotopy optimization with a manifold constraint. Our proposed methods are training-free and consistently improve generation fidelity, prompt alignment, and robustness to the guidance scale.
arXiv Detail & Related papers (2026-01-29T15:49:31Z)
Joint Distillation for Fast Likelihood Evaluation and Sampling in Flow-based Models [100.28111930893188]
Some of today's best generative models still require hundreds to thousands of neural function evaluations to compute a single likelihood. We present fast flow joint distillation (F2D2), a framework that simultaneously reduces the number of NFEs required for both sampling and likelihood evaluation by two orders of magnitude. F2D2 is modular, compatible with existing flow-based few-step sampling models, and requires only an additional divergence prediction head.
arXiv Detail & Related papers (2025-12-02T10:48:20Z)
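Background for the F2D2 entry: why likelihoods in flow-based models need a divergence at all. This is the standard instantaneous change-of-variables identity for flows, not F2D2's own derivation. If samples follow dx_t/dt = v_t(x_t) from data at t = 0 to noise at t = 1, then:

```latex
\frac{d}{dt}\log p_t(x_t) = -\nabla\cdot v_t(x_t)
\quad\Longrightarrow\quad
\log p_0(x_0) = \log p_1(x_1) + \int_0^1 \nabla\cdot v_t(x_t)\,dt
```

Each step of a numerical ODE solver for this integral needs both v_t and its divergence, which is why exact likelihoods can cost hundreds to thousands of NFEs. A divergence prediction head presumably targets the integral term, though the summary does not give its exact form.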
DiffusionNFT: Online Diffusion Reinforcement with Forward Process [99.94852379720153]
Diffusion Negative-aware FineTuning (DiffusionNFT) is a new online RL paradigm that optimizes diffusion models directly on the forward process via flow matching. DiffusionNFT is up to 25× more efficient than FlowGRPO in head-to-head comparisons, while being CFG-free.
arXiv Detail & Related papers (2025-09-19T16:09:33Z)
Conditional Diffusion Models with Classifier-Free Gibbs-like Guidance [19.83064246586143]
CFG is a technique for improving conditional diffusion models by linearly combining the outputs of conditional and unconditional denoisers. While CFG enhances visual quality and improves alignment with prompts, it often reduces sample diversity. We propose a Gibbs-like sampling procedure to draw samples from the desired tilted distribution.
arXiv Detail & Related papers (2025-05-27T12:27:33Z)
Few-Step Diffusion via Score identity Distillation [67.07985339442703]
Diffusion distillation has emerged as a promising strategy for accelerating text-to-image (T2I) diffusion models. Existing methods rely on real or teacher-synthesized images to perform well when distilling high-resolution T2I diffusion models. We propose two new guidance strategies: Zero-CFG, which disables CFG in the teacher and removes text conditioning in the fake score network, and Anti-CFG, which applies negative CFG in the fake score network.
arXiv Detail & Related papers (2025-05-19T03:45:16Z)
Adding Additional Control to One-Step Diffusion with Joint Distribution Matching [58.37264951734603]
JDM is a novel approach that minimizes the reverse KL divergence between image-condition joint distributions. By deriving a tractable upper bound, JDM decouples fidelity learning from condition learning. This asymmetric distillation scheme enables our one-step student to handle controls unknown to the teacher model.
arXiv Detail & Related papers (2025-03-09T15:06:50Z)
Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that removes the need for the commonly used classifier-free guidance (CFG). Our approach goes beyond modeling the data distribution alone and incorporates the posterior probability of conditions. Our method significantly accelerates training, doubles inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z)
Visual Generation Without Guidance [28.029707495420475]
We propose to build visual models that are free from guided sampling. The resulting algorithm, Guidance-Free Training (GFT), matches the performance of CFG while reducing sampling to a single model, halving the computational cost.
arXiv Detail & Related papers (2025-01-26T06:48:05Z)
ALoRE: Efficient Visual Adaptation via Aggregating Low Rank Experts [71.91042186338163]
ALoRE is a novel parameter-efficient transfer learning (PETL) method that reuses the hypercomplex parameterized space constructed by the Kronecker product to Aggregate Low Rank Experts. Thanks to this design, ALoRE adds negligible extra parameters and can be effortlessly merged into the frozen backbone.
arXiv Detail & Related papers (2024-12-11T12:31:30Z)
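The "merged into the frozen backbone" claim in the ALoRE entry holds for any adapter whose effect is a fixed linear update to a weight matrix. A small NumPy illustration, with a hypothetical parameterization in the spirit of the summary (each expert is a Kronecker product of a small factor with a low-rank block, and experts are summed); ALoRE's actual construction is not reproduced here, and all sizes are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, rank, n_experts = 16, 2, 1, 4  # hypothetical sizes

W = rng.normal(size=(d, d))  # frozen backbone weight, never updated

# Hypothetical adapter: each expert is kron(small k x k factor, low-rank (d/k) x (d/k) block).
A = rng.normal(size=(n_experts, k, k))
B = rng.normal(size=(n_experts, d // k, rank))
C = rng.normal(size=(n_experts, rank, d // k))

def adapter_delta():
    """Sum of Kronecker-structured low-rank experts, shaped (d, d)."""
    return sum(np.kron(A[i], B[i] @ C[i]) for i in range(n_experts))

x = rng.normal(size=(5, d))
y_unmerged = x @ W.T + x @ adapter_delta().T  # backbone plus separate adapter branch
W_merged = W + adapter_delta()                # fold the adapter into the weight once
y_merged = x @ W_merged.T                     # same output, no extra inference cost

n_adapter = A.size + B.size + C.size
print(n_adapter, "adapter params vs", W.size, "backbone params")
```

After merging, inference runs on a single dense matrix, so the adapter adds no latency; the trade-off is that the merged weight no longer keeps the adapter separable.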
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address the oversaturation and artifacts that appear at high guidance scales. We propose down-weighting the parallel component of the update to achieve high-quality generations without oversaturation. We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z)
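A minimal NumPy sketch of the decomposition summarized in the entry on oversaturation at high guidance scales. It assumes, as we read the paper, that the guidance term pred_cond - pred_uncond is split into parts parallel and orthogonal to the conditional prediction and that a weight eta scales the parallel part. The rescaling and momentum terms are omitted, and the function names are ours. Standard CFG is written as pred_cond + (w - 1)(pred_cond - pred_uncond), which equals pred_uncond + w(pred_cond - pred_uncond).

```python
import numpy as np

def split_parallel_orthogonal(v, ref):
    """Split v into parts parallel and orthogonal to ref, per sample.

    Both arrays are shaped (batch, ...); all non-batch axes are treated
    as one flat vector per sample.
    """
    axes = tuple(range(1, v.ndim))
    ref_unit = ref / np.sqrt(np.sum(ref**2, axis=axes, keepdims=True))
    parallel = np.sum(v * ref_unit, axis=axes, keepdims=True) * ref_unit
    return parallel, v - parallel

def cfg_update(pred_cond, pred_uncond, w):
    # Standard CFG, rewritten around the conditional prediction.
    return pred_cond + (w - 1.0) * (pred_cond - pred_uncond)

def projected_update(pred_cond, pred_uncond, w, eta=0.0):
    # Down-weight the component of the guidance term parallel to pred_cond.
    diff = pred_cond - pred_uncond
    parallel, orthogonal = split_parallel_orthogonal(diff, pred_cond)
    return pred_cond + (w - 1.0) * (orthogonal + eta * parallel)

rng = np.random.default_rng(0)
pred_cond = rng.normal(size=(2, 3, 4, 4))    # e.g. a batch of two 3x4x4 predictions
pred_uncond = rng.normal(size=(2, 3, 4, 4))
out = projected_update(pred_cond, pred_uncond, w=7.5, eta=0.0)
```

With eta = 1 the update reduces to standard CFG; with eta = 0 the guidance only moves pred_cond in directions orthogonal to itself, which is the component the summary associates with avoiding oversaturation.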
Optimizing Diffusion Models for Joint Trajectory Prediction and Controllable Generation [49.49868273653921]
Diffusion models are promising for joint trajectory prediction and controllable generation in autonomous driving. We introduce Optimal Gaussian Diffusion (OGD) and Estimated Clean Manifold (ECM) Guidance. Our methodology streamlines the generative process, enabling practical applications with reduced computational overhead.
arXiv Detail & Related papers (2024-08-01T17:59:59Z)
Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications. Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space. We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z)
Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of Classifier-Free Guidance (CFG). AG preserves CFG's image quality while reducing computation by 25%. "LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z)
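A rough sketch of the NFE accounting behind the Adaptive Guidance entry. We assume AG stops evaluating the unconditional branch once the conditional and unconditional predictions become nearly aligned (cosine similarity above a threshold gamma); this is our reading of the method, not its exact rule. eps_fn, the toy predictions, and the placeholder update step are hypothetical and do not form a real diffusion sampler.

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a.ravel(), b.ravel()) / (np.linalg.norm(a) * np.linalg.norm(b)))

def sample_adaptive_guidance(eps_fn, x, timesteps, w, gamma=0.99):
    """Guided sampling that drops the unconditional branch once it adds little.

    eps_fn(x, t, conditional) returns a noise prediction. Returns (x, nfe).
    """
    nfe, guided = 0, True
    for t in timesteps:
        eps_c = eps_fn(x, t, True)
        nfe += 1
        if guided:
            eps_u = eps_fn(x, t, False)
            nfe += 1
            eps = eps_u + w * (eps_c - eps_u)
            # Once the two predictions align, skip unconditional calls from now on.
            if cosine(eps_c, eps_u) > gamma:
                guided = False
        else:
            eps = eps_c
        x = x - 0.05 * eps  # placeholder update, not a real sampler step
    return x, nfe

# Toy predictions whose conditional offset shrinks as t -> 0 (hypothetical).
base = np.ones(4)
offset = np.array([1.0, -1.0, 1.0, -1.0])

def toy_eps(x, t, conditional):
    return base + (t * offset if conditional else 0.0)

timesteps = np.linspace(1.0, 0.0, 20)
_, nfe = sample_adaptive_guidance(toy_eps, np.zeros(4), timesteps, w=5.0)
print(nfe, "NFEs vs", 2 * len(timesteps), "for full CFG")
```

In this toy the cosine similarity is 1/sqrt(1 + t^2), so guidance switches off near t = 0.1 and the run uses 38 NFEs instead of 40; the 25% saving quoted in the summary refers to the paper's own setting.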
One-Step Diffusion Distillation via Deep Equilibrium Models [64.11782639697883]
We introduce a simple yet effective means of distilling diffusion models directly from initial noise to the resulting image. Our method enables fully offline training with just noise/image pairs from the diffusion model. We demonstrate that the deep equilibrium (DEQ) architecture is crucial to this capability, as GET (the Generative Equilibrium Transformer) matches a 5× larger ViT in terms of FID scores.
arXiv Detail & Related papers (2023-12-12T07:28:40Z)
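For the deep equilibrium entry: a DEQ defines its output as a fixed point z* = f(z*, x) of a single layer rather than the output of a fixed stack of layers. A minimal sketch using plain fixed-point iteration on a hypothetical contractive tanh layer; practical DEQs usually rely on faster root solvers, and GET's actual architecture is not shown.

```python
import numpy as np

def deq_forward(f, x, z0, tol=1e-10, max_iter=500):
    """Iterate z <- f(z, x) until it stops changing; returns (z_star, iterations)."""
    z = z0
    for i in range(max_iter):
        z_next = f(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next, i + 1
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
d = 8
Wz = rng.normal(size=(d, d))
Wz *= 0.5 / np.linalg.norm(Wz, 2)  # spectral norm 0.5, so the tanh layer is a contraction in z
Ux = rng.normal(size=(d, d))

def layer(z, x):
    # Hypothetical equilibrium layer: input injection plus a contractive recurrence.
    return np.tanh(Wz @ z + Ux @ x)

x = rng.normal(size=d)  # e.g. the initial noise
z_star, iters = deq_forward(layer, x, np.zeros(d))
print(f"converged in {iters} iterations")
```

Because tanh is 1-Lipschitz and the recurrent weight has spectral norm 0.5, the iteration is guaranteed to converge here; learned DEQs do not get this guarantee for free.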
Post-Processing Temporal Action Detection [134.26292288193298]
Temporal Action Detection (TAD) methods typically take a pre-processing step in converting an input varying-length video into a fixed-length snippet representation sequence. This pre-processing step would temporally downsample the video, reducing the inference resolution and hampering the detection performance in the original temporal resolution. We introduce a novel model-agnostic post-processing method without model redesign and retraining.
arXiv Detail & Related papers (2022-11-27T19:50:37Z)

This list is automatically generated from the titles and abstracts of the papers on this site.