Gradient-Free Classifier Guidance for Diffusion Model Sampling
- URL: http://arxiv.org/abs/2411.15393v1
- Date: Sat, 23 Nov 2024 00:22:21 GMT
- Title: Gradient-Free Classifier Guidance for Diffusion Model Sampling
- Authors: Rahul Shenoy, Zhihong Pan, Kaushik Balakrishnan, Qisen Cheng, Yongmoon Jeon, Heejune Yang, Jaewon Kim
- Abstract summary: The Gradient-free Classifier Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%).
- Score: 4.450496470631169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Image generation using diffusion models has demonstrated outstanding learning capabilities, effectively capturing the full distribution of the training dataset. Such models are known to generate wide variations in sampled images, albeit with a trade-off in image fidelity. Guided sampling methods, such as classifier guidance (CG) and classifier-free guidance (CFG), focus sampling on well-learned high-probability regions to generate images of high fidelity, but each has its limitations. CG is computationally expensive because it back-propagates through the classifier to obtain gradients, while CFG, being gradient-free, is more efficient but offers weaker class label alignment than CG. In this work, we propose an efficient guidance method that fully utilizes a pre-trained classifier without computing gradients. By using the classifier solely in inference mode, a time-adaptive reference class label and corresponding guidance scale are determined at each time step for guided sampling. Experiments on both class-conditioned and text-to-image diffusion models demonstrate that the proposed Gradient-free Classifier Guidance (GFCG) method consistently improves class prediction accuracy. We also show GFCG to be complementary to other guided sampling methods such as CFG. When combined with the state-of-the-art Autoguidance (ATG), it enhances image fidelity while preserving diversity, without additional computational overhead. For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09 while simultaneously attaining a higher classification Precision (94.3%) than ATG (90.2%).
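As a rough illustration of the mechanism described in the abstract, the sketch below performs one guided sampling step with the classifier in inference mode only: a reference class and a guidance scale are chosen per time step, and the noise estimate is extrapolated away from the reference class. The `denoiser`/`classifier` signatures, the competing-class rule, and the confidence-based scale are illustrative assumptions, not the paper's exact formulation.

```python
import torch

@torch.no_grad()  # classifier used in inference mode only; no back-propagation
def gfcg_step(denoiser, classifier, x_t, t, target_class, base_scale=2.0):
    """One gradient-free guided step (illustrative sketch, not the paper's exact rule)."""
    # Conditional noise prediction for the desired class.
    eps_target = denoiser(x_t, t, label=target_class)

    # Classifier scores the current noisy sample with a forward pass only.
    probs = classifier(x_t, t).softmax(dim=-1)  # shape: (batch, num_classes)

    # Time-adaptive reference class: the strongest competing class right now.
    competing = probs.clone()
    competing[:, target_class] = -1.0
    ref_class = competing.argmax(dim=-1)
    eps_ref = denoiser(x_t, t, label=ref_class)

    # Time-adaptive guidance scale: guide harder while the classifier is
    # still unsure about the target class.
    scale = base_scale * (1.0 - probs[:, target_class]).view(-1, 1, 1, 1)

    # Gradient-free guidance: extrapolate away from the reference class.
    return eps_target + scale * (eps_target - eps_ref)
```

Since every call here is a forward pass, the cost profile is closer to CFG than to gradient-based CG.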
Related papers
- Context-guided Responsible Data Augmentation with Diffusion Models [29.41191005466334]
We propose a text-to-image (T2I) data augmentation method, named DiffCoRe-Mix, that computes a set of generative counterparts for a training sample.
To preserve key semantic axes, we also filter out undesired generative samples in our augmentation process.
We extensively evaluate our technique on ImageNet-1K, Tiny ImageNet-200, CIFAR-100, Flowers102, CUB-Birds, Stanford Cars, and Caltech datasets.
arXiv Detail & Related papers (2025-03-12T00:12:27Z) - Efficient Distillation of Classifier-Free Guidance using Adapters [0.0]
Adapter Guidance Distillation (AGD) is a novel approach that simulates CFG in a single forward pass.
AGD keeps the base model frozen and only trains minimal additional parameters.
We show that AGD achieves comparable or superior FID to CFG across multiple architectures.
arXiv Detail & Related papers (2025-03-10T12:55:08Z) - Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that removes the need for the commonly used classifier-free guidance (CFG).
Our approach goes beyond standard conditional modeling by directly incorporating the posterior probability of the conditions.
Our method significantly accelerates the training process, doubles inference speed, and achieves exceptional quality that parallels, and even surpasses, that of concurrent diffusion models using CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z) - Classifier-free Guidance with Adaptive Scaling [7.179513844921256]
Classifier-free guidance (CFG) is an essential mechanism in text-driven diffusion models.
In this paper, we present $\beta$-adaptive-CFG, which controls the impact of guidance during generation.
Our model obtained better FID scores, maintaining the text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
arXiv Detail & Related papers (2025-02-14T22:04:53Z) - Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment [31.402736873762418]
Motivated by language model alignment methods, we propose Condition Contrastive Alignment (CCA) to facilitate guidance-free AR visual generation with high performance.
Experimental results show that CCA can significantly enhance the guidance-free performance of all tested models with just one epoch of fine-tuning.
This experimentally confirms the strong theoretical connection between language-targeted alignment and visual-targeted guidance methods.
arXiv Detail & Related papers (2024-10-12T03:31:25Z) - Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [62.30570286073223]
Diffusion-based text-to-image generation models have demonstrated the ability to produce images aligned with textual descriptions.
We introduce a data-free guided distillation method that enables the efficient distillation of pretrained diffusion models without access to the real training data.
By exclusively training with synthetic images generated by its one-step generator, our data-free distillation method rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
arXiv Detail & Related papers (2024-06-03T17:44:11Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average of 98% accuracy on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method [110.9458914721516]
We propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages.
SAG generates images of higher quality than the baselines in both guided image and video generation tasks.
arXiv Detail & Related papers (2023-12-19T10:30:31Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z) - Accelerating Diffusion Sampling with Classifier-based Feature Distillation [20.704675568555082]
Progressive distillation is proposed for fast sampling by progressively aligning the output images of an $N$-step teacher sampler with those of an $N/2$-step student sampler (see the sketch after this entry).
We distill the teacher's sharpened feature distribution into the student with a dataset-independent classifier to improve performance.
Experiments on CIFAR-10 show the superiority of our method in achieving high quality and fast sampling.
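For orientation, here is a minimal sketch of the progressive-distillation objective this entry builds on: the student's single step is trained to match two consecutive teacher steps. The `teacher_step`/`student_step` signatures are assumptions, and the entry's classifier-based feature distillation is indicated only schematically in the final comment.

```python
import torch
import torch.nn.functional as F

def progressive_distill_loss(teacher_step, student_step, x_t, t, dt):
    """Match one student step against two consecutive teacher steps (schematic)."""
    with torch.no_grad():
        # Teacher takes two half-steps from time t down to t - dt.
        x_mid = teacher_step(x_t, t, dt / 2)
        x_teacher = teacher_step(x_mid, t - dt / 2, dt / 2)
    # Student is trained to reach the same point in a single step.
    x_student = student_step(x_t, t, dt)
    loss = F.mse_loss(x_student, x_teacher)
    # The entry's method would additionally match classifier features, e.g.:
    # loss = loss + F.mse_loss(feat(x_student), feat(x_teacher))
    return loss
```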
arXiv Detail & Related papers (2022-11-22T06:21:31Z) - Optimizing Hierarchical Image VAEs for Sample Quality [0.0]
Hierarchical variational autoencoders (VAEs) have achieved strong density estimation on image modeling tasks, yet their sample quality lags.
We attribute this to learned representations that over-emphasize compressing imperceptible details of the image.
We introduce a KL-reweighting strategy to control the amount of information in each latent group, and employ a Gaussian output layer to reduce sharpness in the learning objective.
arXiv Detail & Related papers (2022-10-18T23:10:58Z) - Classifier-Free Diffusion Guidance [17.355749359987648]
Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models, but it requires training an image classifier.
We show that guidance can indeed be performed by a pure generative model without such a classifier.
We combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity.
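The combination referred to here is the now-standard CFG update. A minimal sketch, assuming a `denoiser` that accepts an optional label (a null label yields the unconditional estimate):

```python
import torch

@torch.no_grad()
def cfg_noise_estimate(denoiser, x_t, t, label, guidance_scale=3.0):
    """Classifier-free guidance: mix conditional and unconditional estimates."""
    eps_cond = denoiser(x_t, t, label=label)   # conditional score estimate
    eps_uncond = denoiser(x_t, t, label=None)  # unconditional (null label)
    # Extrapolate past the conditional estimate: larger scales trade
    # diversity (mode coverage) for sample fidelity.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```

A scale of 1.0 recovers the plain conditional estimate; larger values sharpen conditioning at the cost of diversity.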
arXiv Detail & Related papers (2022-07-26T01:42:07Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Diverse Image Generation via Self-Conditioned GANs [56.91974064348137]
We train a class-conditional GAN model without using manually annotated class labels.
Instead, our model is conditioned on labels automatically derived from clustering in the discriminator's feature space (see the sketch below).
Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them.
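As a rough sketch of that labeling step, one could cluster discriminator features of real images and reuse the cluster ids as pseudo class labels. The function below is hypothetical (its name and the k-means choice are assumptions; the paper's actual procedure, including how often clusters are recomputed during training, may differ):

```python
from sklearn.cluster import KMeans

def pseudo_labels_from_discriminator(disc_features, num_clusters=50):
    """Cluster discriminator features (shape: num_images x feature_dim)
    and return cluster ids to use as pseudo class labels."""
    kmeans = KMeans(n_clusters=num_clusters, n_init=10)
    return kmeans.fit_predict(disc_features)  # shape: (num_images,)
```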
arXiv Detail & Related papers (2020-06-18T17:56:03Z)