Gradient-Free Classifier Guidance for Diffusion Model Sampling
- URL: http://arxiv.org/abs/2411.15393v1
- Date: Sat, 23 Nov 2024 00:22:21 GMT
- Title: Gradient-Free Classifier Guidance for Diffusion Model Sampling
- Authors: Rahul Shenoy, Zhihong Pan, Kaushik Balakrishnan, Qisen Cheng, Yongmoon Jeon, Heejune Yang, Jaewon Kim,
- Abstract summary: The proposed Gradient-free Classifier Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%).
- Score: 4.450496470631169
- License:
- Abstract: Image generation using diffusion models has demonstrated outstanding learning capabilities, effectively capturing the full distribution of the training dataset. They are known to generate wide variations in sampled images, albeit with a trade-off in image fidelity. Guided sampling methods, such as classifier guidance (CG) and classifier-free guidance (CFG), focus sampling in well-learned high-probability regions to generate images of high fidelity, but each has its limitations. CG is computationally expensive due to the use of back-propagation for classifier gradient descent, while CFG, being gradient-free, is more efficient but compromises class label alignment compared to CG. In this work, we propose an efficient guidance method that fully utilizes a pre-trained classifier without using gradient descent. By using the classifier solely in inference mode, a time-adaptive reference class label and corresponding guidance scale are determined at each time step for guided sampling. Experiments on both class-conditioned and text-to-image generation diffusion models demonstrate that the proposed Gradient-free Classifier Guidance (GFCG) method consistently improves class prediction accuracy. We also show GFCG to be complementary to other guided sampling methods like CFG. When combined with the state-of-the-art Autoguidance (ATG), without additional computational overhead, it enhances image fidelity while preserving diversity. For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%).
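As a rough illustration of the idea in the abstract, the sketch below shows how a classifier used only in inference mode could supply a reference class and a guidance scale for a CFG-style extrapolation. This is not the paper's algorithm: the function names (`guided_prediction`, `pick_reference_and_scale`), the rule "use the most likely non-target class as reference", and the scale formula with `base_scale` and `max_extra` are all assumptions made for illustration. Only the extrapolation form, reference plus scale times the difference, is the standard CFG combination.

```python
import numpy as np


def guided_prediction(eps_cond, eps_ref, scale):
    """CFG-style extrapolation from a reference prediction toward the
    target-conditioned one. scale=1 returns eps_cond unchanged."""
    return eps_ref + scale * (eps_cond - eps_ref)


def pick_reference_and_scale(probs, target, base_scale=1.0, max_extra=2.0):
    """Hypothetical rule (an assumption, not taken from the paper):
    - reference class = most likely class other than the target
    - scale grows as the classifier's confidence in the target drops
    `probs` are classifier probabilities from a forward pass only,
    so no gradients are needed."""
    masked = np.array(probs, dtype=float)
    masked[target] = -np.inf          # exclude the target class
    ref = int(np.argmax(masked))
    scale = base_scale + max_extra * (1.0 - float(probs[target]))
    return ref, scale


# Toy usage with stand-in arrays in place of real denoiser outputs.
probs = np.array([0.1, 0.6, 0.3])     # classifier output at one time step
ref_class, w = pick_reference_and_scale(probs, target=1)
eps_target = np.ones(4)               # stand-in for eps(x_t, target class)
eps_reference = np.zeros(4)           # stand-in for eps(x_t, ref_class)
eps_guided = guided_prediction(eps_target, eps_reference, w)
```

In a real sampler the two stand-in arrays would come from two denoiser evaluations per step, the same cost as ordinary CFG, with the classifier adding one forward pass and no back-propagation.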
Related papers
- Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment [31.402736873762418]
Motivated by language model alignment methods, we propose Condition Contrastive Alignment (CCA) to facilitate guidance-free AR visual generation with high performance.
Experimental results show that CCA can significantly enhance the guidance-free performance of all tested models with just one epoch fine-tuning.
This experimentally confirms the strong theoretical connection between language-targeted alignment and visual-targeted guidance methods.
arXiv Detail & Related papers (2024-10-12T03:31:25Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach tuned on 4-class ProGAN data attains an average of 98% accuracy to unseen GANs, and surprisingly generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Towards Accurate Guided Diffusion Sampling through Symplectic Adjoint Method [110.9458914721516]
We propose Symplectic Adjoint Guidance (SAG), which calculates the gradient guidance in two inner stages.
SAG generates images with higher qualities compared to the baselines in both guided image and video generation tasks.
arXiv Detail & Related papers (2023-12-19T10:30:31Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
arXiv Detail & Related papers (2023-03-23T22:43:52Z) - Accelerating Diffusion Sampling with Classifier-based Feature Distillation [20.704675568555082]
Progressive distillation is proposed for fast sampling by progressively aligning output images of $N$-step teacher sampler with $N/2$-step student sampler.
We distill teacher's sharpened feature distribution into the student with a dataset-independent classifier to improve performance.
Experiments on CIFAR-10 show the superiority of our method in achieving high quality and fast sampling.
arXiv Detail & Related papers (2022-11-22T06:21:31Z) - Optimizing Hierarchical Image VAEs for Sample Quality [0.0]
Hierarchical variational autoencoders (VAEs) have achieved great density estimation on image modeling tasks.
We attribute this to learned representations that over-emphasize compressing imperceptible details of the image.
We introduce a KL-reweighting strategy to control the amount of information in each latent group, and employ a Gaussian output layer to reduce sharpness in the learning objective.
arXiv Detail & Related papers (2022-10-18T23:10:58Z) - Classifier-Free Diffusion Guidance [17.355749359987648]
Classifier guidance is a recently introduced method to trade off mode coverage and sample fidelity in conditional diffusion models.
We show that guidance can be indeed performed by a pure generative model without such a classifier.
We combine the resulting conditional and unconditional score estimates to attain a trade-off between sample quality and diversity.
arXiv Detail & Related papers (2022-07-26T01:42:07Z) - No Fear of Heterogeneity: Classifier Calibration for Federated Learning with Non-IID Data [78.69828864672978]
A central challenge in training classification models in the real-world federated system is learning with non-IID data.
We propose a novel and simple algorithm called Classifier Calibration with Virtual Representations (CCVR), which adjusts the classifier using virtual representations sampled from an approximated Gaussian mixture model.
Experimental results demonstrate that CCVR achieves state-of-the-art performance on popular federated learning benchmarks including CIFAR-10, CIFAR-100, and CINIC-10.
arXiv Detail & Related papers (2021-06-09T12:02:29Z) - Diverse Image Generation via Self-Conditioned GANs [56.91974064348137]
We train a class-conditional GAN model without using manually annotated class labels.
Instead, our model is conditional on labels automatically derived from clustering in the discriminator's feature space.
Our clustering step automatically discovers diverse modes, and explicitly requires the generator to cover them.
arXiv Detail & Related papers (2020-06-18T17:56:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.