Classifier-free Guidance with Adaptive Scaling
- URL: http://arxiv.org/abs/2502.10574v1
- Date: Fri, 14 Feb 2025 22:04:53 GMT
- Title: Classifier-free Guidance with Adaptive Scaling
- Authors: Dawid Malarz, Artur Kasymov, Maciej Zięba, Jacek Tabor, Przemysław Spurek
- Abstract summary: Classifier-free guidance (CFG) is an essential mechanism in text-driven diffusion models.
In this paper, we present $\beta$-CFG, which controls the impact of guidance during generation.
Our model obtained better FID scores while maintaining text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
- Score: 7.179513844921256
- Abstract: Classifier-free guidance (CFG) is an essential mechanism in contemporary text-driven diffusion models. In practice, controlling the strength of guidance exposes a trade-off between the quality of the generated images and their correspondence to the prompt. With strong guidance, generated images fit the conditioning text closely, but at the cost of their quality. Conversely, with weak guidance we can generate high-quality results, but the images may not match the prompt. In this paper, we present $\beta$-CFG ($\beta$-adaptive scaling in Classifier-Free Guidance), which controls the impact of guidance during generation to resolve this trade-off. First, $\beta$-CFG stabilizes the effects of guiding by gradient-based adaptive normalization. Second, $\beta$-CFG uses a family of unimodal, time-dependent curves (the $\beta$-distribution) to dynamically adapt the trade-off between prompt matching and sample quality during the diffusion denoising process. Our model obtained better FID scores while maintaining text-to-image CLIP similarity scores at a level similar to that of the reference CFG.
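As an illustration, here is a minimal sketch of adaptively scaled CFG in Python. The function names, the unnormalized Beta-shaped weight, and the norm-matching normalization are assumptions inferred from the abstract, not the paper's exact formulation.

```python
import numpy as np

def beta_weight(t, a=2.0, b=2.0):
    """Unnormalized Beta(a, b) density at t in (0, 1): a unimodal,
    time-dependent guidance weight that rises, peaks, and decays."""
    return t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

def adaptive_cfg_eps(eps_uncond, eps_cond, t, w=7.5, a=2.0, b=2.0, tol=1e-8):
    """One guided noise prediction with adaptively scaled CFG.

    The norm-matching step below is an illustrative stand-in for the
    paper's gradient-based adaptive normalization.
    """
    g = eps_cond - eps_uncond                                      # guidance direction
    g = g * (np.linalg.norm(eps_uncond) / (np.linalg.norm(g) + tol))
    return eps_uncond + w * beta_weight(t, a, b) * g

# Toy usage with random stand-ins for the denoiser's two predictions.
rng = np.random.default_rng(0)
e_u, e_c = rng.normal(size=(2, 4, 64, 64))
guided = adaptive_cfg_eps(e_u, e_c, t=0.5)
print(guided.shape)  # (4, 64, 64)
```

With $a=b=2$ the weight peaks mid-trajectory and vanishes at both endpoints; other $(a, b)$ choices shift which phase of denoising prompt adherence dominates.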
Related papers
- Gradient-Free Classifier Guidance for Diffusion Model Sampling [4.450496470631169]
The Gradient-Free Classifier Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$\times$512, we achieve a record $\text{FD}_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%).
arXiv Detail & Related papers (2024-11-23T00:22:21Z) - Enhancing Zero-Shot Vision Models by Label-Free Prompt Distribution Learning and Bias Correcting [55.361337202198925]
Vision-language models, such as CLIP, have shown impressive generalization capacities when using appropriate text descriptions.
We propose a label-free prompt distribution learning and bias correction framework, dubbed **Frolic**, which boosts zero-shot performance without the need for labeled data.
arXiv Detail & Related papers (2024-10-25T04:00:45Z) - Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address this issue.
We propose down-weighting the component of the guidance update parallel to the conditional prediction to achieve high-quality generations without oversaturation.
We also introduce a new rescaling momentum method for the CFG update rule based on the same insight; the parallel down-weighting is sketched below.
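A minimal sketch of the parallel down-weighting, assuming the guidance update is decomposed relative to the conditional prediction (the function names and the `parallel_weight` value are illustrative, not the paper's settings):

```python
import numpy as np

def parallel_component(g, v, tol=1e-12):
    """Orthogonal projection of g onto the direction of v."""
    vf, gf = v.ravel(), g.ravel()
    return (gf @ vf / (vf @ vf + tol)) * v

def projected_guidance(eps_uncond, eps_cond, w=7.5, parallel_weight=0.3):
    """Down-weight the guidance component parallel to the conditional
    prediction; the orthogonal component keeps full strength.

    Note: vanilla CFG is recovered with parallel_weight = 1, since
    eps_cond + (w - 1) * (eps_cond - eps_uncond)
    == eps_uncond + w * (eps_cond - eps_uncond).
    """
    g = eps_cond - eps_uncond
    g_par = parallel_component(g, eps_cond)
    g_orth = g - g_par
    return eps_cond + (w - 1.0) * (g_orth + parallel_weight * g_par)
```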
arXiv Detail & Related papers (2024-10-03T12:06:29Z) - Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection [106.39544368711427]
We study the problem of generalizable synthetic image detection, aiming to detect forgery images from diverse generative methods.
We present a novel forgery-aware adaptive transformer approach, namely FatFormer.
Our approach, tuned on 4-class ProGAN data, attains an average of 98% accuracy on unseen GANs and, surprisingly, generalizes to unseen diffusion models with 95% accuracy.
arXiv Detail & Related papers (2023-12-27T17:36:32Z) - Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of computation-Free Guidance (CFG)
AG preserves CFG's image quality while reducing by 25%.
" LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z) - Guided Flows for Generative Modeling and Decision Making [55.42634941614435]
We show that Guided Flows significantly improve sample quality in conditional image generation and zero-shot text-to-speech synthesis.
Notably, we are the first to apply flow models for plan generation in the offline reinforcement learning setting, with a speedup compared to diffusion models; the guidance rule for flows is sketched below.
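For reference, a minimal sketch of guidance applied to a flow-matching velocity field, in direct analogy to CFG (the weight `w` is an arbitrary illustrative value):

```python
def guided_velocity(v_uncond, v_cond, w=2.0):
    """CFG transplanted to a flow-matching velocity field:
    extrapolate past the unconditional velocity toward the
    conditional one."""
    return v_uncond + w * (v_cond - v_uncond)

def euler_step(x, v_uncond, v_cond, dt, w=2.0):
    """One Euler integration step of the guided flow ODE dx/dt = v."""
    return x + dt * guided_velocity(v_uncond, v_cond, w)

# Scalar toy example: x moves along the guided velocity.
x_next = euler_step(0.0, v_uncond=0.1, v_cond=0.4, dt=0.05)
print(x_next)  # 0.035
```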
arXiv Detail & Related papers (2023-11-22T15:07:59Z) - Bridging the Gap: Addressing Discrepancies in Diffusion Model Training for Classifier-Free Guidance [1.6804613362826175]
Diffusion models have emerged as a pivotal advancement in generative models.
In this paper we aim to underscore a discrepancy between conventional training methods and the desired conditional sampling behavior.
We introduce an updated loss function that better aligns training objectives with sampling behaviors.
arXiv Detail & Related papers (2023-11-02T02:03:12Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - AdaptiveClick: Clicks-aware Transformer with Adaptive Focal Loss for Interactive Image Segmentation [51.82915587228898]
We introduce AdaptiveClick -- a transformer-based, mask-adaptive segmentation framework for Interactive Image Segmentation (IIS).
The key ingredient of our method is the Click-Aware Mask-adaptive transformer Decoder (CAMD), which enhances the interaction between click and image features.
With a plain ViT backbone, extensive experimental results on nine datasets demonstrate the superiority of AdaptiveClick compared to state-of-the-art methods.
arXiv Detail & Related papers (2023-05-07T13:47:35Z) - End-to-End Diffusion Latent Optimization Improves Classifier Guidance [81.27364542975235]
Direct Optimization of Diffusion Latents (DOODL) is a novel guidance method.
It enables plug-and-play guidance by optimizing diffusion latents.
It outperforms one-step classifier guidance on computational and human evaluation metrics.
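A hedged sketch of the idea of optimizing diffusion latents against a differentiable objective; `generate` and `reward` are hypothetical stand-ins, and the real method backpropagates through the full sampling chain rather than a single decode:

```python
import torch

def optimize_latent(z0, generate, reward, steps=20, lr=0.05):
    """Directly ascend a differentiable reward by optimizing the
    initial diffusion latent (DOODL-style, simplified)."""
    z = z0.clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -reward(generate(z))   # maximize reward = minimize its negative
        loss.backward()
        opt.step()
    return z.detach()
```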
arXiv Detail & Related papers (2023-03-23T22:43:52Z) - Graph Federated Learning for CIoT Devices in Smart Home Applications [23.216140264163535]
We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed G-Fedfilt.
The proposed aggregator enables a structured flow of information based on the graph's topology.
It is capable of yielding up to 2.41% higher accuracy than FedAvg when testing the generalization of the models.
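A speculative sketch of what a graph-filter aggregation rule can look like; the row-normalized one-tap filter and the mixing weight `alpha` are assumptions, not G-Fedfilt's actual filter:

```python
import numpy as np

def graph_filtered_aggregate(client_weights, adjacency, alpha=0.5):
    """Blend each client's parameters with its neighbors' via a
    row-normalized adjacency, instead of FedAvg's single global mean."""
    W = np.stack(client_weights)                    # (n_clients, n_params)
    deg = adjacency.sum(axis=1, keepdims=True)
    P = adjacency / np.maximum(deg, 1e-8)           # row-stochastic mixing
    return (1.0 - alpha) * W + alpha * P @ W        # one graph-filter tap

# Example: 3 clients on a path graph, 2 parameters each.
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
ws = [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]
print(graph_filtered_aggregate(ws, A))
```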
arXiv Detail & Related papers (2022-12-29T17:57:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.