TCFG: Tangential Damping Classifier-free Guidance
- URL: http://arxiv.org/abs/2503.18137v1
- Date: Sun, 23 Mar 2025 16:49:49 GMT
- Title: TCFG: Tangential Damping Classifier-free Guidance
- Authors: Mingi Kwon, Shin seong Kim, Jaeseok Jeong. Yi Ting Hsiao, Youngjung Uh,
- Abstract summary: Diffusion models have achieved remarkable success in text-to-image synthesis.<n>We introduce a novel approach that leverages a geometric perspective on the unconditional score to enhance CFG performance.
- Score: 11.151372758700619
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Diffusion models have achieved remarkable success in text-to-image synthesis, largely attributed to the use of classifier-free guidance (CFG), which enables high-quality, condition-aligned image generation. CFG combines the conditional score (e.g., text-conditioned) with the unconditional score to control the output. However, the unconditional score is in charge of estimating the transition between manifolds of adjacent timesteps from $x_t$ to $x_{t-1}$, which may inadvertently interfere with the trajectory toward the specific condition. In this work, we introduce a novel approach that leverages a geometric perspective on the unconditional score to enhance CFG performance when conditional scores are available. Specifically, we propose a method that filters the singular vectors of both conditional and unconditional scores using singular value decomposition. This filtering process aligns the unconditional score with the conditional score, thereby refining the sampling trajectory to stay closer to the manifold. Our approach improves image quality with negligible additional computation. We provide deeper insights into the score function behavior in diffusion models and present a practical technique for achieving more accurate and contextually coherent image synthesis.
Related papers
- Gradient-Free Classifier Guidance for Diffusion Model Sampling [4.450496470631169]
Gradient-free Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$times$512, we achieve a record $FD_textDINOv2$ 23.09, while simultaneously attaining a higher classification Precision (94.3%) compared to ATG (90.2%)
arXiv Detail & Related papers (2024-11-23T00:22:21Z) - Text-Anchored Score Composition: Tackling Condition Misalignment in Text-to-Image Diffusion Models [35.02969643344228]
We present a training free approach called Text-Anchored Score Composition (TASC) to improve controllability of existing models.
We propose an attention operation to realign these independently calculated results via a cross-attention mechanism to avoid new conflicts when combining them back.
arXiv Detail & Related papers (2023-06-26T03:48:15Z) - EGC: Image Generation and Classification via a Diffusion Energy-Based
Model [59.591755258395594]
This work introduces an energy-based classifier and generator, namely EGC, which can achieve superior performance in both tasks using a single neural network.
EGC achieves competitive generation results compared with state-of-the-art approaches on ImageNet-1k, CelebA-HQ and LSUN Church.
This work represents the first successful attempt to simultaneously excel in both tasks using a single set of network parameters.
arXiv Detail & Related papers (2023-04-04T17:59:14Z) - Contrastive Model Adaptation for Cross-Condition Robustness in Semantic
Segmentation [58.17907376475596]
We investigate normal-to-adverse condition model adaptation for semantic segmentation.
Our method -- CMA -- leverages such image pairs to learn condition-invariant features via contrastive learning.
We achieve state-of-the-art semantic segmentation performance for model adaptation on several normal-to-adverse adaptation benchmarks.
arXiv Detail & Related papers (2023-03-09T11:48:29Z) - Score-based Continuous-time Discrete Diffusion Models [102.65769839899315]
We extend diffusion models to discrete variables by introducing a Markov jump process where the reverse process denoises via a continuous-time Markov chain.
We show that an unbiased estimator can be obtained via simple matching the conditional marginal distributions.
We demonstrate the effectiveness of the proposed method on a set of synthetic and real-world music and image benchmarks.
arXiv Detail & Related papers (2022-11-30T05:33:29Z) - Optimized latent-code selection for explainable conditional
text-to-image GANs [8.26410341981427]
We present a variety of techniques to take a deep look into the latent space and semantic space of the conditional text-to-image GANs model.
We propose a framework for finding good latent codes by utilizing a linear SVM.
arXiv Detail & Related papers (2022-04-27T03:12:55Z) - Are conditional GANs explicitly conditional? [0.0]
This paper proposes two contributions for conditional Generative Adversarial Networks (cGANs)
The first main contribution is an analysis of cGANs to show that they are not explicitly conditional.
The second contribution is a new method, called acontrario, that explicitly models conditionality for both parts of the adversarial architecture.
arXiv Detail & Related papers (2021-06-28T22:49:27Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores)
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Gaussian MRF Covariance Modeling for Efficient Black-Box Adversarial
Attacks [86.88061841975482]
We study the problem of generating adversarial examples in a black-box setting, where we only have access to a zeroth order oracle.
We use this setting to find fast one-step adversarial attacks, akin to a black-box version of the Fast Gradient Sign Method(FGSM)
We show that the method uses fewer queries and achieves higher attack success rates than the current state of the art.
arXiv Detail & Related papers (2020-10-08T18:36:51Z) - Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence
Lip-Reading [96.48553941812366]
Lip-reading aims to infer the speech content from the lip movement sequence.
Traditional learning process of seq2seq models suffers from two problems.
We propose a novel pseudo-convolutional policy gradient (PCPG) based method to address these two problems.
arXiv Detail & Related papers (2020-03-09T09:12:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.