Beyond and Free from Diffusion: Invertible Guided Consistency Training
- URL: http://arxiv.org/abs/2502.05391v1
- Date: Sat, 08 Feb 2025 00:23:40 GMT
- Title: Beyond and Free from Diffusion: Invertible Guided Consistency Training
- Authors: Chia-Hong Hsu, Shiu-hong Kao, Randall Balestriero
- Abstract summary: iGCT contributes to fast and guided image generation and editing without requiring the training and distillation of DMs. Our experiments on CIFAR-10 and ImageNet64 show that iGCT significantly improves FID and precision compared to CFG.
- Score: 12.277762115388187
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Guidance in image generation steers models towards higher-quality or more targeted outputs, typically achieved in Diffusion Models (DMs) via Classifier-free Guidance (CFG). However, recent Consistency Models (CMs), which offer fewer function evaluations, rely on distilling CFG knowledge from pretrained DMs to achieve guidance, making them costly and inflexible. In this work, we propose invertible Guided Consistency Training (iGCT), a novel training framework for guided CMs that is entirely data-driven. iGCT, as a pioneering work, contributes to fast and guided image generation and editing without requiring the training and distillation of DMs, greatly reducing the overall compute requirements. iGCT addresses the saturation artifacts seen in CFG under high guidance scales. Our extensive experiments on CIFAR-10 and ImageNet64 show that iGCT significantly improves FID and precision compared to CFG. At a guidance of 13, iGCT improves precision to 0.8, while the DM's precision drops to 0.47. Our work takes the first step toward enabling guidance and inversion for CMs without relying on DMs.
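The CFG rule referenced throughout this abstract and the related papers below can be sketched as follows. This is the standard classifier-free guidance combination from the CFG literature, not code from the iGCT paper; the toy arrays are hypothetical stand-ins for model predictions. It illustrates why large guidance scales (such as the w = 13 compared in the abstract) extrapolate far beyond both predictions, which is the source of the saturation artifacts iGCT aims to avoid.

```python
import numpy as np

def cfg_combine(eps_uncond: np.ndarray, eps_cond: np.ndarray, w: float) -> np.ndarray:
    """Standard CFG update: extrapolate from the unconditional prediction
    toward the conditional one by a guidance scale w.
    eps = eps_u + w * (eps_c - eps_u); w = 1 recovers the conditional prediction."""
    return eps_uncond + w * (eps_cond - eps_uncond)

# Hypothetical toy predictions (not actual model outputs).
eps_u = np.array([0.1, 0.2])   # unconditional prediction
eps_c = np.array([0.3, 0.1])   # conditional prediction

mild = cfg_combine(eps_u, eps_c, w=1.5)    # -> [0.4, 0.05], modest extrapolation
strong = cfg_combine(eps_u, eps_c, w=13.0)  # -> [2.7, -1.1], far outside both inputs
```

At w = 13 the combined prediction is an order of magnitude outside the range spanned by either input, which in pixel space manifests as oversaturated outputs; iGCT trains guidance into the consistency model directly rather than extrapolating at sampling time.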
Related papers
- Token Perturbation Guidance for Diffusion Models [1.511194037740325]
Token Perturbation Guidance (TPG) is a novel method that applies perturbation matrices directly to intermediate token representations within the diffusion network. TPG is training-free and agnostic to input conditions, making it readily applicable to both conditional and unconditional generation.
arXiv Detail & Related papers (2025-06-10T21:25:46Z) - Normalized Attention Guidance: Universal Negative Guidance for Diffusion Models [57.20761595019967]
We present Normalized Attention Guidance (NAG), an efficient, training-free mechanism that applies extrapolation in attention space with L1-based normalization and refinement. NAG restores effective negative guidance where CFG collapses while maintaining fidelity. NAG generalizes across architectures (UNet, DiT), sampling regimes (few-step, multi-step), and modalities (image, video).
arXiv Detail & Related papers (2025-05-27T13:30:46Z) - Gradient-Free Classifier Guidance for Diffusion Model Sampling [4.450496470631169]
The Gradient-Free Classifier Guidance (GFCG) method consistently improves class prediction accuracy.
For ImageNet 512$\times$512, we achieve a record $FD_{\text{DINOv2}}$ of 23.09, while simultaneously attaining a higher classification precision (94.3%) compared to ATG (90.2%).
arXiv Detail & Related papers (2024-11-23T00:22:21Z) - Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address this issue.
We propose down-weighting the parallel component to achieve high-quality generations without oversaturation.
We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z) - Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning.
Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training.
To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z) - Model Inversion Attacks Through Target-Specific Conditional Diffusion Models [54.69008212790426]
Model inversion attacks (MIAs) aim to reconstruct private images from a target classifier's training set, thereby raising privacy concerns in AI applications.
Previous GAN-based MIAs tend to suffer from inferior generative fidelity due to GAN's inherent flaws and biased optimization within latent space.
We propose Diffusion-based Model Inversion (Diff-MI) attacks to alleviate these issues.
arXiv Detail & Related papers (2024-07-16T06:38:49Z) - TLCM: Training-efficient Latent Consistency Model for Image Generation with 2-8 Steps [12.395969703425648]
Distilling latent diffusion models (LDMs) into ones that are fast to sample from is attracting growing research interest.
This paper proposes a novel training-efficient Latent Consistency Model (TLCM) to overcome these challenges.
With just 70 training hours on an A100 GPU, a 3-step TLCM distilled from SDXL achieves an impressive CLIP Score of 33.68 and an Aesthetic Score of 5.97 on the MSCOCO-2017 5K benchmark.
arXiv Detail & Related papers (2024-06-09T12:55:50Z) - Guided Score identity Distillation for Data-Free One-Step Text-to-Image Generation [62.30570286073223]
Diffusion-based text-to-image generation models have demonstrated the ability to produce images aligned with textual descriptions. We introduce a data-free guided distillation method that enables the efficient distillation of pretrained Diffusion models without access to the real training data. By exclusively training with synthetic images generated by its one-step generator, our data-free distillation method rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score.
arXiv Detail & Related papers (2024-06-03T17:44:11Z) - Slight Corruption in Pre-training Data Makes Better Diffusion Models [71.90034201302397]
Diffusion models (DMs) have shown remarkable capabilities in generating high-quality images, audios, and videos.
DMs benefit significantly from extensive pre-training on large-scale datasets.
However, pre-training datasets often contain corrupted pairs where conditions do not accurately describe the data.
This paper presents the first comprehensive study on the impact of such corruption in pre-training data of DMs.
arXiv Detail & Related papers (2024-05-30T21:35:48Z) - Linear Combination of Saved Checkpoints Makes Consistency and Diffusion Models Better [31.67038902035949]
Diffusion Models (DM) and Consistency Models (CM) are two popular families of generative models with good generation quality on various tasks.
In this work, we find that high-quality model weights often lie in a basin which cannot be reached by SGD but can be obtained by proper checkpoint averaging.
We propose LCSC, a simple but effective and efficient method to enhance the performance of DM and CM, by combining checkpoints along the training trajectory with coefficients deduced from evolutionary search.
arXiv Detail & Related papers (2024-04-02T18:59:39Z) - Pre-trained Model Guided Fine-Tuning for Zero-Shot Adversarial Robustness [52.9493817508055]
We propose Pre-trained Model Guided Adversarial Fine-Tuning (PMG-AFT) to enhance the model's zero-shot adversarial robustness.
Our approach consistently improves clean accuracy by an average of 8.72%.
arXiv Detail & Related papers (2024-01-09T04:33:03Z) - Consistency Regularization for Generalizable Source-free Domain Adaptation [62.654883736925456]
Source-free domain adaptation (SFDA) aims to adapt a well-trained source model to an unlabelled target domain without accessing the source dataset.
Existing SFDA methods only assess their adapted models on the target training set, neglecting the data from unseen but identically distributed testing sets.
We propose a consistency regularization framework to develop a more generalizable SFDA method.
arXiv Detail & Related papers (2023-08-03T07:45:53Z) - LGD: Label-guided Self-distillation for Object Detection [59.9972914042281]
We propose the first self-distillation framework for general object detection, termed LGD (Label-Guided self-Distillation).
Our framework involves sparse label-appearance encoding, inter-object relation adaptation and intra-object knowledge mapping to obtain the instructive knowledge.
Compared with the classical teacher-based method FGFI, LGD not only performs better without requiring a pretrained teacher, but also incurs a 51% lower training cost beyond inherent student learning.
arXiv Detail & Related papers (2021-09-23T16:55:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.