Visual Generation Without Guidance
- URL: http://arxiv.org/abs/2501.15420v1
- Date: Sun, 26 Jan 2025 06:48:05 GMT
- Title: Visual Generation Without Guidance
- Authors: Huayu Chen, Kai Jiang, Kaiwen Zheng, Jianfei Chen, Hang Su, Jun Zhu
- Abstract summary: We propose to build visual models that are free from guided sampling.
The resulting algorithm, Guidance-Free Training (GFT), matches the performance of CFG while reducing sampling to a single model, halving the cost.
- Score: 28.029707495420475
- License:
- Abstract: Classifier-Free Guidance (CFG) has been a default technique in various visual generative models, yet it requires inference from both conditional and unconditional models during sampling. We propose to build visual models that are free from guided sampling. The resulting algorithm, Guidance-Free Training (GFT), matches the performance of CFG while reducing sampling to a single model, halving the computational cost. Unlike previous distillation-based approaches that rely on pretrained CFG networks, GFT enables training directly from scratch. GFT is simple to implement. It retains the same maximum likelihood objective as CFG and differs mainly in the parameterization of conditional models. Implementing GFT requires only minimal modifications to existing codebases, as most design choices and hyperparameters are directly inherited from CFG. Our extensive experiments across five distinct visual models demonstrate the effectiveness and versatility of GFT. Across domains of diffusion, autoregressive, and masked-prediction modeling, GFT consistently achieves comparable or even lower FID scores, with similar diversity-fidelity trade-offs compared with CFG baselines, all while being guidance-free. Code will be available at https://github.com/thu-ml/GFT.
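To make the cost argument in the abstract concrete, here is a minimal sketch of a standard CFG combination step under one common convention (a scale of 1 recovers the plain conditional prediction). It is not the GFT implementation and is not taken from the paper's code; `CountingToyModel`, `cfg_prediction`, and the numbers used are hypothetical stand-ins chosen only to show that each guided step needs two model evaluations, which is the overhead a guidance-free model would avoid.

```python
# Illustrative only: a toy stand-in for a denoising network, not a real model.
class CountingToyModel:
    """Hypothetical model that counts how many times it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, x, cond):
        self.calls += 1
        # cond=None plays the role of the unconditional branch.
        shift = 0.0 if cond is None else float(cond)
        return 0.5 * x + shift


def cfg_prediction(model, x, cond, scale):
    """Combine conditional and unconditional predictions (assumed convention)."""
    pred_cond = model(x, cond)    # first forward pass
    pred_uncond = model(x, None)  # second forward pass
    return pred_uncond + scale * (pred_cond - pred_uncond)


model = CountingToyModel()
guided = cfg_prediction(model, x=1.0, cond=3, scale=2.0)
print(guided, model.calls)
```

A guidance-free model in the sense of the abstract would produce its prediction with a single call per step, so the call counter above would read 1 instead of 2.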
Related papers
- Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that addresses and removes the commonly used classifier-free guidance (CFG).
Our innovative approach transcends the standard modeling and incorporates the posterior probability of conditions.
Our method significantly accelerates the training process, doubles inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z)
- Nested Annealed Training Scheme for Generative Adversarial Networks [54.70743279423088]
This paper focuses on a rigorous mathematical theoretical framework: the composite-functional-gradient GAN (CFG)
We reveal the theoretical connection between the CFG model and score-based models.
We find that the training objective of the CFG discriminator is equivalent to finding an optimal D(x)
arXiv Detail & Related papers (2025-01-20T07:44:09Z)
- Toward Guidance-Free AR Visual Generation via Condition Contrastive Alignment [31.402736873762418]
Motivated by language model alignment methods, we propose Condition Contrastive Alignment (CCA) to facilitate guidance-free AR visual generation with high performance.
Experimental results show that CCA can significantly enhance the guidance-free performance of all tested models with just one epoch fine-tuning.
This experimentally confirms the strong theoretical connection between language-targeted alignment and visual-targeted guidance methods.
arXiv Detail & Related papers (2024-10-12T03:31:25Z)
- Diversity-Rewarded CFG Distillation [62.08448835625036]
We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt.
arXiv Detail & Related papers (2024-10-08T14:40:51Z)
- Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models [27.640009920058187]
We revisit the CFG update rule and introduce modifications to address this issue.
We propose down-weighting the parallel component to achieve high-quality generations without oversaturation.
We also introduce a new rescaling momentum method for the CFG update rule based on this insight.
arXiv Detail & Related papers (2024-10-03T12:06:29Z)
- CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models [52.29804282879437]
CFG++ is a novel approach that tackles the off-manifold challenges inherent to traditional CFG.
It offers better inversion-to-image generation, invertibility, smaller guidance scales, reduced mode collapse, etc.
It can be easily integrated into high-order diffusion solvers and naturally extends to distilled diffusion models.
arXiv Detail & Related papers (2024-06-12T10:40:10Z)
- Test-Time Model Adaptation with Only Forward Passes [68.11784295706995]
Test-time adaptation has proven effective in adapting a given trained model to unseen test samples with potential distribution shifts.
We propose a test-time Forward-Optimization Adaptation (FOA) method.
FOA runs on quantized 8-bit ViT, outperforms gradient-based TENT on full-precision 32-bit ViT, and achieves an up to 24-fold memory reduction on ImageNet-C.
arXiv Detail & Related papers (2024-04-02T05:34:33Z)
- Single-Stage Visual Relationship Learning using Conditional Queries [60.90880759475021]
TraCQ is a new formulation for scene graph generation that avoids the multi-task learning problem and the entity pair distribution.
We employ a DETR-based encoder-decoder with conditional queries to significantly reduce the entity label space as well.
Experimental results show that TraCQ not only outperforms existing single-stage scene graph generation methods, it also beats many state-of-the-art two-stage methods on the Visual Genome dataset.
arXiv Detail & Related papers (2023-06-09T06:02:01Z)
- Stochastic Generative Flow Networks [89.34644133901647]
Generative Flow Networks (or GFlowNets) learn to sample complex structures through the lens of "inference as control"
Existing GFlowNets can be applied only to deterministic environments, and fail in more general tasks with stochastic dynamics.
This paper introduces Stochastic GFlowNets, a new algorithm that extends GFlowNets to stochastic environments.
arXiv Detail & Related papers (2023-02-19T03:19:40Z)
- Graph Federated Learning for CIoT Devices in Smart Home Applications [23.216140264163535]
We propose a novel Graph Signal Processing (GSP)-inspired aggregation rule based on graph filtering, dubbed "G-Fedfilt".
The proposed aggregator enables a structured flow of information based on the graph's topology.
It is capable of yielding up to 2.41% higher accuracy than FedAvg when testing the generalization of the models.
arXiv Detail & Related papers (2022-12-29T17:57:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.