Nested Annealed Training Scheme for Generative Adversarial Networks
- URL: http://arxiv.org/abs/2501.11318v1
- Date: Mon, 20 Jan 2025 07:44:09 GMT
- Title: Nested Annealed Training Scheme for Generative Adversarial Networks
- Authors: Chang Wan, Ming-Hsuan Yang, Minglu Li, Yunliang Jiang, Zhonglong Zheng
- Abstract summary: This paper focuses on a rigorous mathematical theoretical framework: the composite-functional-gradient GAN (CFG).
We reveal the theoretical connection between the CFG model and score-based models.
We find that the training objective of the CFG discriminator is equivalent to finding an optimal D(x).
- Score: 54.70743279423088
- License:
- Abstract: Recently, researchers have proposed many deep generative models, including generative adversarial networks (GANs) and denoising diffusion models. Although GANs have achieved significant breakthroughs and empirical success, their mathematical underpinnings remain relatively unexplored. This paper focuses on a rigorous mathematical framework: the composite-functional-gradient GAN (CFG) [1]. Specifically, we reveal the theoretical connection between the CFG model and score-based models. We find that training the CFG discriminator is equivalent to finding an optimal D(x) whose gradient equals the difference between the score functions of real and synthesized samples; in other words, the optimal D(x) integrates this score difference. Conversely, training the CFG generator involves finding an optimal G(x) that minimizes this difference. In this paper, we derive an annealed weight that precedes the weight of the CFG discriminator; we call the resulting explicit, theoretically grounded model the annealed CFG method. Because the annealed CFG method is not readily applicable to state-of-the-art (SOTA) GAN models, we further propose a nested annealed training scheme (NATS) to overcome this limitation. This scheme retains the annealed weight from the CFG method and can be seamlessly adapted to various GAN models, regardless of their differences in architecture, loss, or regularization. We conduct thorough experimental evaluations on various benchmark datasets for image generation. The results show that our annealed CFG and NATS methods significantly improve the quality and diversity of the synthesized samples, both over the original CFG method and over SOTA GAN models.
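The score-function connection stated in the abstract can be written compactly. Below is a minimal LaTeX sketch of one reading of that claim, assuming p_data and p_G denote the densities of real and synthesized samples; the notation is an assumption for illustration, not the paper's exact formulation.

```latex
% One reading of the abstract's claim: the gradient of the optimal
% discriminator D*(x) equals the score difference between real and
% synthesized samples (assumed notation: p_data and p_G for the two densities).
\nabla_x D^{*}(x) \;=\; \nabla_x \log p_{\mathrm{data}}(x) \;-\; \nabla_x \log p_{G}(x)
```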
Related papers
- Diffusion Models without Classifier-free Guidance [41.59396565229466]
Model-guidance (MG) is a novel objective for training diffusion models that addresses and removes the commonly used Classifier-free Guidance (CFG).
Our innovative approach transcends the standard modeling and incorporates the posterior probability of conditions.
Our method significantly accelerates the training process, doubles inference speed, and achieves exceptional quality that parallels and even surpasses concurrent diffusion models with CFG.
arXiv Detail & Related papers (2025-02-17T18:59:50Z) - Understanding Classifier-Free Guidance: High-Dimensional Theory and Non-Linear Generalizations [22.44946627454133]
We show that Classifier-Free Guidance effectively reproduces the target distribution in infinite and sufficiently high-dimensional contexts.
We also explore finite-dimensional effects, precisely characterizing overshoot and variance reduction.
arXiv Detail & Related papers (2025-02-11T10:29:29Z) - On the Convergence of (Stochastic) Gradient Descent for Kolmogorov--Arnold Networks [56.78271181959529]
Kolmogorov--Arnold Networks (KANs) have gained significant attention in the deep learning community.
Empirical investigations demonstrate that KANs optimized via stochastic gradient descent (SGD) are capable of achieving near-zero training loss.
arXiv Detail & Related papers (2024-10-10T15:34:10Z) - Diversity-Rewarded CFG Distillation [62.08448835625036]
We introduce diversity-rewarded CFG distillation, a novel finetuning procedure that distills the strengths of CFG while addressing its limitations.
Our approach optimises two training objectives: (1) a distillation objective, encouraging the model alone (without CFG) to imitate the CFG-augmented predictions, and (2) an RL objective with a diversity reward, promoting the generation of diverse outputs for a given prompt (a minimal sketch of the CFG-augmented distillation target appears after this list).
arXiv Detail & Related papers (2024-10-08T14:40:51Z) - Adaptive Guidance: Training-free Acceleration of Conditional Diffusion Models [44.58960475893552]
"Adaptive Guidance" (AG) is an efficient variant of computation-Free Guidance (CFG)
AG preserves CFG's image quality while reducing by 25%.
" LinearAG" offers even cheaper inference at the cost of deviating from the baseline model.
arXiv Detail & Related papers (2023-12-19T17:08:48Z) - Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution [67.9215891673174]
We propose score entropy as a novel loss that naturally extends score matching to discrete spaces.
We test our Score Entropy Discrete Diffusion models on standard language modeling tasks.
arXiv Detail & Related papers (2023-10-25T17:59:12Z) - Toward Discretization-Consistent Closure Schemes for Large Eddy Simulation Using Reinforcement Learning [0.0]
This study proposes a novel method for developing discretization-consistent closure schemes for Large Eddy Simulation (LES).
The task of adapting the coefficients of LES closure models is framed as a Markov decision process and solved in an a posteriori manner with Reinforcement Learning (RL).
All newly derived models achieve accurate results that either match or outperform traditional models for different discretizations and resolutions.
arXiv Detail & Related papers (2023-09-12T14:20:12Z) - An Evaluation Study of Generative Adversarial Networks for Collaborative Filtering [75.83628561622287]
This work successfully replicates the results published in the original paper and discusses the impact of certain differences between the CFGAN framework and the model used in the original evaluation.
The work further expands the experimental analysis by comparing CFGAN against a selection of simple, well-known, and properly optimized baselines, observing that CFGAN is not consistently competitive against them despite its high computational cost.
arXiv Detail & Related papers (2022-01-05T20:53:27Z) - Understanding Overparameterization in Generative Adversarial Networks [56.57403335510056]
Training Generative Adversarial Networks (GANs) requires solving non-concave min-max optimization problems.
Recent theory has highlighted the importance of gradient descent (GD) converging to globally optimal solutions.
We show that in an overparameterized GAN with a one-layer neural network generator and a linear discriminator, gradient descent-ascent (GDA) converges to a global saddle point of the underlying non-concave min-max problem.
arXiv Detail & Related papers (2021-04-12T16:23:37Z) - A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training.
It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods (a generic CG sketch appears after this list).
Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z)
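For the "Diversity-Rewarded CFG Distillation" entry above, the following is a minimal Python sketch of the distillation objective it describes: a student network is trained to match the classifier-free-guidance (CFG)-augmented prediction of a frozen teacher. Note that CFG here means classifier-free guidance, not the composite-functional-gradient GAN of the main paper. The toy Denoiser module, the guidance weight w, and the tensor shapes are illustrative assumptions, and the paper's second (diversity-reward RL) objective is not sketched.

```python
# Minimal sketch of a CFG-distillation objective, assuming the standard
# classifier-free guidance rule: eps_cfg = eps_uncond + w * (eps_cond - eps_uncond).
# The network, guidance weight, and shapes below are illustrative only.
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Toy stand-in for a conditional diffusion denoiser."""
    def __init__(self, dim=16, cond_dim=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + cond_dim, 64), nn.ReLU(),
                                 nn.Linear(64, dim))

    def forward(self, x, cond):
        return self.net(torch.cat([x, cond], dim=-1))

def cfg_prediction(model, x, cond, w=3.0):
    """Classifier-free-guided prediction from conditional and unconditional passes."""
    eps_cond = model(x, cond)
    eps_uncond = model(x, torch.zeros_like(cond))  # "null" condition
    return eps_uncond + w * (eps_cond - eps_uncond)

teacher, student = Denoiser(), Denoiser()
x = torch.randn(4, 16)      # noisy samples
cond = torch.randn(4, 8)    # conditioning vectors (e.g. prompt embeddings)

with torch.no_grad():       # teacher's CFG-augmented target
    target = cfg_prediction(teacher, x, cond, w=3.0)

# Distillation loss: the student alone (single forward pass, no CFG)
# is pushed to imitate the CFG-augmented teacher prediction.
distill_loss = nn.functional.mse_loss(student(x, cond), target)
distill_loss.backward()
print(float(distill_loss))
```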
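For the NGHF entry above, the sketch below shows only the generic linear conjugate-gradient building block that Hessian-free-style optimisers rely on: solving A d = g for an update direction d using nothing but matrix-vector products with a curvature matrix A (e.g. a damped Gauss-Newton or Fisher matrix). This is a standard CG routine under those assumptions, not the paper's exact NGHF algorithm.

```python
# Generic linear conjugate-gradient solve A d = g, where A is accessed only
# through a matrix-vector product callback (as in Hessian-free optimisation).
import numpy as np

def conjugate_gradient(matvec, g, max_iters=50, tol=1e-10):
    """Solve A d = g for symmetric positive-definite A given matvec(v) = A @ v."""
    d = np.zeros_like(g)   # current solution estimate
    r = g.copy()           # residual g - A d (d starts at zero)
    p = r.copy()           # search direction
    rs_old = r @ r
    for _ in range(max_iters):
        Ap = matvec(p)
        alpha = rs_old / (p @ Ap)
        d += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return d

# Toy usage: an explicit SPD matrix stands in for damped curvature
# matrix-vector products; the returned direction solves A d = grad.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M @ M.T + 0.1 * np.eye(5)
grad = rng.standard_normal(5)
direction = conjugate_gradient(lambda v: A @ v, grad)
print(np.allclose(A @ direction, grad, atol=1e-6))
```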