Stable Consistency Tuning: Understanding and Improving Consistency Models
- URL: http://arxiv.org/abs/2410.18958v3
- Date: Wed, 04 Dec 2024 05:04:42 GMT
- Title: Stable Consistency Tuning: Understanding and Improving Consistency Models
- Authors: Fu-Yun Wang, Zhengyang Geng, Hongsheng Li
- Abstract summary: Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) Learning.
- Score: 40.2712218203989
- License:
- Abstract: Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising. In contrast, consistency models, a new generative family, achieve competitive performance with significantly faster sampling. These models are trained either through consistency distillation, which leverages pretrained diffusion models, or consistency training/tuning directly from raw data. In this work, we propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as the value estimation through Temporal Difference (TD) Learning. More importantly, this framework allows us to analyze the limitations of current consistency training/tuning strategies. Built upon Easy Consistency Tuning (ECT), we propose Stable Consistency Tuning (SCT), which incorporates variance-reduced learning using the score identity. SCT leads to significant performance improvements on benchmarks such as CIFAR-10 and ImageNet-64. On ImageNet-64, SCT achieves 1-step FID 2.42 and 2-step FID 1.55, a new SoTA for consistency models.
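The TD-learning view in the abstract can be illustrated with a toy analogy. The sketch below is not the paper's method or code: it runs tabular TD(0) on a deterministic chain with no rewards and no discounting, where state 0 stands in for clean data (the boundary condition) and each noisier state bootstraps from its neighbor one step closer to the data. The function name `td0_chain` and all of its parameters are hypothetical, chosen only for this illustration.

```python
def td0_chain(num_states=5, terminal_value=1.0, lr=0.5, sweeps=200):
    """Toy TD(0) on a deterministic chain (an analogy, not the SCT algorithm).

    State 0 plays the role of clean data; its value is clamped, like a
    boundary condition. Every other state t bootstraps from state t - 1.
    """
    values = [0.0] * num_states
    values[0] = terminal_value  # clamped boundary value, never updated

    for _ in range(sweeps):
        for t in range(1, num_states):
            # Bootstrapped target from the neighbor closer to the data.
            # It is treated as a constant, like a stop-gradient target.
            target = values[t - 1]
            values[t] += lr * (target - values[t])
    return values


if __name__ == "__main__":
    # Every state's estimate should approach the clamped boundary value.
    print(td0_chain())
```

In this toy setting every state along the chain ends up with the same estimate as the boundary state, which mirrors the idea that all points on one trajectory should map to the same endpoint. Real consistency training replaces the table with a neural network and the chain with noisy samples, so the targets there are stochastic rather than exact.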
Related papers
- Efficiency Meets Fidelity: A Novel Quantization Framework for Stable Diffusion [9.8078769718432]
We propose an efficient quantization framework for Stable Diffusion models.
Our approach features a Serial-to-Parallel calibration pipeline that addresses the consistency of both the calibration and inference processes.
Under W4A8 quantization settings, our approach enhances both distribution similarity and visual similarity by 45%-60%.
arXiv Detail & Related papers (2024-12-09T17:00:20Z) - See Further When Clear: Curriculum Consistency Model [20.604239652914355]
We propose the Curriculum Consistency Model (CCM), which stabilizes and balances the learning complexity across timesteps.
Specifically, we regard the distillation process at each timestep as a curriculum and introduce a metric based on Peak Signal-to-Noise Ratio (PSNR) to quantify the learning complexity.
Our method achieves competitive single-step sampling Fréchet Inception Distance (FID) scores of 1.64 on CIFAR-10 and 2.18 on ImageNet 64x64.
arXiv Detail & Related papers (2024-12-09T08:39:01Z) - Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z) - Decouple-Then-Merge: Towards Better Training for Diffusion Models [45.89372687373466]
Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption.
This work proposes a Decouple-then-Merge (DeMe) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps.
arXiv Detail & Related papers (2024-10-09T08:19:25Z) - Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
arXiv Detail & Related papers (2024-06-23T20:34:18Z) - Fixed Point Diffusion Models [13.035518953879539]
Fixed Point Diffusion Model (FPDM) is a novel approach to image generation that integrates the concept of fixed point solving into the framework of diffusion-based generative modeling.
Our approach embeds an implicit fixed point solving layer into the denoising network of a diffusion model, transforming the diffusion process into a sequence of closely-related fixed point problems.
We conduct experiments with state-of-the-art models on ImageNet, FFHQ, CelebA-HQ, and LSUN-Church, demonstrating substantial improvements in performance and efficiency.
arXiv Detail & Related papers (2024-01-16T18:55:54Z) - Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models [52.1809084559048]
We propose a novel two-stage divide-and-conquer training strategy termed TDC Training.
It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models.
While two-stage training avoids the need to train each model separately, the total training cost is even lower than training a single unified denoising model.
arXiv Detail & Related papers (2023-12-20T03:32:58Z) - Improved Techniques for Training Consistency Models [13.475711217989975]
We present improved techniques for consistency training, where consistency models learn directly from data without distillation.
We propose a lognormal noise schedule for the consistency training objective, and propose to double total discretization steps every set number of training iterations.
These modifications enable consistency models to achieve FID scores of 2.51 and 3.25 on CIFAR-10 and ImageNet $64\times 64$ respectively in a single sampling step.
arXiv Detail & Related papers (2023-10-22T05:33:38Z) - Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion [56.38386580040991]
Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
arXiv Detail & Related papers (2023-10-01T05:07:17Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.