Improved Training Technique for Latent Consistency Models
- URL: http://arxiv.org/abs/2502.01441v1
- Date: Mon, 03 Feb 2025 15:25:58 GMT
- Title: Improved Training Technique for Latent Consistency Models
- Authors: Quan Dao, Khanh Doan, Di Liu, Trung Le, Dimitris Metaxas
- Abstract summary: Consistency models are capable of producing high-quality samples in either a single step or multiple steps.
We analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers.
We introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance.
- Score: 18.617862678160243
- Abstract: Consistency models are a new family of generative models capable of producing high-quality samples in either a single step or multiple steps. Recently, consistency models have demonstrated impressive performance, achieving results on par with diffusion models in the pixel space. However, the success of scaling consistency training to large-scale datasets, particularly for text-to-image and video generation tasks, is determined by performance in the latent space. In this work, we analyze the statistical differences between pixel and latent spaces, discovering that latent data often contains highly impulsive outliers, which significantly degrade the performance of improved consistency training (iCT) in the latent space. To address this, we replace the Pseudo-Huber loss with the Cauchy loss, effectively mitigating the impact of outliers. Additionally, we introduce a diffusion loss at early timesteps and employ optimal transport (OT) coupling to further enhance performance. Lastly, we introduce the adaptive scaling-$c$ scheduler to manage the robust training process and adopt Non-scaling LayerNorm in the architecture to better capture the statistics of the features and reduce outlier impact. With these strategies, we successfully train latent consistency models capable of high-quality sampling with one or two steps, significantly narrowing the performance gap between latent consistency and diffusion models. The implementation is released here: https://github.com/quandao10/sLCT/
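The central change in the abstract is swapping iCT's Pseudo-Huber loss for a Cauchy loss so that impulsive latent outliers stop dominating training. The snippet below is a minimal sketch, not the sLCT code. It assumes the Pseudo-Huber form sqrt(r^2 + c^2) - c (as in iCT) and one common Cauchy form, log(1 + r^2 / (2c^2)), where r is the residual norm ||x - y|| and c is a scale. The paper's exact constants and its adaptive scaling-c schedule are not reproduced, and all function names are hypothetical. It compares how each loss's gradient responds as the residual grows.

```python
import math


# Hypothetical helper names: an illustrative sketch, not the sLCT repository's code.
def pseudo_huber(r: float, c: float) -> float:
    """Pseudo-Huber loss on a residual norm r (form used by iCT): sqrt(r^2 + c^2) - c."""
    return math.sqrt(r * r + c * c) - c


def pseudo_huber_grad(r: float, c: float) -> float:
    """d/dr of Pseudo-Huber: r / sqrt(r^2 + c^2); approaches 1 for large r."""
    return r / math.sqrt(r * r + c * c)


def cauchy(r: float, c: float) -> float:
    """Cauchy loss in an ASSUMED form log(1 + r^2 / (2 c^2)); the paper's constants may differ."""
    return math.log1p(r * r / (2.0 * c * c))


def cauchy_grad(r: float, c: float) -> float:
    """d/dr of the assumed Cauchy loss: 2r / (2c^2 + r^2); peaks at r = sqrt(2)*c, then decays to 0."""
    return 2.0 * r / (2.0 * c * c + r * r)


def residual_norm(x: list[float], y: list[float]) -> float:
    """Euclidean norm ||x - y|| between a prediction and its target."""
    return math.dist(x, y)


if __name__ == "__main__":
    c = 1.0
    # Small, moderate, and outlier-sized residuals.
    for r in (0.1, 1.0, 10.0, 100.0):
        print(f"r={r:>6}: pseudo-huber grad={pseudo_huber_grad(r, c):.4f}, "
              f"cauchy grad={cauchy_grad(r, c):.4f}")
```

For small residuals both gradients grow roughly linearly in r. As r grows, the Pseudo-Huber gradient approaches 1, so an impulsive outlier keeps pulling on the parameters at full strength. The Cauchy gradient instead decays toward 0, so the outlier's influence fades. The scale c sets where that transition happens, which is what a schedule for c would control during training.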
Related papers
- Scalable Model Merging with Progressive Layer-wise Distillation [17.521794641817642]
We introduce a novel few-shot merging algorithm, ProDistill (Progressive Layer-wise Distillation).
We show that ProDistill achieves state-of-the-art performance, with up to 6.14% and 6.61% improvements in vision and NLU tasks.
Date: 2025-02-18T10:15:18Z
- Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training [4.760537994346813]
As data distributions grow more complex, training diffusion models to convergence becomes increasingly intensive.
We introduce a non-uniform timestep sampling method that prioritizes the more critical timesteps.
Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
Date: 2024-11-15T07:12:18Z
- Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.
We empirically find that this training paradigm limits the one-step generation performance of consistency models.
We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
Date: 2024-10-18T22:38:08Z
- Mitigating Embedding Collapse in Diffusion Models for Categorical Data [52.90687881724333]
We introduce CATDM, a continuous diffusion framework within the embedding space that stabilizes training.
Experiments on benchmarks show that CATDM mitigates embedding collapse, yielding superior results on FFHQ, LSUN Churches, and LSUN Bedrooms.
Date: 2024-10-18T09:12:33Z
- Inverse design with conditional cascaded diffusion models [0.0]
Adjoint-based design optimizations are usually computationally expensive, and their cost scales with resolution.
We extend the use of diffusion models over traditional generative models by proposing the conditional cascaded diffusion model (cCDM).
Our study compares cCDM against a cGAN model with transfer learning.
While both models show decreased performance with reduced high-resolution training data, the cCDM loses its superiority to the cGAN model with transfer learning when training data is limited.
Date: 2024-08-16T04:54:09Z
- Provable Statistical Rates for Consistency Diffusion Models [87.28777947976573]
Despite the state-of-the-art performance, diffusion models are known for their slow sample generation due to the extensive number of steps involved.
This paper contributes towards the first statistical theory for consistency models, formulating their training as a distribution discrepancy minimization problem.
Date: 2024-06-23T20:34:18Z
- On Improving the Algorithm-, Model-, and Data- Efficiency of Self-Supervised Learning [18.318758111829386]
We propose an efficient single-branch SSL method based on non-parametric instance discrimination.
We also propose a novel self-distillation loss that minimizes the KL divergence between the probability distribution and its square root version.
Date: 2024-04-30T06:39:04Z
- Robust Learning with Progressive Data Expansion Against Spurious Correlation [65.83104529677234]
We study the learning process of a two-layer nonlinear convolutional neural network in the presence of spurious features.
Our analysis suggests that imbalanced data groups and easily learnable spurious features can lead to the dominance of spurious features during the learning process.
We propose a new training algorithm called PDE that efficiently enhances the model's robustness for a better worst-group performance.
Date: 2023-06-08T05:44:06Z
- Improving Adversarial Robustness by Contrastive Guided Diffusion Process [19.972628281993487]
We propose Contrastive-Guided Diffusion Process (Contrastive-DP) to guide the diffusion model in data generation.
We show that enhancing the distinguishability among the generated data is critical for improving adversarial robustness.
Date: 2022-10-18T07:20:53Z
- Revisiting Consistency Regularization for Semi-Supervised Learning [80.28461584135967]
We propose an improved consistency regularization framework based on a simple yet effective technique, FeatDistLoss.
Experimental results show that our model defines a new state of the art for various datasets and settings.
Date: 2021-12-10T20:46:13Z
- Extrapolation for Large-batch Training in Deep Learning [72.61259487233214]
We show that a host of variations can be covered in a unified framework that we propose.
We prove the convergence of this novel scheme and rigorously evaluate its empirical performance on ResNet, LSTM, and Transformer.
Date: 2020-06-10T08:22:41Z
This list is automatically generated from the titles and abstracts of the papers on this site.