Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
- URL: http://arxiv.org/abs/2310.02279v3
- Date: Sat, 30 Mar 2024 06:29:48 GMT
- Title: Consistency Trajectory Models: Learning Probability Flow ODE Trajectory of Diffusion
- Authors: Dongjun Kim, Chieh-Hsin Lai, Wei-Hsiang Liao, Naoki Murata, Yuhta Takida, Toshimitsu Uesaka, Yutong He, Yuki Mitsufuji, Stefano Ermon
- Abstract summary: Consistency Trajectory Model (CTM) is a generalization of Consistency Models (CM).
CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance.
Unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods.
- Score: 56.38386580040991
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Consistency Models (CM) (Song et al., 2023) accelerate score-based diffusion model sampling at the cost of sample quality but lack a natural way to trade-off quality for speed. To address this limitation, we propose Consistency Trajectory Model (CTM), a generalization encompassing CM and score-based models as special cases. CTM trains a single neural network that can -- in a single forward pass -- output scores (i.e., gradients of log-density) and enables unrestricted traversal between any initial and final time along the Probability Flow Ordinary Differential Equation (ODE) in a diffusion process. CTM enables the efficient combination of adversarial training and denoising score matching loss to enhance performance and achieves new state-of-the-art FIDs for single-step diffusion model sampling on CIFAR-10 (FID 1.73) and ImageNet at 64x64 resolution (FID 1.92). CTM also enables a new family of sampling schemes, both deterministic and stochastic, involving long jumps along the ODE solution trajectories. It consistently improves sample quality as computational budgets increase, avoiding the degradation seen in CM. Furthermore, unlike CM, CTM's access to the score function can streamline the adoption of established controllable/conditional generation methods from the diffusion community. This access also enables the computation of likelihood. The code is available at https://github.com/sony/ctm.
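The "traversal between any initial and final time" described in the abstract can be illustrated with a toy case where the Probability Flow ODE has a closed-form solution. The sketch below is not the CTM implementation from the paper or the linked repository. It assumes a 1-D Gaussian data distribution with standard deviation `SIGMA_DATA` and a noise schedule in which the marginal at time t is N(0, SIGMA_DATA^2 + t^2), so the ODE is dx/dt = -t * score(x, t). The names `score`, `jump`, `euler_solve`, and `score_from_jump` are hypothetical and chosen for illustration only.

```python
import math

SIGMA_DATA = 0.5  # assumed std of the toy 1-D Gaussian data distribution


def score(x, t):
    """Score (gradient of log-density) of p_t = N(0, SIGMA_DATA^2 + t^2)."""
    return -x / (SIGMA_DATA**2 + t**2)


def jump(x_t, t, s):
    """Exact PF ODE solution carried from time t to time s (toy Gaussian only)."""
    return x_t * math.sqrt((SIGMA_DATA**2 + s**2) / (SIGMA_DATA**2 + t**2))


def euler_solve(x_t, t, s, n_steps=10000):
    """Reference solver: integrate dx/dt = -t * score(x, t) with Euler steps."""
    x = x_t
    dt = (s - t) / n_steps
    for i in range(n_steps):
        cur_t = t + i * dt
        x += dt * (-cur_t * score(x, cur_t))
    return x


def score_from_jump(x_t, t, eps=1e-6):
    """Recover the score from an infinitesimal jump: d(jump)/ds at s=t equals -t * score."""
    return -(jump(x_t, t, t + eps) - x_t) / (eps * t)


if __name__ == "__main__":
    x = 1.5
    # One long jump versus two shorter jumps along the same trajectory.
    print(jump(x, 2.0, 0.1), jump(jump(x, 2.0, 1.0), 1.0, 0.1))
    # Closed-form jump versus a many-step numerical solve.
    print(euler_solve(x, 2.0, 0.1))
    # Score recovered from a tiny jump versus the analytic score.
    print(score_from_jump(x, 2.0), score(x, 2.0))
```

In this toy setting, a single call to `jump` replaces the many small steps of `euler_solve`, and composing two jumps lands on the same point as one long jump. A trained model would approximate `jump` with a neural network, which is where the quality-versus-compute trade-off mentioned in the abstract arises.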
Related papers
- Stable Consistency Tuning: Understanding and Improving Consistency Models [40.2712218203989]
Diffusion models achieve superior generation quality but suffer from slow generation speed due to the iterative nature of denoising.
Consistency models, a new generative family, achieve competitive performance with significantly faster sampling.
We propose a novel framework for understanding consistency models by modeling the denoising process of the diffusion model as a Markov Decision Process (MDP) and framing consistency model training as value estimation through Temporal Difference (TD) learning.
arXiv Detail & Related papers (2024-10-24T17:55:52Z) - Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 2~4-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z) - Continuous time recurrent neural networks: overview and application to forecasting blood glucose in the intensive care unit [56.801856519460465]
Continuous time autoregressive recurrent neural networks (CTRNNs) are deep learning models that account for irregular observations.
We demonstrate the application of these models to probabilistic forecasting of blood glucose in a critical care setting.
arXiv Detail & Related papers (2023-04-14T09:39:06Z) - Score-based diffusion models for accelerated MRI [35.3148116010546]
We introduce a way to sample data from a conditional distribution given the measurements, such that the model can be readily used for solving inverse problems in imaging.
Our model requires magnitude images only for training, and yet is able to reconstruct complex-valued data, and even extends to parallel imaging.
arXiv Detail & Related papers (2021-10-08T08:42:03Z) - A Distributed Optimisation Framework Combining Natural Gradient with Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training.
It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods.
Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
arXiv Detail & Related papers (2021-03-12T22:18:34Z) - Autoregressive Score Matching [113.4502004812927]
We propose autoregressive conditional score models (AR-CSM) where we parameterize the joint distribution in terms of the derivatives of univariable log-conditionals (scores).
For AR-CSM models, this divergence between data and model distributions can be computed and optimized efficiently, requiring no expensive sampling or adversarial training.
We show with extensive experimental results that it can be applied to density estimation on synthetic data, image generation, image denoising, and training latent variable models with implicit encoders.
arXiv Detail & Related papers (2020-10-24T07:01:24Z) - Denoising Diffusion Probabilistic Models [91.94962645056896]
We present high quality image synthesis results using diffusion probabilistic models.
Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics.
arXiv Detail & Related papers (2020-06-19T17:24:44Z) - A Generative Learning Approach for Spatio-temporal Modeling in Connected Vehicular Network [55.852401381113786]
This paper proposes LaMI (Latency Model Inpainting), a novel framework to generate a comprehensive spatio-temporal quality framework for wireless access latency of connected vehicles.
LaMI adopts the idea from image inpainting and synthesizing and can reconstruct the missing latency samples by a two-step procedure.
In particular, it first discovers the spatial correlation between samples collected in various regions using a patching-based approach and then feeds the original and highly correlated samples into a Variational Autoencoder (VAE).
arXiv Detail & Related papers (2020-03-16T03:43:59Z)