Phased Consistency Model
- URL: http://arxiv.org/abs/2405.18407v1
- Date: Tue, 28 May 2024 17:47:19 GMT
- Title: Phased Consistency Model
- Authors: Fu-Yun Wang, Zhaoyang Huang, Alexander William Bergman, Dazhong Shen, Peng Gao, Michael Lingelbach, Keqiang Sun, Weikang Bian, Guanglu Song, Yu Liu, Hongsheng Li, Xiaogang Wang
- Abstract summary: The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models.
However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory.
We propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations.
- Score: 80.31766777570058
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The consistency model (CM) has recently made significant progress in accelerating the generation of diffusion models. However, its application to high-resolution, text-conditioned image generation in the latent space (a.k.a., LCM) remains unsatisfactory. In this paper, we identify three key flaws in the current design of LCM. We investigate the reasons behind these limitations and propose the Phased Consistency Model (PCM), which generalizes the design space and addresses all identified limitations. Our evaluations demonstrate that PCM significantly outperforms LCM across 1--16 step generation settings. While PCM is specifically designed for multi-step refinement, it achieves even superior or comparable 1-step generation results to previously state-of-the-art specifically designed 1-step methods. Furthermore, we show that PCM's methodology is versatile and applicable to video generation, enabling us to train the state-of-the-art few-step text-to-video generator. More details are available at https://g-u-n.github.io/projects/pcm/.
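The abstract states the thesis but leaves the mechanism implicit. As a hedged reading, the "phasing" amounts to splitting the sampling trajectory into sub-trajectories and enforcing the consistency property within each phase rather than globally; the sketch below uses assumed notation (M phases with edges s_m and per-phase consistency functions f_theta^(m)) and is not a formula quoted from the paper.
```latex
% Hedged sketch of the phasing idea; M, s_m, f_\theta^{(m)} are assumed notation.
% A standard (latent) consistency model enforces one self-consistency condition
% along the whole probability-flow ODE trajectory on [\epsilon, T]:
\[
  f_\theta(\mathbf{x}_t, t) = f_\theta(\mathbf{x}_{t'}, t'),
  \qquad \forall\, t, t' \in [\epsilon, T].
\]
% PCM instead splits [\epsilon, T] into M sub-trajectories with edges
% \epsilon = s_0 < s_1 < \dots < s_M = T and enforces self-consistency only
% within each phase, each phase mapping points to its own boundary solution:
\[
  f_\theta^{(m)}(\mathbf{x}_t, t) = f_\theta^{(m)}(\mathbf{x}_{t'}, t'),
  \qquad \forall\, t, t' \in [s_m, s_{m+1}],\; m = 0, \dots, M-1,
\]
% so chaining the M phase functions yields a deterministic M-step sampler.
```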
Related papers
- Merging Models on the Fly Without Retraining: A Sequential Approach to Scalable Continual Model Merging [75.93960998357812]
Deep model merging represents an emerging research direction that combines multiple fine-tuned models to harness their capabilities across different tasks and domains.
Current model merging techniques focus on merging all available models simultaneously, with weight matrices-based methods being the predominant approaches.
We propose a training-free projection-based continual merging method that processes models sequentially.
arXiv Detail & Related papers (2025-01-16T13:17:24Z)
- Flow Generator Matching [35.371071097381346]
Flow Generator Matching (FGM) is designed to accelerate the sampling of flow-matching models into a one-step generation.
On the CIFAR10 unconditional generation benchmark, our one-step FGM model achieves a new record Fréchet Inception Distance (FID) score of 3.08 (FID is defined after this list).
MM-DiT-FGM one-step text-to-image model demonstrates outstanding industry-level performance.
arXiv Detail & Related papers (2024-10-25T05:41:28Z)
- Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [52.778766190479374]
Latent-based image generative models have achieved notable success in image generation tasks.
Despite sharing the same latent space, autoregressive models significantly lag behind latent diffusion models (LDMs) and masked image modeling (MIM) in image generation.
We propose a simple but effective discrete image tokenizer to stabilize the latent space for image generative modeling.
arXiv Detail & Related papers (2024-10-16T12:13:17Z)
- Language Models as Zero-shot Lossless Gradient Compressors: Towards General Neural Parameter Prior Models [66.1595537904019]
Large language models (LLMs) can act as gradient priors in a zero-shot setting.
We introduce LM-GC, a novel method that integrates LLMs with arithmetic coding.
arXiv Detail & Related papers (2024-09-26T13:38:33Z)
- DreamLCM: Towards High-Quality Text-to-3D Generation via Latent Consistency Model [77.84225358245487]
We propose DreamLCM which incorporates the Latent Consistency Model (LCM) to generate consistent and high-quality guidance.
The proposed method can provide accurate and detailed gradients to optimize the target 3D models.
DreamLCM achieves state-of-the-art results in both generation quality and training efficiency.
arXiv Detail & Related papers (2024-08-06T06:59:15Z)
- CCDM: Continuous Conditional Diffusion Models for Image Generation [22.70942688582302]
Conditional Diffusion Models (CDMs) offer a promising alternative for Continuous Conditional Generative Modeling (CCGM).
CCDMs address existing limitations with specially designed conditional diffusion processes, a novel hard vicinal image denoising loss, and efficient conditional sampling procedures.
We demonstrate that CCDMs outperform state-of-the-art CCGM models, establishing a new benchmark.
arXiv Detail & Related papers (2024-05-06T15:10:19Z)
- EdgeFusion: On-Device Text-to-Image Generation [3.3345550849564836]
We build on a compact Stable Diffusion (SD) variant, BK-SDM, for text-to-image generation.
We achieve rapid generation of photo-realistic, text-aligned images in just two steps, with latency under one second on resource-limited edge devices.
arXiv Detail & Related papers (2024-04-18T06:02:54Z)
- LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [52.8517132452467]
Latent Consistency Models (LCMs) have achieved impressive performance in accelerating text-to-image generative tasks.
This report further extends LCMs' potential by applying LoRA distillation to larger Stable-Diffusion models.
We identify the LoRA parameters obtained through LCM distillation as a universal Stable-Diffusion acceleration module, named LCM-LoRA (a usage sketch is given after this list).
arXiv Detail & Related papers (2023-11-09T18:04:15Z)
- Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [60.32804641276217]
We propose Latent Consistency Models (LCMs), enabling swift inference with minimal steps on any pre-trained LDMs.
A high-quality 768 x 768 24-step LCM takes only 32 A100 GPU hours for training.
We also introduce Latent Consistency Fine-tuning (LCF), a novel method that is tailored for fine-tuning LCMs on customized image datasets.
arXiv Detail & Related papers (2023-10-06T17:11:58Z)
- M3ICRO: Machine Learning-Enabled Compact Photonic Tensor Core based on PRogrammable Multi-Operand Multimode Interference [18.0155410476884]
Photonic tensor core (PTC) designs based on standard optical components hinder scalability and compute density due to their large spatial footprint.
We propose an ultra-compact PTC using customized programmable multi-operand multimode interference (MOMMI) devices, named M3ICRO.
M3ICRO achieves a 3.4-9.6x smaller footprint, 1.6-4.4x higher speed, 10.6-42x higher compute density, 3.7-12x higher system throughput, and superior noise robustness.
arXiv Detail & Related papers (2023-05-31T02:34:36Z)
- A Unified View of Long-Sequence Models towards Modeling Million-Scale Dependencies [0.0]
We compare existing solutions to long-sequence modeling in terms of their pure mathematical formulation.
We then demonstrate that long context length does yield better performance, albeit application-dependent.
Inspired by emerging sparse models of huge capacity, we propose a machine learning system for handling million-scale dependencies.
arXiv Detail & Related papers (2023-02-13T09:47:31Z)
- Neighbor Correspondence Matching for Flow-based Video Frame Synthesis [90.14161060260012]
We introduce a neighbor correspondence matching (NCM) algorithm for flow-based frame synthesis.
NCM is performed in a current-frame-agnostic fashion to establish multi-scale correspondences in the spatial-temporal neighborhoods of each pixel.
The coarse-scale module is designed to leverage neighbor correspondences to capture large motion, while the more efficient fine-scale module speeds up the estimation process.
arXiv Detail & Related papers (2022-07-14T09:17:00Z)
- Cascading Modular Network (CAM-Net) for Multimodal Image Synthesis [7.726465518306907]
A persistent challenge has been to generate diverse versions of output images from the same input image.
We propose CAM-Net, a unified architecture that can be applied to a broad range of tasks.
It is capable of generating convincing high frequency details, achieving a reduction of the Fréchet Inception Distance (FID) by up to 45.3% compared to the baseline.
arXiv Detail & Related papers (2021-06-16T17:58:13Z)
- Normalizing Flows with Multi-Scale Autoregressive Priors [131.895570212956]
We introduce channel-wise dependencies in the latent space of normalizing flows through multi-scale autoregressive priors (mAR).
Our mAR prior for models with split coupling flow layers (mAR-SCF) can better capture dependencies in complex multimodal data.
We show that mAR-SCF allows for improved image generation quality, with gains in FID and Inception scores compared to state-of-the-art flow-based models.
arXiv Detail & Related papers (2020-04-08T09:07:11Z)
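For reference, several entries above (Flow Generator Matching, CAM-Net, mAR-SCF) report Fréchet Inception Distance (FID) scores. FID compares Gaussian fits to Inception-v3 features of real and generated images; lower is better:
```latex
\[
  \mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\bigl( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \bigr),
\]
% where (\mu_r, \Sigma_r) and (\mu_g, \Sigma_g) are the mean and covariance of
% Inception-v3 features of the real and generated images, respectively.
```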
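As a usage illustration for the LCM-LoRA entry above: the module is distributed as LoRA weights that plug into a standard Stable-Diffusion pipeline. The sketch below assumes the Hugging Face diffusers library and the publicly released latent-consistency/lcm-lora-sdv1-5 checkpoint; it is an illustrative sketch, not code from any of the papers listed here.
```python
# Hedged sketch: plugging an LCM-LoRA-style acceleration module into Stable Diffusion.
# Assumes the Hugging Face `diffusers` library and the public
# "latent-consistency/lcm-lora-sdv1-5" LoRA checkpoint (not code from the listed papers).
import torch
from diffusers import DiffusionPipeline, LCMScheduler

pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swap in the LCM scheduler and attach the distilled LoRA as a plug-in accelerator.
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# Few-step sampling: LCM-style modules typically run with 4-8 steps and low guidance.
image = pipe(
    "a photo of an astronaut riding a horse on mars",
    num_inference_steps=4,
    guidance_scale=1.0,
).images[0]
image.save("lcm_lora_sample.png")
```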
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.