Related papers: Unified Continuous Generative Models

Unified Continuous Generative Models

URL: http://arxiv.org/abs/2505.07447v2
Date: Tue, 20 May 2025 12:27:53 GMT
Title: Unified Continuous Generative Models
Authors: Peng Sun, Yi Jiang, Tao Lin,
Abstract summary: We introduce a unified framework for training, sampling, and analyzing continuous generative models.<n>Our implementation achieves state-of-the-art (SOTA) performance.
Score: 12.358393766570732
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Recent advances in continuous generative models, including multi-step approaches like diffusion and flow-matching (typically requiring 8-1000 sampling steps) and few-step methods such as consistency models (typically 1-8 steps), have demonstrated impressive generative performance. However, existing work often treats these approaches as distinct paradigms, resulting in separate training and sampling methodologies. We introduce a unified framework for training, sampling, and analyzing these models. Our implementation, the Unified Continuous Generative Models Trainer and Sampler (UCGM-{T,S}), achieves state-of-the-art (SOTA) performance. For example, on ImageNet 256x256 using a 675M diffusion transformer, UCGM-T trains a multi-step model achieving 1.30 FID in 20 steps and a few-step model reaching 1.42 FID in just 2 steps. Additionally, applying UCGM-S to a pre-trained model (previously 1.26 FID at 250 steps) improves performance to 1.06 FID in only 40 steps. Code is available at: https://github.com/LINs-lab/UCGM.

Related papers

Inductive Moment Matching [80.96561758341664]
We propose Inductive Moment Matching (IMM), a new class of generative models for one- or few-step sampling with a single-stage training procedure.<n>IMM surpasses diffusion models on ImageNet-256x256 with 1.99 FID using only 8 inference steps and achieves state-of-the-art 2-step FID of 1.98 on CIFAR-10 for a model trained from scratch.
arXiv Detail & Related papers (2025-03-10T17:37:39Z)
One-Step Diffusion Distillation through Score Implicit Matching [74.91234358410281]
We present Score Implicit Matching (SIM) a new approach to distilling pre-trained diffusion models into single-step generator models. SIM shows strong empirical performances for one-step generators. By applying SIM to a leading transformer-based diffusion model, we distill a single-step generator for text-to-image generation.
arXiv Detail & Related papers (2024-10-22T08:17:20Z)
Truncated Consistency Models [57.50243901368328]
Training consistency models requires learning to map all intermediate points along PF ODE trajectories to their corresponding endpoints.<n>We empirically find that this training paradigm limits the one-step generation performance of consistency models.<n>We propose a new parameterization of the consistency function and a two-stage training procedure that prevents the truncated-time training from collapsing to a trivial solution.
arXiv Detail & Related papers (2024-10-18T22:38:08Z)
One Step Diffusion via Shortcut Models [109.72495454280627]
We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples.<n>Shortcut models condition the network on the current noise level and also on the desired step size, allowing the model to skip ahead in the generation process.<n>Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
arXiv Detail & Related papers (2024-10-16T13:34:40Z)
Directly Denoising Diffusion Models [6.109141407163027]
We present Directly Denoising Diffusion Model (DDDM), a simple and generic approach for generating realistic images with few-step sampling. Our model achieves FID scores of 2.57 and 2.33 on CIFAR-10 in one-step and two-step sampling respectively, surpassing those obtained from GANs and distillation-based models. For ImageNet 64x64, our approach stands as a competitive contender against leading models.
arXiv Detail & Related papers (2024-05-22T11:20:32Z)
Multistep Consistency Models [24.443707181138553]
A 1-step consistency model is a conventional consistency model whereas a $infty$-step consistency model is a diffusion model. By increasing the sample budget from a single step to 2-8 steps, we can train models more easily that generate higher quality samples. We show that our method scales to a text-to-image diffusion model, generating samples that are close to the quality of the original model.
arXiv Detail & Related papers (2024-03-11T15:26:34Z)
AutoDiffusion: Training-Free Optimization of Time Steps and Architectures for Automated Diffusion Model Acceleration [57.846038404893626]
We propose to search the optimal time steps sequence and compressed model architecture in a unified framework to achieve effective image generation for diffusion models without any further training. Experimental results show that our method achieves excellent performance by using only a few time steps, e.g. 17.86 FID score on ImageNet 64 $times$ 64 with only four steps, compared to 138.66 with DDIM.
arXiv Detail & Related papers (2023-09-19T08:57:24Z)
Consistency Models [89.68380014789861]
We propose a new family of models that generate high quality samples by directly mapping noise to data. They support fast one-step generation by design, while still allowing multistep sampling to trade compute for sample quality. They also support zero-shot data editing, such as image inpainting, colorization, and super-resolution, without requiring explicit training.
arXiv Detail & Related papers (2023-03-02T18:30:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.