SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
- URL: http://arxiv.org/abs/2401.08740v2
- Date: Mon, 23 Sep 2024 15:59:41 GMT
- Title: SiT: Exploring Flow and Diffusion-based Generative Models with Scalable Interpolant Transformers
- Authors: Nanye Ma, Mark Goldstein, Michael S. Albergo, Nicholas M. Boffi, Eric Vanden-Eijnden, Saining Xie
- Abstract summary: Scalable Interpolant Transformers (SiT) are a family of generative models built on the backbone of Diffusion Transformers (DiT).
The interpolant framework allows for connecting two distributions in a more flexible way than standard diffusion models.
SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 and 512x512 benchmarks.
- Score: 33.15117998855855
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Scalable Interpolant Transformers (SiT), a family of generative models built on the backbone of Diffusion Transformers (DiT). The interpolant framework, which allows for connecting two distributions in a more flexible way than standard diffusion models, makes possible a modular study of various design choices impacting generative models built on dynamical transport: learning in discrete or continuous time, the objective function, the interpolant that connects the distributions, and deterministic or stochastic sampling. By carefully introducing the above ingredients, SiT surpasses DiT uniformly across model sizes on the conditional ImageNet 256x256 and 512x512 benchmark using the exact same model structure, number of parameters, and GFLOPs. By exploring various diffusion coefficients, which can be tuned separately from learning, SiT achieves an FID-50K score of 2.06 and 2.62, respectively.
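To make the ingredients named in the abstract concrete (an interpolant connecting data and noise, a regression objective, and deterministic sampling), here is a minimal pure-Python sketch. It assumes a linear interpolant x_t = (1 - t) * x_data + t * noise with t = 0 at data and t = 1 at noise, which is only one of several possible choices. The function names `linear_interpolant`, `velocity_loss`, and `euler_sample` and the `model(x, t)` callable are hypothetical and not taken from the SiT code base.

```python
import random

def linear_interpolant(x_data, noise, t):
    """Return x_t and its time derivative for the assumed linear interpolant."""
    # x_t = alpha_t * x_data + sigma_t * noise with alpha_t = 1 - t, sigma_t = t
    x_t = [(1.0 - t) * x + t * e for x, e in zip(x_data, noise)]
    # d/dt x_t = noise - x_data, the regression target for a velocity model
    velocity_target = [e - x for x, e in zip(x_data, noise)]
    return x_t, velocity_target

def velocity_loss(model, x_data, rng):
    """Mean squared error between model(x_t, t) and the velocity target."""
    noise = [rng.gauss(0.0, 1.0) for _ in x_data]
    t = rng.random()  # continuous time sampled uniformly from [0, 1)
    x_t, target = linear_interpolant(x_data, noise, t)
    pred = model(x_t, t)
    return sum((p - v) ** 2 for p, v in zip(pred, target)) / len(x_data)

def euler_sample(model, noise, num_steps=50):
    """Deterministic sampling: integrate dx/dt = model(x, t) from t=1 to t=0."""
    x = list(noise)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = 1.0 - i * dt  # runs over 1, 1 - dt, ..., dt (never exactly 0)
        v = model(x, t)
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x
```

In practice `model` would be a trained network; a stochastic sampler would add a noise term with a tunable diffusion coefficient, which this sketch omits.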
Related papers
- One-Step Diffusion Distillation through Score Implicit Matching [74.91234358410281]
We present Score Implicit Matching (SIM), a new approach to distilling pre-trained diffusion models into single-step generator models.
SIM shows strong empirical performance for one-step generators.
By applying SIM to a leading transformer-based diffusion model, we distill a single-step generator for text-to-image generation.
arXiv Detail & Related papers (2024-10-22T08:17:20Z) - Dynamic Diffusion Transformer [67.13876021157887]
Diffusion Transformer (DiT) has demonstrated superior performance but suffers from substantial computational costs.
We propose Dynamic Diffusion Transformer (DyDiT), an architecture that dynamically adjusts its computation along both timestep and spatial dimensions during generation.
With 3% additional fine-tuning, our method reduces the FLOPs of DiT-XL by 51%, accelerates generation by 1.73x, and achieves a competitive FID score of 2.07 on ImageNet.
arXiv Detail & Related papers (2024-10-04T14:14:28Z) - Local Flow Matching Generative Models [19.859984725284896]
Flow Matching (FM) is a simulation-free method for learning a continuous and invertible flow to interpolate between two distributions.
We introduce Local Flow Matching (LFM), which learns a sequence of FM sub-models, each of which matches a diffusion process up to the time of the step size in the data-to-noise direction.
In experiments, we demonstrate the improved training efficiency and competitive generative performance of LFM compared to FM.
arXiv Detail & Related papers (2024-10-03T14:53:10Z) - TerDiT: Ternary Diffusion Models with Transformers [83.94829676057692]
TerDiT is a quantization-aware training scheme for ternary diffusion models with transformers.
We focus on the ternarization of DiT networks and scale model sizes from 600M to 4.2B.
arXiv Detail & Related papers (2024-05-23T17:57:24Z) - Diffscaler: Enhancing the Generative Prowess of Diffusion Transformers [34.611309081801345]
This paper focuses on enabling a single pre-trained diffusion transformer model to scale across multiple datasets swiftly.
We propose DiffScaler, an efficient scaling strategy for diffusion models where we train a minimal amount of parameters to adapt to different tasks.
We find that transformer-based diffusion models significantly outperform CNN-based diffusion models when fine-tuning on smaller datasets.
arXiv Detail & Related papers (2024-04-15T17:55:43Z) - Synaptogen: A cross-domain generative device model for large-scale neuromorphic circuit design [1.704443882665726]
We present a fast generative modeling approach for resistive memories that reproduces the complex statistical properties of real-world devices.
By training on extensive measurement data of integrated 1T1R arrays, an autoregressive process accurately accounts for the cross-correlations between the parameters.
Benchmarks show that this statistically comprehensive model's read/write throughput exceeds that of even highly simplified and deterministic compact models.
arXiv Detail & Related papers (2024-04-09T14:33:03Z) - Synthetic location trajectory generation using categorical diffusion models [50.809683239937584]
Diffusion probabilistic models (DPMs) have rapidly evolved to be one of the predominant generative models for the simulation of synthetic data.
We propose using DPMs for the generation of synthetic individual location trajectories (ILTs), which are sequences of variables representing physical locations visited by individuals.
arXiv Detail & Related papers (2024-02-19T15:57:39Z) - DiffiT: Diffusion Vision Transformers for Image Generation [88.08529836125399]
Vision Transformer (ViT) has demonstrated strong modeling capabilities and scalability, especially for recognition tasks.
We study the effectiveness of ViTs in diffusion-based generative learning and propose a new model denoted as Diffusion Vision Transformers (DiffiT).
DiffiT is surprisingly effective in generating high-fidelity images with significantly better parameter efficiency.
arXiv Detail & Related papers (2023-12-04T18:57:01Z) - DiffFlow: A Unified SDE Framework for Score-Based Diffusion Models and Generative Adversarial Networks [41.451880167535776]
We propose a unified theoretic framework for score-based diffusion models (SDMs) and generative adversarial nets (GANs).
Under our unified theoretic framework, we introduce several instantiations of DiffFlow that provide new algorithms beyond GANs and SDMs with exact likelihood inference.
arXiv Detail & Related papers (2023-07-05T10:00:53Z) - Scalable Diffusion Models with Transformers [18.903245758902834]
We train latent diffusion models of images, replacing the commonly-used U-Net backbone with a transformer that operates on latent patches.
We find that DiTs with higher Gflops -- through increased transformer depth/width or increased number of input tokens -- consistently have lower FID.
arXiv Detail & Related papers (2022-12-19T18:59:58Z) - On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from.
For standard diffusion models trained in pixel space, our approach is able to generate images visually comparable to those of the original model.
For diffusion models trained in latent space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.