T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with
Trajectory Stitching
- URL: http://arxiv.org/abs/2402.14167v1
- Date: Wed, 21 Feb 2024 23:08:54 GMT
- Title: T-Stitch: Accelerating Sampling in Pre-Trained Diffusion Models with
Trajectory Stitching
- Authors: Zizheng Pan, Bohan Zhuang, De-An Huang, Weili Nie, Zhiding Yu, Chaowei
Xiao, Jianfei Cai, Anima Anandkumar
- Abstract summary: Trajectory Stitching (T-Stitch) is a simple yet efficient technique that improves sampling efficiency with little or no generation degradation.
Our key insight is that different diffusion models learn similar encodings under the same training data distribution.
Our method can also be used as a drop-in technique to accelerate the popular pretrained stable diffusion (SD) models.
- Score: 143.72720563387082
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Sampling from diffusion probabilistic models (DPMs) is often expensive for
high-quality image generation and typically requires many steps with a large
model. In this paper, we introduce sampling Trajectory Stitching (T-Stitch), a
simple yet efficient technique to improve the sampling efficiency with little
or no generation degradation. Instead of solely using a large DPM for the
entire sampling trajectory, T-Stitch first leverages a smaller DPM in the
initial steps as a cheap drop-in replacement of the larger DPM and switches to
the larger DPM at a later stage. Our key insight is that different diffusion
models learn similar encodings under the same training data distribution and
smaller models are capable of generating good global structures in the early
steps. Extensive experiments demonstrate that T-Stitch is training-free,
generally applicable for different architectures, and complements most existing
fast sampling techniques with flexible speed and quality trade-offs. On DiT-XL,
for example, 40% of the early timesteps can be safely replaced with a 10x
faster DiT-S without performance drop on class-conditional ImageNet generation.
We further show that our method can also be used as a drop-in technique to not
only accelerate the popular pretrained stable diffusion (SD) models but also
improve the prompt alignment of stylized SD models from the public model zoo.
Code is released at https://github.com/NVlabs/T-Stitch
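
To make the stitched trajectory concrete, below is a minimal sketch of a sampling loop that swaps denoisers partway through, assuming two pretrained denoisers that share a latent space and a generic per-step solver update; the names (small_model, large_model, sampler_step) are illustrative and not the released API.

```python
def t_stitch_sample(small_model, large_model, sampler_step, timesteps, x_T,
                    switch_frac=0.4):
    """Trajectory-stitching sketch: run a cheap denoiser for the first
    switch_frac of the timesteps, then hand off to the large denoiser."""
    x = x_T
    n_small = int(len(timesteps) * switch_frac)   # e.g. 40% of the early (noisy) steps
    for i, t in enumerate(timesteps):             # ordered from most to least noisy
        model = small_model if i < n_small else large_model
        eps = model(x, t)                         # predicted noise at this step
        x = sampler_step(x, eps, t)               # any solver update (DDIM, DPM-Solver, ...)
    return x
```

Because the early, noisy steps mostly determine the global structure, the cheaper model can handle them while the larger model refines details in the later steps.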
Related papers
- One Step Diffusion via Shortcut Models [109.72495454280627]
We introduce shortcut models, a family of generative models that use a single network and training phase to produce high-quality samples.
Shortcut models condition the network on the current noise level and also on the desired step size, allowing the model to skip ahead in the generation process.
Compared to distillation, shortcut models reduce complexity to a single network and training phase and additionally allow varying step budgets at inference time.
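
As a rough illustration of the mechanism described above (not the authors' code), the sampler below assumes a single network model(x, t, d) conditioned on both the current noise level t and the desired step size d, so the step budget can be chosen freely at inference time.

```python
def shortcut_sample(model, x, num_steps):
    """Sketch: one network conditioned on noise level t and step size d can
    take large jumps, so the same model supports any step budget."""
    d = 1.0 / num_steps            # step size chosen at inference time
    t = 0.0
    for _ in range(num_steps):
        v = model(x, t, d)         # the network sees both the noise level and the step size
        x = x + d * v              # Euler-style update along the predicted direction
        t += d
    return x
```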
arXiv Detail & Related papers (2024-10-16T13:34:40Z)
- Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [72.48325960659822]
One main bottleneck in training large-scale diffusion models for generation lies in effectively learning good semantic representations of the data.
We study this by introducing a straightforward regularization called REPresentation Alignment (REPA), which aligns the projections of noisy input hidden states in denoising networks with clean image representations obtained from external, pretrained visual encoders.
The results are striking: our simple strategy yields significant improvements in both training efficiency and generation quality when applied to popular diffusion and flow-based transformers, such as DiTs and SiTs.
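
A minimal sketch of such an alignment regularizer, assuming the denoiser exposes intermediate hidden states and a frozen pretrained visual encoder supplies clean-image features; the projection head and cosine form below are illustrative rather than REPA's exact formulation.

```python
import torch.nn.functional as F

def alignment_loss(hidden_states, clean_features, projector):
    """Align projected denoiser hidden states with features of the clean image
    extracted by an external pretrained encoder (per-token cosine similarity)."""
    proj = projector(hidden_states)       # map hidden states to the encoder's feature dim
    return -F.cosine_similarity(proj, clean_features, dim=-1).mean()

# total_loss = denoising_loss + lam * alignment_loss(...)   # lam: weighting hyperparameter
```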
arXiv Detail & Related papers (2024-10-09T14:34:53Z)
- Single Parent Family: A Spectrum of Family Members from a Single Pre-Trained Foundation Model [20.054342930450055]
This paper introduces a novel method of Progressive Low Rank Decomposition (PLRD) tailored for the compression of large language models.
PLRD allows for significant reductions in computational overhead and energy consumption.
Our findings suggest that PLRD could set a new standard for the efficient scaling of LLMs.
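
The summary gives few details, but the core operation in any low-rank decomposition scheme is factorizing a weight matrix; the sketch below shows a truncated-SVD factorization of one linear layer, with the progressive rank schedule (the paper's actual contribution) left out.

```python
import torch

def low_rank_factorize(weight, rank):
    """Approximate an (out, in) weight matrix as A @ B, shrinking the
    parameter count from out*in to rank*(out + in)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    A = U[:, :rank] * S[:rank]     # (out, rank), singular values folded into A
    B = Vh[:rank, :]               # (rank, in)
    return A, B

# A progressive variant could start at a higher rank and reduce it over
# successive rounds, fine-tuning in between (assumed, not taken from the paper).
```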
arXiv Detail & Related papers (2024-06-28T15:27:57Z)
- A-SDM: Accelerating Stable Diffusion through Redundancy Removal and Performance Optimization [54.113083217869516]
In this work, we first explore the computationally redundant parts of the network.
We then prune the redundant blocks of the model while maintaining network performance.
Thirdly, we propose a global-regional interactive (GRI) attention to speed up the computationally intensive attention part.
arXiv Detail & Related papers (2023-12-24T15:37:47Z)
- DPM-OT: A New Diffusion Probabilistic Model Based on Optimal Transport [26.713392774427653]
DPM-OT is a unified learning framework for fast DPMs with a direct expressway represented by an optimal transport (OT) map.
It can generate high-quality samples within around 10 function evaluations.
Experiments validate the effectiveness and advantages of DPM-OT in terms of speed and quality.
arXiv Detail & Related papers (2023-07-21T02:28:54Z)
- Optimizing DDPM Sampling with Shortcut Fine-Tuning [16.137936204766692]
Shortcut Fine-Tuning (SFT) is a new approach for addressing the challenge of fast sampling from pretrained Denoising Diffusion Probabilistic Models (DDPMs).
SFT advocates fine-tuning DDPM samplers through the direct minimization of Integral Probability Metrics (IPM).
Inspired by a control perspective, we propose a new algorithm SFT-PG: Shortcut Fine-Tuning with Policy Gradient.
arXiv Detail & Related papers (2023-01-31T01:37:48Z)
- Learning to Efficiently Sample from Diffusion Probabilistic Models [49.58748345998702]
Denoising Diffusion Probabilistic Models (DDPMs) can yield high-fidelity samples and competitive log-likelihoods across a range of domains.
We introduce an exact dynamic programming algorithm that finds the optimal discrete time schedules for any pre-trained DDPM.
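
A hedged sketch of the dynamic-programming idea: given a per-transition cost cost[s, t] for jumping from timestep s down to t (for example, a negative per-step ELBO term), choose the K-jump schedule with minimum total cost. The cost definition and indexing here are placeholders rather than the paper's exact objective.

```python
import numpy as np

def best_schedule(cost, K):
    """cost[s, t]: cost of one denoising jump from timestep s to t (s > t).
    Returns a K-jump schedule T = s_0 > s_1 > ... > s_K = 0 of minimum total cost."""
    T = cost.shape[0] - 1
    dp = np.full((K + 1, T + 1), np.inf)        # dp[k, t]: best cost of reaching t in k jumps
    parent = np.zeros((K + 1, T + 1), dtype=int)
    dp[0, T] = 0.0                              # start at the noisiest timestep
    for k in range(1, K + 1):
        for s in range(T + 1):                  # source timestep
            for t in range(s):                  # destination timestep (t < s)
                c = dp[k - 1, s] + cost[s, t]
                if c < dp[k, t]:
                    dp[k, t] = c
                    parent[k, t] = s
    schedule, t = [0], 0                        # backtrack from timestep 0
    for k in range(K, 0, -1):
        t = parent[k, t]
        schedule.append(t)
    return schedule[::-1]
```

For K = 3 and T = 10, the returned schedule might look like [10, 7, 3, 0], i.e. three denoising jumps instead of ten.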
arXiv Detail & Related papers (2021-06-07T17:15:07Z)
- Denoising Diffusion Implicit Models [117.03720513930335]
We present denoising diffusion implicit models (DDIMs), a class of iterative implicit probabilistic models with the same training procedure as DDPMs.
DDIMs can produce high-quality samples $10\times$ to $50\times$ faster in terms of wall-clock time compared to DDPMs.
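
For reference, a sketch of the deterministic DDIM update (the eta = 0 case) that makes these large speedups possible; here alpha_bar denotes the cumulative noise schedule and the variable names are illustrative.

```python
def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev):
    """One deterministic DDIM update: estimate x_0 from the predicted noise,
    then re-noise it to the (possibly much earlier) previous timestep."""
    x0_pred = (x_t - (1 - alpha_bar_t) ** 0.5 * eps) / alpha_bar_t ** 0.5
    return alpha_bar_prev ** 0.5 * x0_pred + (1 - alpha_bar_prev) ** 0.5 * eps
```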
arXiv Detail & Related papers (2020-10-06T06:15:51Z)