DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- URL: http://arxiv.org/abs/2402.19481v4
- Date: Sun, 14 Jul 2024 21:30:14 GMT
- Title: DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
- Authors: Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han
- Abstract summary: We propose DistriFusion to tackle the problem of generating high-resolution images with diffusion models.
Our method splits the model input into multiple patches and assigns each patch to a GPU.
Our method can be applied to recent Stable Diffusion XL with no quality degradation and achieves up to a 6.1$\times$ speedup on eight NVIDIA A100s compared to a single GPU.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Diffusion models have achieved great success in synthesizing high-quality images. However, generating high-resolution images with diffusion models is still challenging due to the enormous computational costs, resulting in a prohibitive latency for interactive applications. In this paper, we propose DistriFusion to tackle this problem by leveraging parallelism across multiple GPUs. Our method splits the model input into multiple patches and assigns each patch to a GPU. However, naively implementing such an algorithm breaks the interaction between patches and loses fidelity, while incorporating such an interaction will incur tremendous communication overhead. To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step. Therefore, our method supports asynchronous communication, which can be pipelined by computation. Extensive experiments show that our method can be applied to recent Stable Diffusion XL with no quality degradation and achieve up to a 6.1$\times$ speedup on eight NVIDIA A100s compared to one. Our code is publicly available at https://github.com/mit-han-lab/distrifuser.
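To make the mechanism concrete, here is a minimal single-process sketch of the displaced-patch idea: each simulated GPU denoises its own patch using the other patches' activations from the previous timestep, so no synchronous exchange is needed within a step. The toy denoiser, sizes, and coefficients are placeholders, not the paper's U-Net or its asynchronous communication pipeline; see https://github.com/mit-han-lab/distrifuser for the real implementation.

```python
# A minimal single-process sketch of displaced patch parallelism, assuming a
# toy denoiser; sizes and coefficients are placeholders, not the paper's U-Net
# or its asynchronous NCCL pipeline.
import torch

def toy_denoise(patch: torch.Tensor, context: torch.Tensor, t: int) -> torch.Tensor:
    # Stand-in for one denoising step on a patch; `context` plays the role of
    # activations gathered from the other devices' patches.
    return 0.9 * patch + 0.1 * context.mean() - 0.01 * t

num_patches, steps = 4, 50                 # one patch per (simulated) GPU
patches = [torch.randn(3, 64, 64) for _ in range(num_patches)]
stale = [p.clone() for p in patches]       # features from the previous timestep

for t in reversed(range(steps)):
    # Each patch is denoised against *last* step's features from all patches,
    # so no synchronous exchange is needed inside the current step; in the real
    # system this communication is asynchronous and overlapped with compute.
    global_context = torch.stack(stale)
    patches = [toy_denoise(p, global_context, t) for p in patches]
    stale = [p.clone() for p in patches]   # becomes the next step's context

image = torch.cat(patches, dim=-1)         # reassemble (here: side by side)
print(image.shape)                         # torch.Size([3, 64, 256])
```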
Related papers
- Self-Refining Diffusion Samplers: Enabling Parallelization via Parareal Iterations
Self-Refining Diffusion Samplers (SRDS) retain sample quality and can improve latency at the cost of additional parallel compute.
We take inspiration from the Parareal algorithm, a popular numerical method for parallel-in-time integration of differential equations.
arXiv Detail & Related papers (2024-12-11T11:08:09Z)
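For context on the SRDS entry above, the classical Parareal update combines a cheap coarse solver with an expensive fine solver whose sweeps can run in parallel across subintervals. The sketch below applies it to a toy ODE rather than a diffusion sampler; the solvers and step counts are illustrative assumptions.

```python
# A classical Parareal sketch on the toy ODE dy/dt = -y over [0, 1]. In SRDS
# the expensive fine solver would be the sequential diffusion sampler; the
# solvers and step counts here are illustrative assumptions.
import math

def coarse(y, dt):                  # one cheap Euler step per subinterval
    return y + dt * (-y)

def fine(y, dt, substeps=100):      # many Euler steps: the expensive solver
    h = dt / substeps
    for _ in range(substeps):
        y = y + h * (-y)
    return y

N, T, y0 = 10, 1.0, 1.0
dt = T / N
U = [y0]                            # initial guess from the coarse solver alone
for n in range(N):
    U.append(coarse(U[n], dt))

for k in range(5):                  # Parareal refinement sweeps
    F = [fine(U[n], dt) for n in range(N)]        # parallelizable across n
    new_U = [y0]
    for n in range(N):              # cheap sequential coarse correction
        new_U.append(coarse(new_U[n], dt) + F[n] - coarse(U[n], dt))
    U = new_U

print(U[-1], math.exp(-1.0))        # converges toward the exact value e^-1
```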
- Partially Conditioned Patch Parallelism for Accelerated Diffusion Model Inference
Diffusion models have exhibited exciting capabilities in generating images and are also very promising for video creation.
The sequential denoising steps required for generating a single sample could take tens or hundreds of iterations.
We propose Partially Conditioned Patch Parallelism to accelerate the inference of high-resolution diffusion models.
arXiv Detail & Related papers (2024-12-04T02:12:50Z)
- xDiT: an Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Diffusion models are pivotal for generating high-quality images and videos.
This paper introduces xDiT, a comprehensive parallel inference engine for DiTs.
Notably, we are the first to demonstrate the scalability of DiTs on Ethernet-connected GPU clusters.
arXiv Detail & Related papers (2024-11-04T01:40:38Z)
- SpotDiffusion: A Fast Approach For Seamless Panorama Generation Over Time
We present a novel approach to generate high-resolution images with generative models.
Our method shifts non-overlapping denoising windows over time, ensuring that seams in one timestep are corrected in the next.
Our method offers several key benefits, including improved computational efficiency and faster inference times.
arXiv Detail & Related papers (2024-07-22T09:44:35Z)
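The shifting-window idea above can be illustrated in a few lines: windows tile the image without overlap, but their origins move each timestep so seams land in different places. The shift schedule below is an illustrative assumption, not necessarily the paper's exact schedule.

```python
# A sketch of the shifted-window idea: windows tile the image without overlap,
# but their origins move each timestep so seams land in different places. The
# shift schedule below is an illustrative assumption, not necessarily the
# paper's exact schedule.
def window_spans(length: int, window: int, step: int) -> list[tuple[int, int]]:
    offset = (step * window // 4) % window   # hypothetical per-step shift
    spans, x = [], -offset
    while x < length:
        spans.append((max(x, 0), min(x + window, length)))
        x += window
    return spans

for t in range(3):                           # seam positions move across timesteps
    print(t, window_spans(length=16, window=8, step=t))
```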
- AsyncDiff: Parallelizing Diffusion Models by Asynchronous Denoising
AsyncDiff is a universal and plug-and-play acceleration scheme that enables model parallelism across multiple devices.
For Stable Diffusion v2.1, AsyncDiff achieves a 2.7x speedup with negligible quality degradation, and a 4.0x speedup with only a slight 0.38 reduction in CLIP Score.
Our experiments also demonstrate that AsyncDiff can be readily applied to video diffusion models with encouraging performance.
arXiv Detail & Related papers (2024-06-11T03:09:37Z)
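As a rough illustration of the asynchronous-denoising idea above, the toy below splits a model into sequential stages (one per device) and lets each stage consume its upstream stage's output from the previous step, breaking the within-step dependency chain so the stages could run concurrently. The stage modules and update rule are placeholders, not AsyncDiff's actual component split.

```python
# A rough toy of asynchronous denoising: the model is split into sequential
# stages (one per device), and at each step a stage consumes its upstream
# stage's output from the *previous* step. The stage modules and update rule
# are placeholders, not AsyncDiff's actual component split.
import torch

stages = [torch.nn.Linear(8, 8) for _ in range(3)]  # stand-in model chunks
x = torch.randn(8)
cache = [x.clone() for _ in stages]                 # stale inter-stage outputs

with torch.no_grad():
    for t in range(50):
        # Every call below depends only on last step's values, so on real
        # hardware the three stages could execute concurrently on 3 devices.
        outs = [stages[0](x)] + [stages[i](cache[i - 1])
                                 for i in range(1, len(stages))]
        cache = outs
        x = outs[-1]                                # this step's sample estimate
print(x.shape)                                      # torch.Size([8])
```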
- PipeFusion: Patch-level Pipeline Parallelism for Diffusion Transformers Inference
PipeFusion partitions images into patches and distributes the model layers across multiple GPUs.
It employs a patch-level pipeline parallel strategy to orchestrate communication and computation efficiently.
arXiv Detail & Related papers (2024-05-23T11:00:07Z)
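A patch-level pipeline like the one described above can be visualized with a simple schedule: patches stream through layer groups (one stage per GPU), and in steady state every stage is busy on a different patch. The sketch below just prints the classic fill/steady/drain pattern; it is not PipeFusion's actual scheduler.

```python
# A toy patch-level pipeline schedule: patches stream through layer groups
# (one stage per GPU), so in steady state every stage works on a different
# patch. This prints the classic fill/steady/drain pattern; it is not
# PipeFusion's actual scheduler.
def pipeline_schedule(num_patches: int, num_stages: int) -> None:
    for tick in range(num_patches + num_stages - 1):
        busy = [(tick - s, s) for s in range(num_stages)
                if 0 <= tick - s < num_patches]     # active (patch, stage) pairs
        print(f"tick {tick}: " +
              ", ".join(f"GPU{s} runs patch {p}" for p, s in busy))

pipeline_schedule(num_patches=4, num_stages=3)
```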
- Lightning-Fast Image Inversion and Editing for Text-to-Image Diffusion Models
We formulate the problem as finding the roots of an implicit equation and develop a method to solve it efficiently.
Our solution is based on Newton-Raphson (NR), a well-known technique in numerical analysis.
We show improved results in image inversion and in the generation of rare objects.
arXiv Detail & Related papers (2023-12-19T19:19:19Z)
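The summary above refers to the classical Newton-Raphson iteration, x ← x − f(x)/f'(x). The sketch below demonstrates it on an arbitrary scalar equation; the paper applies the same principle to an implicit equation arising in diffusion inversion.

```python
# The classical Newton-Raphson iteration: x <- x - f(x)/f'(x). The demo
# function below is arbitrary; the paper applies the same principle to an
# implicit equation arising in diffusion inversion.
def newton_raphson(f, df, x0, iters=20, tol=1e-12):
    x = x0
    for _ in range(iters):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:   # converged
            break
    return x

# Example: solve x**3 - 2 = 0, i.e. recover 2**(1/3) ~ 1.259921.
print(newton_raphson(lambda x: x**3 - 2, lambda x: 3 * x**2, x0=1.0))
```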
- Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference
We study the UNet encoder and empirically analyze the encoder features.
We find that encoder features change minimally, whereas the decoder features exhibit substantial variations across different time-steps.
We validate our approach on other tasks: text-to-video, personalized generation and reference-guided generation.
arXiv Detail & Related papers (2023-12-15T08:46:43Z)
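The observation above suggests a simple caching scheme: run the encoder only every k steps and reuse its cached features for the decoder in between. The toy below shows only the control flow with placeholder modules; the paper's method operates on actual U-Net encoder features.

```python
# A control-flow sketch of encoder reuse: since encoder features change
# minimally across timesteps, run the encoder only every k steps and reuse the
# cached features in between, while the decoder still runs every step. The
# tiny modules and k are placeholders, not the paper's U-Net or schedule.
import torch

encoder = torch.nn.Linear(16, 16)   # stand-in for the U-Net encoder
decoder = torch.nn.Linear(16, 16)   # stand-in for the U-Net decoder
x, cached, k = torch.randn(16), None, 5

with torch.no_grad():
    for t in range(50):
        if t % k == 0:              # refresh encoder features only sparsely
            cached = encoder(x)
        x = decoder(cached)         # decoder runs at every timestep
print(x.shape)                      # torch.Size([16])
```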
- Prompt-tuning latent diffusion models for inverse problems
We propose a new method for solving imaging inverse problems using text-to-image latent diffusion models as general priors.
Our method, called P2L, outperforms both image- and latent-diffusion model-based inverse problem solvers on a variety of tasks, such as super-resolution, deblurring, and inpainting.
arXiv Detail & Related papers (2023-10-02T11:31:48Z)
- SDM: Spatial Diffusion Model for Large Hole Image Inpainting
We present a novel spatial diffusion model (SDM) that uses a few iterations to gradually deliver informative pixels to the entire image.
Also, thanks to the proposed decoupled probabilistic modeling and spatial diffusion scheme, our method achieves high-quality large-hole completion.
arXiv Detail & Related papers (2022-12-06T13:30:18Z)