PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models
- URL: http://arxiv.org/abs/2405.14430v2
- Date: Sun, 26 May 2024 04:57:33 GMT
- Title: PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models
- Authors: Jiannan Wang, Jiarui Fang, Aoyu Li, PengCheng Yang
- Abstract summary: This paper introduces PipeFusion, a novel approach to generating high-resolution images with diffusion transformer (DiT) models.
By leveraging the high similarity between inputs of adjacent diffusion steps, PipeFusion eliminates waiting time in the pipeline.
Our experiments demonstrate that it can generate images at higher resolutions where existing DiT parallel approaches run out of memory (OOM).
- Score: 11.116433576371515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper introduces PipeFusion, a novel approach that harnesses multi-GPU parallelism to address the high computational cost and latency of generating high-resolution images with diffusion transformer (DiT) models. PipeFusion splits images into patches and distributes the network layers across multiple devices, orchestrating communication and computation in a pipeline-parallel manner. By leveraging the high similarity between inputs of adjacent diffusion steps, PipeFusion eliminates waiting time in the pipeline: it reuses the one-step-stale feature maps to provide context for the current step. Our experiments demonstrate that it can generate images at higher resolutions where existing DiT parallel approaches run out of memory (OOM). PipeFusion also significantly reduces the required communication bandwidth, enabling DiT inference to be hosted on GPUs connected via PCIe rather than the more costly NVLink infrastructure, which substantially lowers the overall operational expense of serving DiT models. Our code is publicly available at https://github.com/PipeFusion/PipeFusion.
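As a rough illustration of the displaced patch schedule described above, here is a minimal single-process sketch that simulates it with toy linear stages. The stage modules, patch/step counts, and the averaged context mixing are hypothetical stand-ins, not the paper's actual multi-GPU implementation, which pipelines patches across real devices with asynchronous communication.

```python
import torch
import torch.nn as nn

# Hedged sketch of displaced patch pipeline parallelism: each pipeline stage
# processes one patch at a time, and full-image context for the *other*
# patches comes from the previous diffusion step's (one-step-stale)
# activations, so a stage never waits for the current step's remaining patches.
torch.manual_seed(0)
num_stages, num_patches, num_steps = 4, 4, 3   # toy sizes (assumptions)
stages = nn.ModuleList(nn.Linear(64, 64) for _ in range(num_stages))

# Latent image split into patches: (num_patches, tokens_per_patch, hidden).
patches = torch.randn(num_patches, 16, 64)

# Stale activations from the previous step, one per (stage, patch).
stale = [[torch.zeros(16, 64) for _ in range(num_patches)]
         for _ in range(num_stages)]

with torch.no_grad():
    for step in range(num_steps):
        for p in range(num_patches):            # patches flow through the pipeline
            x = patches[p]
            for s, stage in enumerate(stages):  # each stage = one "device"
                # Mix the fresh patch with stale activations of all other
                # patches, as a toy stand-in for full-image attention.
                context = torch.stack(
                    [x if q == p else stale[s][q] for q in range(num_patches)])
                x = stage(context.mean(dim=0))
                stale[s][p] = x                 # refresh this patch's entry
            patches[p] = x

print(patches.shape)  # torch.Size([4, 16, 64])
```

In the real system each stage would live on its own GPU, and only the patch activations crossing stage boundaries are communicated, which matches the abstract's claim that the bandwidth requirement stays low enough for PCIe interconnects.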
Related papers
- Pipette: Automatic Fine-grained Large Language Model Training Configurator for Real-World Clusters [5.190794062263327]
Training large language models (LLMs) is known to be challenging because of their huge computational and memory capacity requirements.
We propose Pipette, an automatic fine-grained LLM training configurator for real-world clusters.
arXiv Detail & Related papers (2024-05-28T11:59:44Z)
- DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [44.384572903945724]
We propose DistriFusion to tackle the problem of generating high-resolution images with diffusion models.
Our method splits the model input into multiple patches and assigns each patch to a GPU.
Our method can be applied to the recent Stable Diffusion XL with no quality degradation and achieves up to a 6.1$\times$ speedup on eight NVIDIA A100s compared to a single GPU.
arXiv Detail & Related papers (2024-02-29T18:59:58Z)
- Pipe-BD: Pipelined Parallel Blockwise Distillation [7.367308544773381]
We propose Pipe-BD, a novel parallelization method for blockwise distillation.
Pipe-BD aggressively utilizes pipeline parallelism for blockwise distillation.
We implement Pipe-BD on PyTorch, and experiments reveal that it is effective across multiple scenarios, models, and datasets.
arXiv Detail & Related papers (2023-01-29T13:38:43Z)
- CDDFuse: Correlation-Driven Dual-Branch Feature Decomposition for Multi-Modality Image Fusion [138.40422469153145]
We propose a novel Correlation-Driven feature Decomposition Fusion (CDDFuse) network.
We show that CDDFuse achieves promising results in multiple fusion tasks, including infrared-visible image fusion and medical image fusion.
arXiv Detail & Related papers (2022-11-26T02:40:28Z)
- PnP-DETR: Towards Efficient Visual Analysis with Transformers [146.55679348493587]
Recently, DETR pioneered solving vision tasks with transformers; it directly translates the image feature map into the object detection result.
The approach also yields consistent efficiency gains on recent transformer-based image recognition models.
arXiv Detail & Related papers (2021-09-15T01:10:30Z)
- Image Fusion Transformer [75.71025138448287]
In image fusion, images obtained from different sensors are fused to generate a single image with enhanced information.
In recent years, state-of-the-art methods have adopted Convolutional Neural Networks (CNNs) to encode meaningful features for image fusion.
We propose a novel Image Fusion Transformer (IFT) where we develop a transformer-based multi-scale fusion strategy.
arXiv Detail & Related papers (2021-07-19T16:42:49Z)
- PipeTransformer: Automated Elastic Pipelining for Distributed Training of Transformers [47.194426122333205]
PipeTransformer is a distributed training algorithm for Transformer models.
It automatically adjusts pipelining and data parallelism by identifying and freezing some layers during training.
We evaluate PipeTransformer using Vision Transformer (ViT) on ImageNet and BERT on GLUE and SQuAD datasets.
arXiv Detail & Related papers (2021-02-05T13:39:31Z)
- Rethinking Learning-based Demosaicing, Denoising, and Super-Resolution Pipeline [86.01209981642005]
We study the effects of pipeline ordering on the mixture problem of learning-based denoising (DN), demosaicing (DM), and super-resolution (SR), in both sequential and joint solutions.
Our suggested pipeline DN$\to$SR$\to$DM yields consistently better performance than other sequential pipelines.
We propose an end-to-end Trinity Pixel Enhancement NETwork (TENet) that achieves state-of-the-art performance for the mixture problem.
arXiv Detail & Related papers (2019-05-07T13:19:05Z)