Related papers: Diffusion Model Compression for Image-to-Image Translation

Diffusion Model Compression for Image-to-Image Translation

URL: http://arxiv.org/abs/2401.17547v2
Date: Thu, 10 Oct 2024 01:45:05 GMT
Title: Diffusion Model Compression for Image-to-Image Translation
Authors: Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho,
Abstract summary: We propose a novel compression method tailored for diffusion-based I2I models. Based on the observations that the image conditions of I2I models already provide rich information on image structures, we develop surprisingly simple yet effective approaches for reducing the model size and latency. Our approach achieves satisfactory output quality with 39.2%, 56.4% and 39.2% reduction in model footprint, as well as 81.4%, 68.7% and 31.1% decrease in latency to InstructPix2Pix, StableSR and ControlNet, respectively.
Score: 25.46012859377184
License: http://creativecommons.org/licenses/by/4.0/
Abstract: As recent advances in large-scale Text-to-Image (T2I) diffusion models have yielded remarkable high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged. Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process. In this paper, we propose a novel compression method tailored for diffusion-based I2I models. Based on the observations that the image conditions of I2I models already provide rich information on image structures, and that the time steps with a larger impact tend to be biased, we develop surprisingly simple yet effective approaches for reducing the model size and latency. We validate the effectiveness of our method on three representative I2I tasks: InstructPix2Pix for image editing, StableSR for image restoration, and ControlNet for image-conditional image generation. Our approach achieves satisfactory output quality with 39.2%, 56.4% and 39.2% reduction in model footprint, as well as 81.4%, 68.7% and 31.1% decrease in latency to InstructPix2Pix, StableSR and ControlNet, respectively.

Related papers

SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training [77.681908636429]
Text-to-image (T2I) models face several limitations, including large model sizes, slow, and low-quality generation on mobile devices. This paper aims to develop an extremely small and fast T2I model that generates high-resolution and high-quality images on mobile platforms.
arXiv Detail & Related papers (2024-12-12T18:59:53Z)
Efficient Pruning of Text-to-Image Models: Insights from Pruning Stable Diffusion [3.399289369740637]
This paper presents a pioneering study on post-training pruning of Stable Diffusion 2. It addresses the critical need for model compression in text-to-image domain. We propose an optimal pruning configuration that prunes the text encoder to 47.5% and the diffusion generator to 35%.
arXiv Detail & Related papers (2024-11-22T18:29:37Z)
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion [34.70370851239368]
We show that pixel-space models can in fact be very competitive to latent approaches both in quality and efficiency. We present a simple recipe for scaling end-to-end pixel-space diffusion models to high resolutions.
arXiv Detail & Related papers (2024-10-25T06:20:06Z)
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation [52.509092010267665]
We introduce LlamaGen, a new family of image generation models that apply original next-token prediction'' paradigm of large language models to visual generation domain. It is an affirmative answer to whether vanilla autoregressive models, e.g., Llama, without inductive biases on visual signals can achieve state-of-the-art image generation performance if scaling properly.
arXiv Detail & Related papers (2024-06-10T17:59:52Z)
SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU. Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z)
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis [52.42320594388199]
We present three key practices in building an efficient text-to-image model. Based on these findings, we build two types of efficient text-to-image models, called KOALA-Turbo &-Lightning. Unlike SDXL, our KOALA models can generate 1024px high-resolution images on consumer-grade GPUs with 8GB of VRAMs (3060Ti)
arXiv Detail & Related papers (2023-12-07T02:46:18Z)
AdaDiff: Adaptive Step Selection for Fast Diffusion [88.8198344514677]
We introduce AdaDiff, a framework designed to learn instance-specific step usage policies. AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function. Our approach achieves similar results in terms of visual quality compared to the baseline using a fixed 50 denoising steps.
arXiv Detail & Related papers (2023-11-24T11:20:38Z)
DiffI2I: Efficient Diffusion Model for Image-to-Image Translation [108.82579440308267]
Diffusion Model (DM) has emerged as the SOTA approach for image synthesis. DM can't perform well on some image-to-image translation (I2I) tasks. DiffI2I comprises three key components: a compact I2I prior extraction network (CPEN), a dynamic I2I transformer (DI2Iformer) and a denoising network.
arXiv Detail & Related papers (2023-08-26T05:18:23Z)
ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR. Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process. Our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
arXiv Detail & Related papers (2023-07-03T06:49:04Z)
Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models [6.821399706256863]
W"urstchen is a novel architecture for text-to-image synthesis that combines competitive performance with unprecedented cost-effectiveness. A key contribution of our work is to develop a latent diffusion technique in which we learn a detailed but extremely compact semantic image representation.
arXiv Detail & Related papers (2023-06-01T13:00:53Z)
Refusion: Enabling Large-Size Realistic Image Restoration with Latent-Space Diffusion Models [9.245782611878752]
We enhance the diffusion model in several aspects such as network architecture, noise level, denoising steps, training image size, and perceptual/scheduler scores. We also propose a U-Net based latent diffusion model which performs diffusion in a low-resolution latent space while preserving high-resolution information from the original input for the decoding process. These modifications allow us to apply diffusion models to various image restoration tasks, including real-world shadow removal, HR non-homogeneous dehazing, stereo super-resolution, and bokeh effect transformation.
arXiv Detail & Related papers (2023-04-17T14:06:49Z)
On Distillation of Guided Diffusion Models [94.95228078141626]
We propose an approach to distilling classifier-free guided diffusion models into models that are fast to sample from. For standard diffusion models trained on the pixelspace, our approach is able to generate images visually comparable to that of the original model. For diffusion models trained on the latent-space (e.g., Stable Diffusion), our approach is able to generate high-fidelity images using as few as 1 to 4 denoising steps.
arXiv Detail & Related papers (2022-10-06T18:03:56Z)

This list is automatically generated from the titles and abstracts of the papers in this site.