Task-Oriented Diffusion Model Compression
- URL: http://arxiv.org/abs/2401.17547v1
- Date: Wed, 31 Jan 2024 02:25:52 GMT
- Title: Task-Oriented Diffusion Model Compression
- Authors: Geonung Kim, Beomsu Kim, Eunhyeok Park, Sunghyun Cho
- Abstract summary: As large-scale Text-to-Image (T2I) diffusion models have yielded remarkably high-quality image generation, diverse downstream Image-to-Image (I2I) applications have emerged.
Despite the impressive results achieved by these I2I models, their practical utility is hampered by their large model size and the computational burden of the iterative denoising process.
In this paper, we explore the compression potential of these I2I models in a task-oriented manner and introduce a novel method for reducing both model size and the number of timesteps.
- Score: 27.813361445528397
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As recent advancements in large-scale Text-to-Image (T2I) diffusion models
have yielded remarkably high-quality image generation, diverse downstream
Image-to-Image (I2I) applications have emerged. Despite the impressive results
achieved by these I2I models, their practical utility is hampered by their
large model size and the computational burden of the iterative denoising
process. In this paper, we explore the compression potential of these I2I
models in a task-oriented manner and introduce a novel method for reducing both
model size and the number of timesteps. Through extensive experiments, we
observe key insights and use our empirical knowledge to develop practical
solutions that aim for near-optimal results with minimal exploration costs. We
validate the effectiveness of our method by applying it to InstructPix2Pix for
image editing and StableSR for image restoration. Our approach achieves
satisfactory output quality with 39.2% and 56.4% reductions in model footprint
and 81.4% and 68.7% decreases in latency for InstructPix2Pix and StableSR,
respectively.
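The timestep-reduction half of the compression story can be illustrated with a generic strided sampling schedule, a minimal sketch of DDIM-style accelerated sampling in general rather than this paper's task-oriented method; the function name and parameters below are invented for illustration:

```python
# Hedged sketch: one generic way to cut diffusion inference cost is to run
# only an evenly spaced subset of the original denoising timesteps.
# This illustrates the general idea, not the paper's selection strategy.

def strided_timesteps(num_train_steps: int, num_infer_steps: int) -> list[int]:
    """Pick an evenly spaced, descending subset of training timesteps."""
    if not 0 < num_infer_steps <= num_train_steps:
        raise ValueError("need 0 < num_infer_steps <= num_train_steps")
    stride = num_train_steps / num_infer_steps
    # Denoising runs from the noisiest timestep down to 0, so sort descending.
    steps = [round(i * stride) for i in range(num_infer_steps)]
    return sorted(set(steps), reverse=True)

# Example: compress a 1000-step training schedule to 10 denoising steps.
print(strided_timesteps(1000, 10))
# [900, 800, 700, 600, 500, 400, 300, 200, 100, 0]
```

A task-oriented method like the paper's would additionally decide, per downstream task, how aggressively the schedule (and the model itself) can be shrunk before output quality degrades.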
Related papers
- SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions [5.100085108873068]
We present two models, SDXS-512 and SDXS-1024, achieving inference speeds of approximately 100 FPS (30x faster than SD v1.5) and 30 FPS (60x faster than SDXL) on a single GPU.
Our training approach offers promising applications in image-conditioned control, facilitating efficient image-to-image translation.
arXiv Detail & Related papers (2024-03-25T11:16:23Z)
- ToDo: Token Downsampling for Efficient Generation of High-Resolution Images [5.213225264281229]
This paper investigates the importance of dense attention in generative image models, which often contain redundant features, making them suitable for sparser attention mechanisms.
We propose a novel training-free method ToDo that relies on token downsampling of key and value tokens to accelerate Stable Diffusion inference by up to 2x for common sizes and up to 4.5x or more for high resolutions like 2048x2048.
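The token-downsampling idea can be sketched in a few lines: keep every query token but attend over a strided subset of key/value tokens, shrinking the attention matrix. This is a generic single-head illustration of the concept, not the authors' implementation:

```python
# Hedged sketch of key/value token downsampling in attention: the output
# keeps one row per query token, but the attention matrix is computed
# against only every `keep_every`-th key/value token.
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def downsampled_attention(q, k, v, keep_every: int = 2):
    """Single-head attention with K and V subsampled along the token axis."""
    k_ds, v_ds = k[::keep_every], v[::keep_every]   # fewer key/value tokens
    scale = 1.0 / np.sqrt(q.shape[-1])
    attn = softmax(q @ k_ds.T * scale)              # shape (Nq, Nk // keep_every)
    return attn @ v_ds                              # shape (Nq, d), full length

rng = np.random.default_rng(0)
q = rng.standard_normal((64, 16))   # 64 query tokens, feature dim 16
k = rng.standard_normal((64, 16))
v = rng.standard_normal((64, 16))
out = downsampled_attention(q, k, v, keep_every=2)
print(out.shape)  # (64, 16): full-length output from half the K/V tokens
```

With `keep_every=2` the attention matrix has half as many columns, which is where the speedup comes from; the actual method downsamples spatially-structured image tokens rather than taking a naive stride.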
arXiv Detail & Related papers (2024-02-21T07:10:28Z)
- Compressing Deep Image Super-resolution Models [2.895266689123347]
This work employs a three-stage workflow for compressing deep SR models which significantly reduces their memory requirement.
We have applied this approach to two popular image super-resolution networks, SwinIR and EDSR, to demonstrate its effectiveness.
The resulting compact models, SwinIRmini and EDSRmini, attain 89% and 96% reductions, respectively, in both model size and floating-point operations.
arXiv Detail & Related papers (2023-12-31T15:38:50Z)
- Learning from History: Task-agnostic Model Contrastive Learning for Image Restoration [79.04007257606862]
This paper introduces an innovative method termed 'learning from history', which dynamically generates negative samples from the target model itself.
Our approach, named Model Contrastive Learning for Image Restoration (MCLIR), rejuvenates latency models as negative models, making it compatible with diverse image restoration tasks.
arXiv Detail & Related papers (2023-09-12T07:50:54Z)
- ACDMSR: Accelerated Conditional Diffusion Models for Single Image Super-Resolution [84.73658185158222]
We propose a diffusion model-based super-resolution method called ACDMSR.
Our method adapts the standard diffusion model to perform super-resolution through a deterministic iterative denoising process.
Our approach generates more visually realistic counterparts for low-resolution images, emphasizing its effectiveness in practical scenarios.
arXiv Detail & Related papers (2023-07-03T06:49:04Z)
- Low-Light Image Enhancement with Wavelet-based Diffusion Models [50.632343822790006]
Diffusion models have achieved promising results in image restoration tasks, yet they suffer from time-consuming inference, excessive computational resource consumption, and unstable restoration.
We propose a robust and efficient Diffusion-based Low-Light image enhancement approach, dubbed DiffLL.
arXiv Detail & Related papers (2023-06-01T03:08:28Z)
- Towards Bidirectional Arbitrary Image Rescaling: Joint Optimization and Cycle Idempotence [76.93002743194974]
We propose a method to treat arbitrary rescaling, both upscaling and downscaling, as one unified process.
The proposed model is able to learn upscaling and downscaling simultaneously and achieve bidirectional arbitrary image rescaling.
It is shown to be robust in the cycle idempotence test, remaining free of severe degradation in reconstruction accuracy when the downscaling-to-upscaling cycle is applied repeatedly.
arXiv Detail & Related papers (2022-03-02T07:42:15Z)
- Uncovering the Over-smoothing Challenge in Image Super-Resolution: Entropy-based Quantification and Contrastive Optimization [67.99082021804145]
We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss).
DECLoss utilizes the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution.
We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models.
arXiv Detail & Related papers (2022-01-04T08:30:09Z)
- Knowledge distillation: A good teacher is patient and consistent [71.14922743774864]
There is a growing discrepancy in computer vision between large-scale models that achieve state-of-the-art performance and models that are affordable in practical applications.
We identify certain implicit design choices, which may drastically affect the effectiveness of distillation.
We obtain a state-of-the-art ResNet-50 model for ImageNet, which achieves 82.8% top-1 accuracy.
arXiv Detail & Related papers (2021-06-09T17:20:40Z)
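The distillation objective underlying work like the above is typically the standard soft-label loss: a temperature-softened KL divergence between teacher and student outputs. The sketch below shows that generic Hinton-style formulation, not this particular paper's training recipe:

```python
# Hedged sketch of the classic soft-label distillation loss:
# KL(teacher || student) on temperature-softened logits, scaled by T^2
# so gradient magnitudes stay comparable across temperatures.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T: float = 2.0) -> float:
    """Mean KL divergence from softened teacher to softened student."""
    p = softmax(teacher_logits / T)                 # soft teacher targets
    log_q = np.log(softmax(student_logits / T))     # student log-probabilities
    kl = np.sum(p * (np.log(p) - log_q), axis=-1)   # per-example KL
    return float(T * T * kl.mean())

# A student that exactly matches its teacher incurs zero loss.
logits = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(logits, logits))  # 0.0
```

The paper's point that a "patient and consistent" teacher matters is about how this loss is applied (long schedules, identical augmented views for teacher and student), not about the form of the loss itself.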
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences of its use.