DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
- URL: http://arxiv.org/abs/2501.04304v2
- Date: Wed, 12 Feb 2025 10:49:40 GMT
- Title: DGQ: Distribution-Aware Group Quantization for Text-to-Image Diffusion Models
- Authors: Hyogon Ryu, NaHyeon Park, Hyunjung Shim
- Abstract summary: We analyze the challenges associated with quantizing text-to-image diffusion models from a distributional perspective.
We propose Distribution-aware Group Quantization (DGQ), a method that adaptively handles pixel-wise and channel-wise outliers to preserve image quality.
Our method demonstrates remarkable performance on datasets such as MS-COCO and PartiPrompts.
- Score: 12.875837358532422
- Abstract: Despite the widespread use of text-to-image diffusion models across various tasks, their computational and memory demands limit practical applications. To mitigate this issue, quantization of diffusion models has been explored. It reduces memory usage and computational costs by compressing weights and activations into lower-bit formats. However, existing methods often struggle to preserve both image quality and text-image alignment, particularly in lower-bit (< 8 bits) quantization. In this paper, we analyze the challenges associated with quantizing text-to-image diffusion models from a distributional perspective. Our analysis reveals that activation outliers play a crucial role in determining image quality. Additionally, we identify distinctive patterns in cross-attention scores, which significantly affect text-image alignment. To address these challenges, we propose Distribution-aware Group Quantization (DGQ), a method that identifies and adaptively handles pixel-wise and channel-wise outliers to preserve image quality. Furthermore, DGQ applies prompt-specific logarithmic quantization scales to maintain text-image alignment. Our method demonstrates remarkable performance on datasets such as MS-COCO and PartiPrompts. We are the first to successfully achieve low-bit quantization of text-to-image diffusion models without requiring additional fine-tuning of weight quantization parameters. Code is available at https://github.com/ugonfor/DGQ.
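The abstract names two concrete mechanisms: group quantization that treats activation outliers separately from the bulk of values, and logarithmic quantization scales for cross-attention scores. The sketch below illustrates both ideas in generic form; it is not the paper's implementation, and the outlier ratio, grouping axis, and bit-widths are illustrative assumptions.

```python
# Illustrative sketch of (1) group quantization with outlier pass-through and
# (2) logarithmic quantization of softmax attention scores. All thresholds
# and function names here are hypothetical, not DGQ's actual parameters.
import torch

def group_quantize(x, n_bits=8, outlier_ratio=0.01, dim=-1):
    """Uniformly quantize x per group along `dim`, keeping the largest
    `outlier_ratio` fraction of entries (by magnitude) in full precision."""
    k = max(1, int(outlier_ratio * x.numel()))
    thresh = x.abs().flatten().topk(k).values.min()
    outlier_mask = x.abs() >= thresh
    inliers = torch.where(outlier_mask, torch.zeros_like(x), x)
    scale = inliers.abs().amax(dim=dim, keepdim=True).clamp(min=1e-8)
    qmax = 2 ** (n_bits - 1) - 1
    q = torch.round(inliers / scale * qmax).clamp(-qmax, qmax)
    deq = q / qmax * scale
    return torch.where(outlier_mask, x, deq)  # outliers pass through unquantized

def log_quantize_attention(attn, n_bits=4):
    """Quantize softmax attention scores (values in (0, 1]) onto a power-of-two
    grid, which allocates resolution to the many near-zero scores."""
    levels = 2 ** n_bits - 1
    log_a = torch.log2(attn.clamp(min=2.0 ** -levels))
    q = torch.round(log_a).clamp(min=-levels, max=0)
    return 2.0 ** q
```

The rationale for the log grid is that softmax outputs are heavily skewed toward zero, so uniform steps waste most of their range on values that never occur.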
Related papers
- Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers [59.772673692679085]
We propose SPDFQ, a Semantics Prompting Data-Free Quantization method for ViTs.
First, SPDFQ incorporates Attention Priors Alignment (APA), which uses randomly generated attention priors to enhance the semantics of synthetic images.
Second, SPDFQ introduces Multi-Semantic Reinforcement (MSR), which leverages localized patch optimization to promote efficient parameterization.
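A speculative sketch of the APA idea as summarized above: synthesize a calibration image by gradient descent so that the model's attention roughly matches a randomly generated spatial prior. Here `attention_fn` is a hypothetical hook that returns a normalized (h, w) attention map differentiable with respect to the image; nothing below is from the paper itself.

```python
import torch
import torch.nn.functional as F

def random_gaussian_prior(h, w, device="cpu"):
    """A random 2-D Gaussian blob, normalized to sum to 1."""
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    cy, cx = torch.rand(2) * torch.tensor([h, w], dtype=torch.float)
    sigma = 0.15 * min(h, w)
    prior = torch.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
    return (prior / prior.sum()).to(device)

def apa_step(image, attention_fn, prior, lr=0.1):
    """One optimization step aligning the model's attention with the prior."""
    image = image.detach().requires_grad_(True)
    attn = attention_fn(image)                 # assumed: normalized (h, w) map
    loss = F.kl_div(attn.clamp_min(1e-8).log(), prior, reduction="sum")
    loss.backward()
    with torch.no_grad():
        image -= lr * image.grad               # gradient step on the image
    return image.detach(), loss.item()
```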
arXiv Detail & Related papers (2024-12-21T09:30:45Z)
- Data Generation for Hardware-Friendly Post-Training Quantization [3.3998740964877463]
Zero-shot quantization (ZSQ) using synthetic data is a key approach for post-training quantization (PTQ) under privacy and security constraints.
Existing data generation methods often struggle to effectively generate data suitable for hardware-friendly quantization.
We propose Data Generation for Hardware-friendly quantization (DGH), a novel method that addresses these gaps.
arXiv Detail & Related papers (2024-10-29T15:08:50Z)
- Unsupervised Modality Adaptation with Text-to-Image Diffusion Models for Semantic Segmentation [54.96563068182733]
We propose Modality Adaptation with text-to-image Diffusion Models (MADM) for the semantic segmentation task.
MADM utilizes text-to-image diffusion models pre-trained on extensive image-text pairs to enhance the model's cross-modality capabilities.
We show that MADM achieves state-of-the-art adaptation performance across various modalities, including adaptation from images to depth, infrared, and event data.
arXiv Detail & Related papers (2024-10-29T03:49:40Z)
- Thinking in Granularity: Dynamic Quantization for Image Super-Resolution by Intriguing Multi-Granularity Clues [16.254064282215033]
We propose Granular-DQ, which capitalizes on the intrinsic characteristics of images and dispenses with the layer-sensitivity analysis used by prior quantization methods.
Granular-DQ conducts a multi-granularity analysis of local patches with further exploration of their information densities.
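One plausible reading of "information densities" is per-patch entropy. The sketch below is only an illustration, not Granular-DQ itself: it estimates each patch's intensity entropy and maps high-entropy patches to wider bit-widths.

```python
import torch

def patch_entropy(img, patch=8, bins=32):
    """Per-patch Shannon entropy of pixel intensities; img: (H, W) in [0, 1]."""
    H, W = img.shape
    patches = img.unfold(0, patch, patch).unfold(1, patch, patch)
    flat = patches.reshape(-1, patch * patch)
    idx = (flat * (bins - 1)).long().clamp(0, bins - 1)
    hist = torch.zeros(flat.shape[0], bins).scatter_add_(
        1, idx, torch.ones_like(flat))
    p = hist / hist.sum(dim=1, keepdim=True)
    ent = -(p * p.clamp_min(1e-12).log2()).sum(dim=1)
    return ent.reshape(H // patch, W // patch)

def allocate_bits(entropy, low=4, high=8):
    """Map each patch's entropy to a bit-width between `low` and `high`."""
    t = (entropy - entropy.min()) / (entropy.max() - entropy.min() + 1e-8)
    return (low + t * (high - low)).round().long()

bits = allocate_bits(patch_entropy(torch.rand(256, 256)))  # (32, 32) bit map
```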
arXiv Detail & Related papers (2024-09-22T06:29:54Z)
- Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization [33.20136645196318]
State-of-the-art text-to-image models are becoming less accessible in practice.
Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations.
This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models.
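Vector quantization of weights generally means replacing groups of weights with entries from a learned codebook, storing only the codebook plus integer indices. A generic k-means sketch (not the paper's exact scheme) follows.

```python
import torch

def vq_compress(W, d=4, k=256, iters=25):
    """Quantize W into a (k, d) codebook plus one integer code per sub-vector."""
    vecs = W.reshape(-1, d)                      # d-dimensional sub-vectors
    codebook = vecs[torch.randperm(len(vecs))[:k]].clone()
    for _ in range(iters):                       # plain k-means
        codes = torch.cdist(vecs, codebook).argmin(dim=1)
        for j in range(k):
            sel = vecs[codes == j]
            if len(sel):
                codebook[j] = sel.mean(dim=0)
    return codebook, codes

def vq_decompress(codebook, codes, shape):
    return codebook[codes].reshape(shape)

W = torch.randn(512, 512)
cb, codes = vq_compress(W)
W_hat = vq_decompress(cb, codes, W.shape)
# Storage: 256*4 floats of codebook + one uint8 index per 4 weights,
# i.e. about 2 bits per weight, versus 32 bits for the original floats.
```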
arXiv Detail & Related papers (2024-08-31T16:09:20Z)
- Post-training Quantization for Text-to-Image Diffusion Models with Progressive Calibration and Activation Relaxing [49.800746112114375]
We propose a novel post-training quantization method (Progressive and Relaxing) for text-to-image diffusion models.
We are the first to achieve quantization of Stable Diffusion XL while maintaining its performance.
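A simplified sketch of per-timestep activation calibration for a diffusion model, using percentile clipping as one simple way to "relax" activation ranges so rare outliers do not inflate the scale. The model call signature and hook bookkeeping are assumptions, and the fully progressive scheme (re-calibrating on outputs of already-quantized earlier steps) is omitted for brevity.

```python
import torch

@torch.no_grad()
def per_step_act_scales(model, scheduler_steps, calib_latents, percentile=0.999):
    """Collect a clipping range per (layer, timestep) pair."""
    scales = {}
    for t in scheduler_steps:
        acts, hooks = {}, []
        for name, mod in model.named_modules():
            if isinstance(mod, (torch.nn.Linear, torch.nn.Conv2d)):
                hooks.append(mod.register_forward_hook(
                    lambda m, inp, out, name=name: acts.__setitem__(name, out.detach())))
        model(calib_latents, t)        # assumed signature: model(latents, timestep)
        for h in hooks:
            h.remove()
        for name, a in acts.items():   # percentile clip instead of abs-max
            scales[(name, int(t))] = torch.quantile(
                a.abs().flatten().float(), percentile).item()
    return scales
```

Keeping a separate scale per timestep matters because diffusion activations drift in magnitude over the denoising trajectory, so one global range is too loose for some steps and clips others.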
arXiv Detail & Related papers (2023-11-10T09:10:09Z)
- Beyond Learned Metadata-based Raw Image Reconstruction [86.1667769209103]
Raw images have distinct advantages over sRGB images, e.g., linearity and fine-grained quantization levels. However, they are not widely adopted by general users because of their substantial storage requirements.
We propose a novel framework that learns a compact representation in the latent space, serving as metadata.
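A conceptual sketch of the metadata idea, with an architecture invented purely for illustration: a tiny autoencoder compresses what is needed to recover the raw image from its sRGB rendering into a low-resolution latent code that could be stored as file metadata.

```python
import torch
import torch.nn as nn

class MetadataCodec(nn.Module):
    def __init__(self, latent_ch=2):
        super().__init__()
        # encoder sees both raw and sRGB, emits a 16x-downsampled latent
        self.enc = nn.Sequential(
            nn.Conv2d(6, 32, 3, stride=4, padding=1), nn.ReLU(),
            nn.Conv2d(32, latent_ch, 3, stride=4, padding=1))
        # decoder reconstructs raw from sRGB plus the latent "metadata"
        self.dec = nn.Sequential(
            nn.Conv2d(3 + latent_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1))

    def forward(self, raw, srgb):
        z = self.enc(torch.cat([raw, srgb], dim=1))            # compact latent
        z_up = nn.functional.interpolate(z, size=srgb.shape[-2:], mode="bilinear")
        return self.dec(torch.cat([srgb, z_up], dim=1)), z     # raw_hat, metadata

raw = torch.rand(1, 3, 256, 256); srgb = torch.rand(1, 3, 256, 256)
raw_hat, z = MetadataCodec()(raw, srgb)   # z is 16x smaller per side than raw
```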
arXiv Detail & Related papers (2023-06-21T06:59:07Z)
- Controlling Text-to-Image Diffusion by Orthogonal Finetuning [74.21549380288631]
We introduce a principled finetuning method -- Orthogonal Finetuning (OFT) for adapting text-to-image diffusion models to downstream tasks.
Unlike existing methods, OFT can provably preserve hyperspherical energy which characterizes the pairwise neuron relationship on the unit hypersphere.
We empirically show that our OFT framework outperforms existing methods in generation quality and convergence speed.
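The key idea can be illustrated by rotating a frozen pretrained weight with a learned orthogonal matrix: rotations preserve pairwise angles between neurons, which is what "hyperspherical energy" measures. One standard way to parameterize an orthogonal matrix is the Cayley transform of a skew-symmetric matrix, as sketched below; treat this as a minimal illustration rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class OrthogonalFinetune(nn.Module):
    def __init__(self, linear: nn.Linear):
        super().__init__()
        self.weight = linear.weight            # frozen pretrained weight (out, in)
        self.weight.requires_grad_(False)
        self.bias = linear.bias
        d = linear.out_features
        self.A = nn.Parameter(torch.zeros(d, d))   # learned, made skew below

    def forward(self, x):
        skew = self.A - self.A.T                          # skew-symmetric
        eye = torch.eye(skew.shape[0], device=x.device)
        R = torch.linalg.solve(eye - skew, eye + skew)    # Cayley: orthogonal
        return nn.functional.linear(x, R @ self.weight, self.bias)

layer = OrthogonalFinetune(nn.Linear(64, 64))
out = layer(torch.randn(8, 64))  # at init A=0, so R=I and the output is unchanged
```

Because R is orthogonal by construction, finetuning only learns a rotation of the pretrained neurons, which is the constraint that preserves their pairwise relationships.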
arXiv Detail & Related papers (2023-06-12T17:59:23Z)
- Cascaded Cross-Attention Networks for Data-Efficient Whole-Slide Image Classification Using Transformers [0.11219061154635457]
Whole-Slide Imaging allows for the capturing and digitization of high-resolution images of histological specimens.
The transformer architecture has been proposed as a candidate for effectively leveraging this high-resolution information.
We propose a novel cascaded cross-attention network (CCAN) based on the cross-attention mechanism that scales linearly with the number of extracted patches.
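The linear scaling claim can be illustrated with a generic construction (not CCAN's exact architecture): a fixed set of m learned queries cross-attends over the N patch tokens, costing O(mN) rather than self-attention's O(N^2), so N can grow to thousands of patches per slide.

```python
import torch
import torch.nn as nn

class PatchCrossAttention(nn.Module):
    def __init__(self, dim=256, m=16, heads=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(m, dim) * 0.02)  # learned queries
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, patches):               # patches: (B, N, dim), N can be huge
        B = patches.shape[0]
        q = self.queries.unsqueeze(0).expand(B, -1, -1)
        out, _ = self.attn(q, patches, patches)   # (B, m, dim), linear in N
        return out.mean(dim=1)                    # slide-level feature

feats = torch.randn(2, 10_000, 256)           # e.g. 10k patch embeddings per slide
summary = PatchCrossAttention()(feats)        # (2, 256)
```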
arXiv Detail & Related papers (2023-05-11T16:42:24Z)
- FewGAN: Generating from the Joint Distribution of a Few Images [95.6635227371479]
We introduce FewGAN, a generative model for generating novel, high-quality and diverse images.
FewGAN is a hierarchical patch-GAN that applies quantization at the first coarse scale, followed by a pyramid of residual fully convolutional GANs at finer scales.
In an extensive set of experiments, it is shown that FewGAN outperforms baselines both quantitatively and qualitatively.
arXiv Detail & Related papers (2022-07-18T07:11:28Z)