TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
- URL: http://arxiv.org/abs/2404.09532v1
- Date: Mon, 15 Apr 2024 07:51:40 GMT
- Title: TMPQ-DM: Joint Timestep Reduction and Quantization Precision Selection for Efficient Diffusion Models
- Authors: Haojun Sun, Chen Tang, Zhi Wang, Yuan Meng, Jingyan Jiang, Xinzhu Ma, Wenwu Zhu
- Abstract summary: We introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off.
For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process.
In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance.
- Score: 40.5153344875351
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Diffusion models have emerged as preeminent contenders in the realm of generative models. Distinguished by their distinctive sequential generative processes, characterized by hundreds or even thousands of timesteps, diffusion models progressively reconstruct images from pure Gaussian noise, with each timestep necessitating full inference of the entire model. However, the substantial computational demands inherent to these models present challenges for deployment, quantization is thus widely used to lower the bit-width for reducing the storage and computing overheads. Current quantization methodologies primarily focus on model-side optimization, disregarding the temporal dimension, such as the length of the timestep sequence, thereby allowing redundant timesteps to continue consuming computational resources, leaving substantial scope for accelerating the generative process. In this paper, we introduce TMPQ-DM, which jointly optimizes timestep reduction and quantization to achieve a superior performance-efficiency trade-off, addressing both temporal and model optimization aspects. For timestep reduction, we devise a non-uniform grouping scheme tailored to the non-uniform nature of the denoising process, thereby mitigating the explosive combinations of timesteps. In terms of quantization, we adopt a fine-grained layer-wise approach to allocate varying bit-widths to different layers based on their respective contributions to the final generative performance, thus rectifying performance degradation observed in prior studies. To expedite the evaluation of fine-grained quantization, we further devise a super-network to serve as a precision solver by leveraging shared quantization results. These two design components are seamlessly integrated within our framework, enabling rapid joint exploration of the exponentially large decision space via a gradient-free evolutionary search algorithm.
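The abstract describes a gradient-free evolutionary search over a joint decision space: a non-uniform grouping of timesteps paired with per-layer bit-widths. A minimal sketch of that kind of search is below; the candidate encoding, fitness function, and all constants are illustrative placeholders, not the paper's actual implementation (which evaluates candidates through a weight-sharing super-network rather than a toy score).

```python
import random

# Each candidate pairs a non-uniform timestep grouping (cut points over the
# timestep range) with a per-layer bit-width assignment. A gradient-free
# evolutionary loop keeps the fittest candidates and mutates them.
NUM_LAYERS = 8
BIT_CHOICES = [4, 6, 8]
NUM_GROUPS = 4
TOTAL_STEPS = 100

def random_candidate():
    # Non-uniform grouping: random cut points partition [0, TOTAL_STEPS).
    cuts = sorted(random.sample(range(1, TOTAL_STEPS), NUM_GROUPS - 1))
    bits = [random.choice(BIT_CHOICES) for _ in range(NUM_LAYERS)]
    return {"cuts": cuts, "bits": bits}

def fitness(cand):
    # Placeholder objective: prefer lower average bit-width. A real
    # implementation would query generation quality via the super-network.
    return -sum(cand["bits"]) / NUM_LAYERS

def mutate(cand):
    # Mutate one layer's bit-width; keep the grouping unchanged.
    child = {"cuts": list(cand["cuts"]), "bits": list(cand["bits"])}
    child["bits"][random.randrange(NUM_LAYERS)] = random.choice(BIT_CHOICES)
    return child

def evolve(pop_size=16, generations=20):
    pop = [random_candidate() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]
        pop = survivors + [mutate(random.choice(survivors)) for _ in survivors]
    return max(pop, key=fitness)

best = evolve()
```

Because candidate evaluation is decoupled from gradients, the same loop can explore the exponentially large grouping-and-precision space with any black-box quality metric plugged into `fitness`.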
Related papers
- Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training [4.760537994346813]
As data distributions grow more complex, training diffusion models to convergence becomes increasingly intensive.
We introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps.
Our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures.
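The idea of prioritizing more critical timesteps can be sketched as a weighted draw in place of the usual uniform timestep sampling during training. The weighting function below is a made-up heuristic for illustration only, not the paper's learned criterion.

```python
import random

T = 1000  # number of diffusion timesteps

def importance(t):
    # Placeholder heuristic: emphasize mid-range timesteps over the extremes.
    return 1.0 + 4.0 * (t / T) * (1.0 - t / T)

weights = [importance(t) for t in range(T)]

def sample_timestep(rng=random):
    # Weighted draw, replacing the usual uniform randint(0, T - 1).
    return rng.choices(range(T), weights=weights, k=1)[0]

t = sample_timestep()
```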
arXiv Detail & Related papers (2024-11-15T07:12:18Z)
- Q-VLM: Post-training Quantization for Large Vision-Language Models [73.19871905102545]
We propose a post-training quantization framework of large vision-language models (LVLMs) for efficient multi-modal inference.
We mine the cross-layer dependency that significantly influences discretization errors of the entire vision-language model, and embed this dependency into optimal quantization strategy.
Experimental results demonstrate that our method compresses memory by 2.78x and increases generation speed by 1.44x for the 13B LLaVA model without performance degradation.
arXiv Detail & Related papers (2024-10-10T17:02:48Z)
- Temporal Feature Matters: A Framework for Diffusion Model Quantization [105.3033493564844]
Diffusion models rely on the time-step for the multi-round denoising.
We introduce a novel quantization framework that includes three strategies.
This framework preserves most of the temporal information and ensures high-quality end-to-end generation.
arXiv Detail & Related papers (2024-07-28T17:46:15Z)
- Timestep-Aware Correction for Quantized Diffusion Models [28.265582848911574]
We propose a timestep-aware correction method for quantized diffusion models, which dynamically corrects the quantization error.
By leveraging the proposed method in low-precision diffusion models, substantial enhancement of output quality can be achieved with only negligible overhead.
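A toy illustration of timestep-aware correction: the quantized model's output is adjusted by a per-timestep correction term, here a precomputed lookup table with invented values. The quantizer and correction values are hypothetical stand-ins, not the paper's method.

```python
def quantize(x, bits=4):
    # Uniform quantization of a scalar in [-1, 1] to 2**bits - 1 levels.
    levels = 2 ** bits - 1
    return round((x + 1) / 2 * levels) / levels * 2 - 1

# Toy per-timestep correction terms; in practice these would be calibrated
# offline against a full-precision reference model.
correction_table = {t: 0.001 * (t % 10) for t in range(100)}

def corrected_output(x, t):
    # Dynamically apply the timestep-specific correction after quantization.
    return quantize(x) + correction_table[t]
```

The correction is a table lookup and an addition per step, which is why the overhead stays negligible relative to a full model inference.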
arXiv Detail & Related papers (2024-07-04T13:22:31Z)
- TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models [52.454274602380124]
Diffusion models heavily depend on the time-step $t$ to achieve satisfactory multi-round denoising.
We propose a Temporal Feature Maintenance Quantization (TFMQ) framework building upon a Temporal Information Block.
Powered by the pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features.
arXiv Detail & Related papers (2023-11-27T12:59:52Z)
- AdaDiff: Accelerating Diffusion Models through Step-Wise Adaptive Computation [32.74923906921339]
Diffusion models achieve great success in generating diverse and high-fidelity images, yet their widespread application is hampered by their inherently slow generation speed.
We propose AdaDiff, an adaptive framework that dynamically allocates computation resources in each sampling step to improve the generation efficiency of diffusion models.
arXiv Detail & Related papers (2023-09-29T09:10:04Z)
- Towards Accurate Post-training Quantization for Diffusion Models [73.19871905102545]
We propose an accurate data-free post-training quantization framework of diffusion models (ADP-DM) for efficient image generation.
Our method outperforms the state-of-the-art post-training quantization of diffusion model by a sizable margin with similar computational cost.
arXiv Detail & Related papers (2023-05-30T04:00:35Z)
- Conditional Denoising Diffusion for Sequential Recommendation [62.127862728308045]
Two prominent generative models, Generative Adversarial Networks (GANs) and Variational AutoEncoders (VAEs), each have notable drawbacks: GANs suffer from unstable optimization, while VAEs are prone to posterior collapse and over-smoothed generations.
We present a conditional denoising diffusion model, which includes a sequence encoder, a cross-attentive denoising decoder, and a step-wise diffuser.
arXiv Detail & Related papers (2023-04-22T15:32:59Z)
- Latent Autoregressive Source Separation [5.871054749661012]
This paper introduces vector-quantized Latent Autoregressive Source Separation (i.e., de-mixing an input signal into its constituent sources) without requiring additional gradient-based optimization or modifications of existing models.
Our separation method relies on the Bayesian formulation in which the autoregressive models are the priors, and a discrete (non-parametric) likelihood function is constructed by performing frequency counts over latent sums of addend tokens.
arXiv Detail & Related papers (2023-01-09T17:32:00Z)
- Accelerating Score-based Generative Models with Preconditioned Diffusion Sampling [36.02321871608158]
We propose a model-agnostic preconditioned diffusion sampling (PDS) method that leverages matrix preconditioning to alleviate the slow sampling of score-based generative models (SGMs).
PDS consistently accelerates off-the-shelf SGMs whilst maintaining the synthesis quality.
In particular, PDS can accelerate by up to 29x on more challenging high resolution (1024x1024) image generation.
arXiv Detail & Related papers (2022-07-05T17:55:42Z)
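The preconditioning idea in PDS can be illustrated with a Langevin-style sampler where a fixed per-coordinate preconditioner rescales both the score update and the injected noise. The score function (a standard Gaussian target) and the preconditioner values below are illustrative only; PDS itself derives its preconditioner from the structure of real image data.

```python
import math
import random

def score(x):
    # Score of a standard Gaussian target: grad log p(x) = -x.
    return [-xi for xi in x]

def pds_step(x, precond, step=0.01, rng=random):
    # Preconditioned Langevin update: the preconditioner scales the score
    # direction, and the noise is scaled consistently to keep the same
    # stationary distribution.
    s = score(x)
    return [
        xi + step * pi * si + math.sqrt(2 * step * pi) * rng.gauss(0, 1)
        for xi, pi, si in zip(x, precond, s)
    ]

x = [5.0, -5.0]
precond = [2.0, 0.5]  # per-coordinate preconditioning weights (diagonal case)
for _ in range(1000):
    x = pds_step(x, precond)
```

Coordinates with larger preconditioner weights take effectively larger steps, which is how preconditioning can accelerate convergence along well-understood directions without changing the target distribution.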
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.