PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models
- URL: http://arxiv.org/abs/2409.13894v1
- Date: Fri, 20 Sep 2024 20:52:56 GMT
- Title: PTQ4ADM: Post-Training Quantization for Efficient Text Conditional Audio Diffusion Models
- Authors: Jayneel Vora, Aditya Krishnan, Nader Bouacida, Prabhu RV Shankar, Prasant Mohapatra
- Abstract summary: This work introduces PTQ4ADM, a novel framework for quantizing audio diffusion models (ADMs).
Our key contributions include (1) a coverage-driven prompt augmentation method and (2) an activation-aware calibration set generation algorithm for text-conditional ADMs.
Extensive experiments demonstrate PTQ4ADM's capability to reduce the model size by up to 70% while achieving synthesis quality metrics comparable to full-precision models.
- Score: 8.99127212785609
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Denoising diffusion models have emerged as state-of-the-art in generative tasks across image, audio, and video domains, producing high-quality, diverse, and contextually relevant data. However, their broader adoption is limited by high computational costs and large memory footprints. Post-training quantization (PTQ) offers a promising approach to mitigate these challenges by reducing model complexity through low bit-width parameters. Yet, direct application of PTQ to diffusion models can degrade synthesis quality due to accumulated quantization noise across multiple denoising steps, particularly in conditional tasks like text-to-audio synthesis. This work introduces PTQ4ADM, a novel framework for quantizing audio diffusion models (ADMs). Our key contributions include (1) a coverage-driven prompt augmentation method and (2) an activation-aware calibration set generation algorithm for text-conditional ADMs. These techniques ensure comprehensive coverage of audio aspects and modalities while preserving synthesis fidelity. We validate our approach on TANGO, Make-An-Audio, and AudioLDM models for text-conditional audio generation. Extensive experiments demonstrate PTQ4ADM's capability to reduce model size by up to 70% while achieving synthesis quality metrics comparable to full-precision models (<5% increase in FD scores). We show that specific layers in the backbone network can be quantized to 4-bit weights and 8-bit activations without significant quality loss. This work paves the way for more efficient deployment of ADMs in resource-constrained environments.
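The abstract's W4A8 claim (4-bit weights and 8-bit activations for selected backbone layers, with activation ranges chosen from a calibration set) can be illustrated with a small simulated-quantization sketch. This is not the authors' PTQ4ADM code; the abs-max scale choice, layer wrapper, and calibration hook below are illustrative assumptions only.

```python
# Minimal PyTorch sketch of simulated W4A8 post-training quantization with
# activation-aware calibration. Hypothetical helper, not the PTQ4ADM implementation.
import torch


def quantize(x: torch.Tensor, num_bits: int, scale: torch.Tensor) -> torch.Tensor:
    """Symmetric uniform quantization: snap to the integer grid, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    return torch.clamp(torch.round(x / scale), -qmax - 1, qmax) * scale


class QuantLinear(torch.nn.Module):
    """Wraps nn.Linear with 4-bit weights and 8-bit activations (simulated)."""

    def __init__(self, linear: torch.nn.Linear):
        super().__init__()
        w = linear.weight.data
        w_scale = w.abs().max() / (2 ** 3 - 1)   # abs-max scale for 4-bit weights
        self.weight = quantize(w, 4, w_scale)
        self.bias = linear.bias
        self.act_scale = None                     # filled in during calibration

    def observe(self, x: torch.Tensor) -> None:
        # Activation-aware calibration: track the abs-max of inputs seen on the
        # calibration set and derive an 8-bit activation scale from it.
        scale = x.abs().max() / (2 ** 7 - 1)
        self.act_scale = scale if self.act_scale is None else torch.maximum(self.act_scale, scale)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.act_scale is not None:
            x = quantize(x, 8, self.act_scale)
        return torch.nn.functional.linear(x, self.weight, self.bias)
```

In a full pipeline, one would run the calibration prompts through the model once, calling `observe` on each wrapped layer's input, and only then generate with quantization enabled; the paper's coverage-driven prompt augmentation concerns making that calibration set representative of the text-conditional audio distribution.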
Related papers
- PassionSR: Post-Training Quantization with Adaptive Scale in One-Step Diffusion based Image Super-Resolution [87.89013794655207]
Diffusion-based image super-resolution (SR) models have shown superior performance at the cost of multiple denoising steps.
We propose a novel post-training quantization approach with adaptive scale in one-step diffusion (OSD) image SR, PassionSR.
Our PassionSR achieves significant advantages over recent leading low-bit quantization methods for image SR.
arXiv Detail & Related papers (2024-11-26T04:49:42Z) - Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data [69.7174072745851]
We present Synthio, a novel approach for augmenting small-scale audio classification datasets with synthetic data.
To overcome the first challenge, we align the generations of the T2A model with the small-scale dataset using preference optimization.
To address the second challenge, we propose a novel caption generation technique that leverages the reasoning capabilities of Large Language Models.
arXiv Detail & Related papers (2024-10-02T22:05:36Z) - QNCD: Quantization Noise Correction for Diffusion Models [15.189069680672239]
Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity.
Post-training quantization presents a solution to accelerate sampling, albeit at the expense of sample quality.
We introduce a unified Quantization Noise Correction Scheme (QNCD) aimed at diminishing quantization noise throughout the sampling process.
arXiv Detail & Related papers (2024-03-28T04:24:56Z) - TFMQ-DM: Temporal Feature Maintenance Quantization for Diffusion Models [52.454274602380124]
Diffusion models heavily depend on the time-step $t$ to achieve satisfactory multi-round denoising.
We propose a Temporal Feature Maintenance Quantization (TFMQ) framework building upon a Temporal Information Block.
Powered by the pioneering block design, we devise temporal information aware reconstruction (TIAR) and finite set calibration (FSC) to align the full-precision temporal features.
arXiv Detail & Related papers (2023-11-27T12:59:52Z) - EfficientDM: Efficient Quantization-Aware Fine-Tuning of Low-Bit Diffusion Models [21.17675493267517]
Post-training quantization (PTQ) and quantization-aware training (QAT) are two main approaches to compress and accelerate diffusion models.
We introduce a data-free and parameter-efficient fine-tuning framework for low-bit diffusion models, dubbed EfficientDM, to achieve QAT-level performance with PTQ-like efficiency.
Our method significantly outperforms previous PTQ-based diffusion models while maintaining similar time and data efficiency.
arXiv Detail & Related papers (2023-10-05T02:51:53Z) - Enhancing Quantised End-to-End ASR Models via Personalisation [12.971231464928806]
We propose a novel strategy of personalisation for a quantised model (PQM).
PQM uses a 4-bit NormalFloat Quantisation (NF4) approach for model quantisation and low-rank adaptation (LoRA) for speaker adaptive training (SAT); see the configuration sketch after this list.
Experiments have been performed on the LibriSpeech and the TED-LIUM 3 corpora.
arXiv Detail & Related papers (2023-09-17T02:35:21Z) - From Discrete Tokens to High-Fidelity Audio Using Multi-Band Diffusion [84.138804145918]
Deep generative models can generate high-fidelity audio conditioned on various types of representations.
These models are prone to generate audible artifacts when the conditioning is flawed or imperfect.
We propose a high-fidelity multi-band diffusion-based framework that generates any type of audio modality from low-bitrate discrete representations.
arXiv Detail & Related papers (2023-08-02T22:14:29Z) - Q-Diffusion: Quantizing Diffusion Models [52.978047249670276]
Post-training quantization (PTQ) is considered a go-to compression method for other tasks.
We propose a novel PTQ method specifically tailored towards the unique multi-timestep pipeline and model architecture.
We show that our proposed method is able to quantize full-precision unconditional diffusion models into 4-bit while maintaining comparable performance.
arXiv Detail & Related papers (2023-02-08T19:38:59Z) - Adversarial Audio Synthesis with Complex-valued Polynomial Networks [60.231877895663956]
Time-frequency (TF) representations in audio have been increasingly modeled with real-valued networks.
We introduce complex-valued networks called APOLLO that integrate such complex-valued representations in a natural way.
APOLLO results in a 17.5% improvement over adversarial methods and 8.2% over the state-of-the-art diffusion models on SC09 in audio generation.
arXiv Detail & Related papers (2022-06-14T12:58:59Z)
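The PQM entry above combines 4-bit NormalFloat (NF4) weight quantisation with LoRA adapters. Below is a minimal configuration sketch using Hugging Face transformers, bitsandbytes, and peft; the checkpoint name, LoRA rank, and target modules are illustrative assumptions rather than the paper's reported setup.

```python
# Hypothetical NF4 + LoRA setup for an ASR backbone; names and hyperparameters
# are placeholders, not the PQM paper's configuration.
import torch
from transformers import AutoModelForSpeechSeq2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the backbone with weights stored in 4-bit NormalFloat (NF4).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    "openai/whisper-small",                 # placeholder ASR checkpoint
    quantization_config=bnb_config,
)

# Attach low-rank adapters for speaker adaptation; only these small matrices
# are trained, while the quantised base weights stay frozen.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],    # assumed attention projections
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```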