Related papers: BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models

URL: http://arxiv.org/abs/2404.05662v5
Date: Fri, 31 Jan 2025 07:11:43 GMT
Title: BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models
Authors: Xingyu Zheng, Xianglong Liu, Haotong Qin, Xudong Ma, Mingyuan Zhang, Haojie Hao, Jiakai Wang, Zixiang Zhao, Jinyang Guo, Michele Magno,
Abstract summary: This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient.<n>From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized.<n>Experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths.
Score: 39.287947829085155
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the advancement of diffusion models (DMs) and the substantially increased computational requirements, quantization emerges as a practical solution to obtain compact and efficient low-bit DMs. However, the highly discrete representation leads to severe accuracy degradation, hindering the quantization of diffusion models to ultra-low bit-widths. This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient by improving the representation and optimization. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized. EBB enhances information representation in the initial stage through the flexible combination of multiple binary bases and applies regularization to evolve into efficient single-basis binarization. The evolution only occurs in the head and tail of the DM architecture to retain the stability of training. From the optimization perspective, a Low-rank Representation Mimicking (LRM) is applied to assist the optimization of binarized DMs. The LRM mimics the representations of full-precision DMs in low-rank space, alleviating the direction ambiguity of the optimization process caused by fine-grained alignment. Comprehensive experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths. With 1-bit weight and 4-bit activation (W1A4), BinaryDM achieves as low as 7.74 FID and saves the performance from collapse (baseline FID 10.87). As the first binarization method for diffusion models, W1A4 BinaryDM achieves impressive 15.2x OPs and 29.2x model size savings, showcasing its substantial potential for edge deployment. The code is available at https://github.com/Xingyu-Zheng/BinaryDM.

Related papers

BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution [63.777210548110425]
We propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup ratio compared to FP counterpart.
arXiv Detail & Related papers (2025-02-01T06:34:55Z)
BiDM: Pushing the Limit of Quantization for Diffusion Models [60.018246440536814]
This paper proposes a novel method, namely BiDM, for fully binarizing weights and activations of DMs, pushing quantization to the 1-bit limit. As the first work to fully binarize DMs, the W1A1 BiDM on the LDM-4 model for LSUN-Bedrooms 256$times$256 achieves a remarkable FID of 22.74.
arXiv Detail & Related papers (2024-12-08T12:45:21Z)
BiDense: Binarization for Dense Prediction [62.70804353158387]
BiDense is a generalized binary neural network (BNN) designed for efficient and accurate dense prediction tasks. BiDense incorporates two key techniques: the Distribution-adaptive Binarizer (DAB) and the Channel-adaptive Full-precision Bypass (CFB)
arXiv Detail & Related papers (2024-11-15T16:46:04Z)
Pruning then Reweighting: Towards Data-Efficient Training of Diffusion Models [33.09663675904689]
We investigate efficient diffusion training from the perspective of dataset pruning. Inspired by the principles of data-efficient training for generative models such as generative adversarial networks (GANs), we first extend the data selection scheme used in GANs to DM training. To further improve the generation performance, we employ a class-wise reweighting approach.
arXiv Detail & Related papers (2024-09-27T20:21:19Z)
DKDM: Data-Free Knowledge Distillation for Diffusion Models with Any Architecture [69.58440626023541]
Diffusion models (DMs) have demonstrated exceptional generative capabilities across various areas. The most common way to accelerate DMs involves reducing the number of denoising steps during generation. We propose a novel method that transfers the capability of large pretrained DMs to faster architectures.
arXiv Detail & Related papers (2024-09-05T14:12:22Z)
Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs) Existing binarization methods result in significant performance degradation. We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z)
Invertible Diffusion Models for Compressed Sensing [22.293412255419614]
Invertible Diffusion Models (IDM) is a novel efficient, end-to-end diffusion-based CS method. IDM finetunes it end-to-end to recover original images directly from CS measurements. Our IDM achieves up to 10.09dB PSNR gain and 14.54 times faster inference.
arXiv Detail & Related papers (2024-03-25T17:59:41Z)
DB-LLM: Accurate Dual-Binarization for Efficient LLMs [83.70686728471547]
Large language models (LLMs) have significantly advanced the field of natural language processing. Existing ultra-low-bit quantization always causes severe accuracy drops. We propose a novel Dual-Binarization method for LLMs, namely DB-LLM.
arXiv Detail & Related papers (2024-02-19T09:04:30Z)
Distribution-sensitive Information Retention for Accurate Binary Neural Network [49.971345958676196]
We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients. Our DIR-Net consistently outperforms the SOTA binarization approaches under mainstream and compact architectures. We conduct our DIR-Net on real-world resource-limited devices which achieves 11.1 times storage saving and 5.4 times speedup.
arXiv Detail & Related papers (2021-09-25T10:59:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.