BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
- URL: http://arxiv.org/abs/2502.00333v2
- Date: Tue, 04 Feb 2025 03:58:30 GMT
- Title: BiMaCoSR: Binary One-Step Diffusion Model Leveraging Flexible Matrix Compression for Real Super-Resolution
- Authors: Kai Liu, Kaicheng Yang, Zheng Chen, Zhiteng Li, Yong Guo, Wenbo Li, Linghe Kong, Yulun Zhang
- Abstract summary: We propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup compared to its FP counterpart.
- Score: 63.777210548110425
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While super-resolution (SR) methods based on diffusion models (DM) have demonstrated inspiring performance, their deployment is impeded by heavy memory and computation demands. Recent work applies two kinds of methods to compress or accelerate DMs. One compresses the DM to 1-bit, a.k.a. binarization, alleviating the storage and computation pressure. The other distills the multi-step DM into a single step, significantly speeding up inference. Nonetheless, it remains impractical to deploy these DMs on resource-limited edge devices. To address this problem, we propose BiMaCoSR, which combines binarization and one-step distillation to obtain extreme compression and acceleration. To prevent the catastrophic collapse of the model caused by binarization, we propose a sparse matrix branch (SMB) and a low-rank matrix branch (LRMB). Both auxiliary branches pass full-precision (FP) information, but in different ways. SMB absorbs the extreme values and its output is high rank, carrying abundant FP information, whereas LRMB, inspired by LoRA, is initialized with the top-r SVD components and outputs a low-rank representation. The computation and storage overhead of the proposed branches is negligible. Comprehensive comparison experiments show that BiMaCoSR outperforms current state-of-the-art binarization methods and achieves competitive performance compared with the FP one-step model. BiMaCoSR achieves a 23.8x compression ratio and a 27.4x speedup compared to its FP counterpart. Our code and model are available at https://github.com/Kai-Liu001/BiMaCoSR.
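For a concrete picture of the two auxiliary branches described in the abstract, the PyTorch sketch below shows one plausible way to combine a binarized linear layer with a sparse FP correction (SMB) and a LoRA-style low-rank correction initialized from the top-r SVD components (LRMB). It is a minimal illustration under stated assumptions: the module name, the sparsity rule (keeping the largest-magnitude FP weights), the per-channel scaling, and the rank are hypothetical and are not taken from the authors' released implementation.

```python
import torch
import torch.nn as nn


class BinaryLinearWithSMBandLRMB(nn.Module):
    """Hypothetical sketch of a binarized linear layer with a sparse matrix
    branch (SMB) and a low-rank matrix branch (LRMB), loosely following the
    BiMaCoSR abstract. Not the authors' implementation."""

    def __init__(self, weight_fp: torch.Tensor, rank: int = 8, sparse_ratio: float = 0.01):
        super().__init__()

        # Binary main path: sign of the FP weights plus a per-channel scale.
        self.register_buffer("w_bin", torch.sign(weight_fp))
        self.scale = nn.Parameter(weight_fp.abs().mean(dim=1, keepdim=True))

        # SMB (assumption): keep only the largest-magnitude FP weights as a
        # sparse correction; its output can still be high rank.
        k = max(1, int(sparse_ratio * weight_fp.numel()))
        thresh = weight_fp.abs().flatten().topk(k).values.min()
        self.register_buffer("w_sparse", torch.where(
            weight_fp.abs() >= thresh, weight_fp, torch.zeros_like(weight_fp)))

        # LRMB (assumption): LoRA-style factors initialized from the top-r SVD
        # components of the FP weight, giving a low-rank FP pathway.
        u, s, vh = torch.linalg.svd(weight_fp, full_matrices=False)
        self.lr_a = nn.Parameter(vh[:rank, :] * s[:rank].sqrt().unsqueeze(1))  # (r, in)
        self.lr_b = nn.Parameter(u[:, :rank] * s[:rank].sqrt().unsqueeze(0))   # (out, r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y_bin = nn.functional.linear(x, self.w_bin * self.scale)  # binary main path
        y_smb = nn.functional.linear(x, self.w_sparse)            # sparse FP correction
        y_lrmb = nn.functional.linear(x, self.lr_b @ self.lr_a)   # low-rank FP correction
        return y_bin + y_smb + y_lrmb


# Hypothetical usage: wrap a 64x64 FP weight and run a small batch.
layer = BinaryLinearWithSMBandLRMB(torch.randn(64, 64), rank=4)
y = layer(torch.randn(2, 64))
```

Both corrections are plain linear maps, so they can be folded into the same matmul at inference; this is consistent with the abstract's claim that the extra compute and storage are negligible, though the exact overhead depends on the sparsity and rank chosen.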
Related papers
- Pangu Embedded: An Efficient Dual-system LLM Reasoner with Metacognition [95.54406667705999]
Pangu Embedded is an efficient Large Language Model (LLM) reasoner developed on Ascend Neural Processing Units (NPUs). It addresses the significant computational costs and inference latency challenges prevalent in existing reasoning-optimized LLMs. It delivers rapid responses and state-of-the-art reasoning quality within a single, unified model architecture.
arXiv Detail & Related papers (2025-05-28T14:03:02Z) - BTC-LLM: Efficient Sub-1-Bit LLM Quantization via Learnable Transformation and Binary Codebook [20.89001326838199]
We present BTC-LLM, a novel sub-1-bit large language model (LLM) quantization framework. Our approach incorporates two key innovations: (1) a Learnable Transformation that optimizes invertible scaling and rotation to align binarized weights with full-precision distributions, and (2) a Flash and Accurate Binary Codebook that identifies recurring binary vector clusters.
arXiv Detail & Related papers (2025-05-24T03:57:19Z) - Addition is almost all you need: Compressing neural networks with double binary factorization [0.0]
Double Binary Factorization (DBF) is a novel method that factorizes dense weight matrices into products of two binary (sign) matrices, each accompanied by scaling vectors. DBF preserves the efficiency advantages of binary representations while achieving compression rates that are competitive with or superior to state-of-the-art methods. In the 2-bit-per-weight range, DBF is competitive with the best quantization methods like QuIP# and QTIP.
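As a rough illustration of the DBF parameterization described above, the NumPy sketch below shows one plausible way to place the two sign matrices and their scaling vectors, and why the storage lands near the 2-bit-per-weight range mentioned in the summary. The factor placement, the inner width k, and the FP16 scales are assumptions for illustration, not the paper's fitting procedure.

```python
import numpy as np

def dbf_reconstruct(s1, B1, s2, B2):
    """W_hat = diag(s1) @ B1 @ diag(s2) @ B2 with B1, B2 entries in {-1, +1}.
    The placement of the two scaling vectors is an assumption."""
    return (s1[:, None] * B1) @ (s2[:, None] * B2)

# Tiny functional check of the factor shapes.
rng = np.random.default_rng(0)
out_f, in_f, k = 8, 8, 8                        # k: inner width (assumed square here)
B1 = np.sign(rng.standard_normal((out_f, k)))   # (out_f, k) sign matrix
B2 = np.sign(rng.standard_normal((k, in_f)))    # (k, in_f) sign matrix
s1, s2 = np.ones(out_f), np.ones(k)
assert dbf_reconstruct(s1, B1, s2, B2).shape == (out_f, in_f)

# Storage estimate for a large layer: 1 bit per binary entry, FP16 scales.
out_f = in_f = k = 4096
bits = out_f * k + k * in_f + 16 * (out_f + k)
print(f"bits per original weight: {bits / (out_f * in_f):.2f}")  # ~2, i.e. the 2-bit range
```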
arXiv Detail & Related papers (2025-05-16T10:07:36Z) - Progressive Binarization with Semi-Structured Pruning for LLMs [36.32239429974179]
Large language models (LLMs) have achieved remarkable success in natural language processing tasks.
Their high computational and memory demands pose challenges for deployment on resource-constrained devices.
We propose a Progressive Binarization with Semi-Structured Pruning (PBS$^2$P) method for LLM compression.
arXiv Detail & Related papers (2025-02-03T13:30:29Z) - BiDM: Pushing the Limit of Quantization for Diffusion Models [60.018246440536814]
This paper proposes a novel method, namely BiDM, for fully binarizing weights and activations of DMs, pushing quantization to the 1-bit limit.
As the first work to fully binarize DMs, the W1A1 BiDM on the LDM-4 model for LSUN-Bedrooms 256$\times$256 achieves a remarkable FID of 22.74.
arXiv Detail & Related papers (2024-12-08T12:45:21Z) - ARB-LLM: Alternating Refined Binarizations for Large Language Models [82.24826360906341]
ARB-LLM is a novel 1-bit post-training quantization (PTQ) technique tailored for Large Language Models (LLMs).
As a binary PTQ method, our ARB-LLM$_\text{RC}$ is the first to surpass FP16 models of the same size.
arXiv Detail & Related papers (2024-10-04T03:50:10Z) - Binarized Diffusion Model for Image Super-Resolution [61.963833405167875]
Binarization, an ultra-compression algorithm, offers the potential for effectively accelerating advanced diffusion models (DMs).
Existing binarization methods result in significant performance degradation.
We introduce a novel binarized diffusion model, BI-DiffSR, for image SR.
arXiv Detail & Related papers (2024-06-09T10:30:25Z) - BinaryDM: Accurate Weight Binarization for Efficient Diffusion Models [39.287947829085155]
This paper proposes a novel weight binarization approach for DMs, namely BinaryDM, pushing binarized DMs to be accurate and efficient. From the representation perspective, we present an Evolvable-Basis Binarizer (EBB) to enable a smooth evolution of DMs from full-precision to accurately binarized. Experiments demonstrate that BinaryDM achieves significant accuracy and efficiency gains compared to SOTA quantization methods of DMs under ultra-low bit-widths.
arXiv Detail & Related papers (2024-04-08T16:46:25Z) - Binarized Spectral Compressive Imaging [59.18636040850608]
Existing deep learning models for hyperspectral image (HSI) reconstruction achieve good performance but require powerful hardware with enormous memory and computational resources.
We propose a novel method, Binarized Spectral-Redistribution Network (BiSRNet).
BiSRNet is derived by using the proposed techniques to binarize the base model.
arXiv Detail & Related papers (2023-05-17T15:36:08Z) - A Generic Network Compression Framework for Sequential Recommender Systems [71.81962915192022]
Sequential recommender systems (SRS) have become the key technology in capturing user's dynamic interests and generating high-quality recommendations.
We propose a compressed sequential recommendation framework, termed as CpRec, where two generic model shrinking techniques are employed.
Through extensive ablation studies, we demonstrate that the proposed CpRec can achieve up to 4$\sim$8 times compression rates on real-world SRS datasets.
arXiv Detail & Related papers (2020-04-21T08:40:55Z)