Unified Data-Free Compression: Pruning and Quantization without
Fine-Tuning
- URL: http://arxiv.org/abs/2308.07209v1
- Date: Mon, 14 Aug 2023 15:25:07 GMT
- Title: Unified Data-Free Compression: Pruning and Quantization without
Fine-Tuning
- Authors: Shipeng Bai, Jun Chen, Xintian Shen, Yixuan Qian, Yong Liu
- Abstract summary: We propose a novel framework named Unified Data-Free Compression (UDFC), which performs pruning and quantization simultaneously without any data or fine-tuning.
We evaluate the UDFC on the large-scale image classification task and obtain significant improvements over various network architectures and compression methods.
- Score: 12.982673904306633
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structured pruning and quantization are promising approaches for reducing the
inference time and memory footprint of neural networks. However, most existing
methods require the original training dataset to fine-tune the model. This not
only brings heavy resource consumption but also is not possible for
applications with sensitive or proprietary data due to privacy and security
concerns. Therefore, a few data-free methods are proposed to address this
problem, but they perform data-free pruning and quantization separately, which
does not explore the complementarity of pruning and quantization. In this
paper, we propose a novel framework named Unified Data-Free Compression (UDFC),
which performs pruning and quantization simultaneously without any data or
fine-tuning. Specifically, UDFC starts with the assumption that the
partial information of a damaged (e.g., pruned or quantized) channel can be
preserved by a linear combination of other channels, and then derives the
reconstruction form from the assumption to restore the information loss due to
compression. Finally, we formulate the reconstruction error between the
original network and its compressed network, and theoretically deduce the
closed-form solution. We evaluate the UDFC on the large-scale image
classification task and obtain significant improvements over various network
architectures and compression methods. For example, we achieve a 20.54%
accuracy improvement on the ImageNet dataset over the state-of-the-art method
with a 30% pruning ratio and 6-bit quantization on ResNet-34.
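The abstract's core idea (restoring a damaged channel's information via a linear combination of the remaining channels, solved in closed form) can be sketched in a few lines. This is a minimal least-squares illustration rather than the authors' exact formulation; the layer shapes, the channel index, and the use of `np.linalg.lstsq` are assumptions made for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))   # hypothetical layer: 8 output channels x 16 inputs
k = 3                              # index of the channel to be damaged (pruned)

others = np.delete(W, k, axis=0)   # the 7 surviving channels
target = W[k]                      # the damaged channel's weights

# Closed-form least squares: coeffs = argmin_a || others^T a - target ||^2
coeffs, *_ = np.linalg.lstsq(others.T, target, rcond=None)

# Reconstruct the damaged channel from the survivors and measure the
# reconstruction error between the original and compressed representations.
reconstruction = others.T @ coeffs
error = float(np.linalg.norm(target - reconstruction))
print(error)
```

Since the zero coefficient vector is always a feasible choice, the closed-form solution is guaranteed to leave a reconstruction error no larger than the norm of the damaged channel itself; UDFC's contribution is deriving such a closed-form reconstruction jointly for pruning and quantization.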
Related papers
- Unleashing Degradation-Carrying Features in Symmetric U-Net: Simpler and Stronger Baselines for All-in-One Image Restoration [52.82397287366076]
All-in-one image restoration aims to handle diverse degradations (e.g., noise, blur, adverse weather) within a unified framework. In this work, we reveal a critical insight: well-crafted feature extraction inherently encodes degradation-carrying information. Our symmetric design preserves intrinsic degradation signals robustly, rendering simple additive fusion in skip connections.
arXiv Detail & Related papers (2025-12-11T12:20:31Z) - Post-Pruning Accuracy Recovery via Data-Free Knowledge Distillation [0.0]
In privacy-sensitive domains such as healthcare or finance, access to the original training data is often restricted post-deployment due to regulations. This paper proposes a Data-Free Knowledge Distillation framework to bridge the gap between model compression and data privacy.
arXiv Detail & Related papers (2025-11-24T18:27:40Z) - Reducing Storage of Pretrained Neural Networks by Rate-Constrained Quantization and Entropy Coding [56.066799081747845]
The ever-growing size of neural networks poses serious challenges for resource-constrained devices. We propose a novel post-training compression framework that combines rate-aware quantization with entropy coding. Our method allows for very fast decoding and is compatible with arbitrary quantization grids.
arXiv Detail & Related papers (2025-05-24T15:52:49Z) - AutoDFP: Automatic Data-Free Pruning via Channel Similarity
Reconstruction [18.589013910402237]
We propose the Automatic Data-Free Pruning (AutoDFP) method that achieves automatic pruning and reconstruction without fine-tuning.
We evaluate AutoDFP with multiple networks on multiple datasets, achieving impressive compression results.
arXiv Detail & Related papers (2024-03-13T02:56:31Z) - Compression of Structured Data with Autoencoders: Provable Benefit of
Nonlinearities and Depth [83.15263499262824]
We prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
We show how to improve upon Gaussian performance for the compression of sparse data by adding a denoising function to a shallow architecture.
We validate our findings on image datasets, such as CIFAR-10 and MNIST.
arXiv Detail & Related papers (2024-02-07T16:32:29Z) - Dataset Quantization [72.61936019738076]
We present dataset quantization (DQ), a new framework to compress large-scale datasets into small subsets.
DQ is the first method that can successfully distill large-scale datasets such as ImageNet-1k with a state-of-the-art compression ratio.
arXiv Detail & Related papers (2023-08-21T07:24:29Z) - OPQ: Compressing Deep Neural Networks with One-shot Pruning-Quantization [32.60139548889592]
We propose a novel One-shot Pruning-Quantization (OPQ) method in this paper.
OPQ analytically solves the compression allocation with pre-trained weight parameters only.
We propose a unified channel-wise quantization method that enforces all channels of each layer to share a common codebook.
arXiv Detail & Related papers (2022-05-23T09:05:25Z) - SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian
Approximation [22.782678826199206]
Quantization of deep neural networks (DNNs) has been proven effective for compressing and accelerating models.
Data-free quantization (DFQ) is a promising approach without the original datasets under privacy-sensitive and confidential scenarios.
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices.
arXiv Detail & Related papers (2022-02-14T01:57:33Z) - Reducing Redundancy in the Bottleneck Representation of the Autoencoders [98.78384185493624]
Autoencoders are a type of unsupervised neural network that can be used to solve various tasks.
We propose a scheme to explicitly penalize feature redundancies in the bottleneck representation.
We tested our approach across different tasks: dimensionality reduction using three different datasets, image compression using the MNIST dataset, and image denoising using Fashion-MNIST.
arXiv Detail & Related papers (2022-02-09T18:48:02Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - Single-path Bit Sharing for Automatic Loss-aware Model Compression [126.98903867768732]
Single-path Bit Sharing (SBS) is able to significantly reduce computational cost while achieving promising performance.
Our SBS compressed MobileNetV2 achieves 22.6x Bit-Operation (BOP) reduction with only 0.1% drop in the Top-1 accuracy.
arXiv Detail & Related papers (2021-01-13T08:28:21Z) - Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z) - UCP: Uniform Channel Pruning for Deep Convolutional Neural Networks
Compression and Acceleration [24.42067007684169]
We propose a novel uniform channel pruning (UCP) method to prune deep CNNs.
Unimportant channels, together with the convolutional kernels related to them, are pruned directly.
We verify our method on CIFAR-10, CIFAR-100 and ILSVRC-2012 for image classification.
arXiv Detail & Related papers (2020-10-03T01:51:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences.