Pruning vs Quantization: Which is Better?
- URL: http://arxiv.org/abs/2307.02973v2
- Date: Fri, 16 Feb 2024 09:52:58 GMT
- Title: Pruning vs Quantization: Which is Better?
- Authors: Andrey Kuzmin, Markus Nagel, Mart van Baalen, Arash Behboodi, Tijmen Blankevoort
- Abstract summary: We provide an extensive comparison between the two techniques for compressing deep neural networks.
Our results show that in most cases quantization outperforms pruning.
Only in scenarios with very high compression ratios might pruning be beneficial from an accuracy standpoint.
- Score: 25.539649458493614
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural network pruning and quantization techniques are almost as old as
neural networks themselves. However, to date only ad-hoc comparisons between
the two have been published. In this paper, we set out to answer the question
on which is better: neural network quantization or pruning? By answering this
question, we hope to inform design decisions made on neural network hardware
going forward. We provide an extensive comparison between the two techniques
for compressing deep neural networks. First, we give an analytical comparison
of expected quantization and pruning error for general data distributions.
Then, we provide lower bounds for the per-layer pruning and quantization error
in trained networks, and compare these to empirical error after optimization.
Finally, we provide an extensive experimental comparison across 8 large-scale
models trained on 3 tasks. Our results show that in most cases quantization
outperforms pruning. Only in scenarios with very high compression ratios might
pruning be beneficial from an accuracy standpoint.
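To make the kind of comparison described above concrete, the toy sketch below (an illustration only, not the paper's protocol) contrasts the relative weight-reconstruction error of symmetric uniform quantization and magnitude pruning at matched compression ratios, assuming a 16-bit dense baseline and ignoring the storage overhead of sparse indices.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(1_000_000)  # stand-in for one layer's weights

def quantization_error(w, bits):
    """Relative MSE of symmetric uniform quantization to `bits` bits."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / levels
    w_q = np.clip(np.round(w / scale), -levels, levels) * scale
    return np.mean((w - w_q) ** 2) / np.mean(w ** 2)

def pruning_error(w, keep_fraction):
    """Relative MSE of magnitude pruning keeping the top `keep_fraction` of weights."""
    k = max(1, int(len(w) * keep_fraction))
    threshold = np.partition(np.abs(w), -k)[-k]
    w_p = np.where(np.abs(w) >= threshold, w, 0.0)
    return np.mean((w - w_p) ** 2) / np.mean(w ** 2)

# Match compression against a 16-bit dense baseline: b-bit quantization gives 16/b x,
# pruning that keeps a fraction p gives roughly 1/p x (sparse-index overhead ignored).
for bits in (8, 4, 2):
    ratio = 16 / bits
    print(f"{ratio:4.0f}x  quantization ({bits}-bit): {quantization_error(w, bits):.2e}   "
          f"pruning (keep {1 / ratio:.1%}): {pruning_error(w, 1 / ratio):.2e}")
```

The outcome of such a comparison depends on the weight distribution and on how the quantization grid is scaled and clipped; this is exactly the trade-off the paper's analytical and empirical comparisons formalize.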
Related papers
- Verified Neural Compressed Sensing [58.98637799432153]
We develop the first (to the best of our knowledge) provably correct neural networks for a precise computational task.
We show that for modest problem dimensions (up to 50), we can train neural networks that provably recover a sparse vector from linear and binarized linear measurements.
We show that the complexity of the network can be adapted to the problem difficulty and solve problems where traditional compressed sensing methods are not known to provably work.
arXiv Detail & Related papers (2024-05-07T12:20:12Z)
- Quantifying lottery tickets under label noise: accuracy, calibration, and complexity [6.232071870655069]
Pruning deep neural networks is a widely used strategy to alleviate the computational burden in machine learning.
We use the sparse double descent approach to unambiguously identify and characterise pruned models associated with classification tasks.
arXiv Detail & Related papers (2023-06-21T11:35:59Z)
- Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning [58.34310957892895]
The state of neural network pruning has long been recognized as unclear and even confusing.
We first clarify the fairness principle in pruning experiments and summarize the widely-used comparison setups.
We then point out the central role of network trainability, which has not been well recognized so far.
arXiv Detail & Related papers (2023-01-12T18:58:33Z)
- The smooth output assumption, and why deep networks are better than wide ones [0.0]
We propose a new measure that predicts how well a model will generalize.
It is based on the fact that, in reality, boundaries between concepts are generally unsharp.
arXiv Detail & Related papers (2022-11-25T19:05:44Z)
- On the Neural Tangent Kernel Analysis of Randomly Pruned Neural Networks [91.3755431537592]
We study how random pruning of the weights affects a neural network's neural tangent kernel (NTK).
In particular, this work establishes an equivalence of the NTKs between a fully-connected neural network and its randomly pruned version.
arXiv Detail & Related papers (2022-03-27T15:22:19Z)
- Post-training Quantization for Neural Networks with Provable Guarantees [9.58246628652846]
We modify a post-training neural-network quantization method, GPFQ, that is based on a greedy path-following mechanism.
We prove that for quantizing a single-layer network, the relative square error essentially decays linearly in the number of weights (a toy sketch of this style of greedy, data-driven quantization appears after this list).
arXiv Detail & Related papers (2022-01-26T18:47:38Z)
- Why Lottery Ticket Wins? A Theoretical Perspective of Sample Complexity on Pruned Neural Networks [79.74580058178594]
We analyze the performance of training a pruned neural network by studying the geometric structure of the objective function.
We show that the convex region near a desirable model with guaranteed generalization enlarges as the neural network model is pruned.
arXiv Detail & Related papers (2021-10-12T01:11:07Z)
- Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z)
- Ps and Qs: Quantization-aware pruning for efficient low latency neural network inference [56.24109486973292]
We study the interplay between pruning and quantization during the training of neural networks for ultra low latency applications.
We find that quantization-aware pruning yields more computationally efficient models than either pruning or quantization alone for our task.
arXiv Detail & Related papers (2021-02-22T19:00:05Z)
- A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, in contrast to the usual full regularization that penalizes all parameters, to perform model compression at higher speed.
Experimental results show that, as expected, computational complexity is reduced, with shorter running times observed in almost all situations.
Surprisingly, the approach also improves important metrics such as regression fit and classification accuracy in both the training and test phases on multiple datasets.
arXiv Detail & Related papers (2020-09-03T00:38:27Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantized neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features of the original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
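The Post-training Quantization entry above mentions GPFQ's greedy path-following mechanism. The numpy sketch below illustrates the general idea of data-driven, sequential weight quantization with error feedback; the update rule is a simplified assumption for illustration, not the published GPFQ algorithm, and the ternary alphabet is an arbitrary choice.

```python
import numpy as np

def greedy_quantize_neuron(X, w, alphabet):
    """Quantize one neuron's weights sequentially, using calibration data X (m x n).
    At each step the running output residual u is folded into the choice of the
    next quantized weight, so earlier rounding errors get compensated."""
    u = np.zeros(X.shape[0])          # residual between full-precision and quantized outputs
    q = np.zeros_like(w)
    for t in range(len(w)):
        x_t = X[:, t]
        # project the residual plus the true contribution of w[t] onto x_t
        target = x_t @ (u + w[t] * x_t) / (x_t @ x_t + 1e-12)
        q[t] = alphabet[np.argmin(np.abs(alphabet - target))]
        u += (w[t] - q[t]) * x_t      # accumulate the new quantization error
    return q

# Usage: quantize one random neuron to a ternary grid scaled by the weight range.
rng = np.random.default_rng(0)
m, n = 512, 256
X = rng.standard_normal((m, n))       # calibration inputs
w = rng.standard_normal(n) / np.sqrt(n)
alphabet = np.max(np.abs(w)) * np.array([-1.0, 0.0, 1.0])
q = greedy_quantize_neuron(X, w, alphabet)
print("relative output error:", np.linalg.norm(X @ (w - q)) / np.linalg.norm(X @ w))
```

Because each step folds the residual from earlier coordinates into the choice for the current one, the quantized neuron tends to track the full-precision output on the calibration data better than rounding each weight independently.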