Towards Hardware-Specific Automatic Compression of Neural Networks
- URL: http://arxiv.org/abs/2212.07818v1
- Date: Thu, 15 Dec 2022 13:34:02 GMT
- Title: Towards Hardware-Specific Automatic Compression of Neural Networks
- Authors: Torben Krieger, Bernhard Klein, Holger Fr\"oning
- Abstract summary: pruning and quantization are the major approaches to compress neural networks nowadays.
Effective compression policies consider the influence of the specific hardware architecture on the used compression methods.
We propose an algorithmic framework called Galen to search such policies using reinforcement learning utilizing pruning and quantization.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Compressing neural network architectures is important to allow the deployment
of models to embedded or mobile devices, and pruning and quantization are the
major approaches to compress neural networks nowadays. Both methods benefit
when compression parameters are selected specifically for each layer. Finding
good combinations of compression parameters, so-called compression policies, is
hard as the problem spans an exponentially large search space. Effective
compression policies consider the influence of the specific hardware
architecture on the used compression methods. We propose an algorithmic
framework called Galen to search such policies using reinforcement learning
utilizing pruning and quantization, thus providing automatic compression for
neural networks. Contrary to other approaches we use inference latency measured
on the target hardware device as an optimization goal. With that, the framework
supports the compression of models specific to a given hardware target. We
validate our approach using three different reinforcement learning agents for
pruning, quantization and joint pruning and quantization. Besides proving the
functionality of our approach we were able to compress a ResNet18 for CIFAR-10,
on an embedded ARM processor, to 20% of the original inference latency without
significant loss of accuracy. Moreover, we can demonstrate that a joint search
and compression using pruning and quantization is superior to an individual
search for policies using a single compression method.
Related papers
- Order of Compression: A Systematic and Optimal Sequence to Combinationally Compress CNN [5.25545980258284]
We propose a systematic and optimal sequence to apply multiple compression techniques in the most effective order.
Our proposed Order of Compression significantly reduces computational costs by up to 859 times on ResNet34, with negligible accuracy loss.
We believe our simple yet effective exploration of the order of compression will shed light on the practice of model compression.
arXiv Detail & Related papers (2024-03-26T07:26:00Z) - Learning Accurate Performance Predictors for Ultrafast Automated Model
Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z) - Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z) - A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use sparsity-sensitive $ell_q$-norm to characterize compressibility and provide a relationship between soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z) - An Information Theory-inspired Strategy for Automatic Network Pruning [88.51235160841377]
Deep convolution neural networks are well known to be compressed on devices with resource constraints.
Most existing network pruning methods require laborious human efforts and prohibitive computation resources.
We propose an information theory-inspired strategy for automatic model compression.
arXiv Detail & Related papers (2021-08-19T07:03:22Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which joints channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on the recent progress on sparse optimization.
We achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation of accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet.
arXiv Detail & Related papers (2020-11-10T03:03:55Z) - Permute, Quantize, and Fine-tune: Efficient Compression of Neural
Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD for centralized deep learning, this algorithm uses power steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.