Compressing Neural Networks: Towards Determining the Optimal Layer-wise
Decomposition
- URL: http://arxiv.org/abs/2107.11442v1
- Date: Fri, 23 Jul 2021 20:01:30 GMT
- Title: Compressing Neural Networks: Towards Determining the Optimal Layer-wise
Decomposition
- Authors: Lucas Liebenwein, Alaa Maalouf, Oren Gal, Dan Feldman, Daniela Rus
- Abstract summary: We present a novel global compression framework for deep neural networks.
It automatically analyzes each layer to identify the optimal per-layer compression ratio.
Our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks.
- Score: 62.41259783906452
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a novel global compression framework for deep neural networks that
automatically analyzes each layer to identify the optimal per-layer compression
ratio, while simultaneously achieving the desired overall compression. Our
algorithm hinges on the idea of compressing each convolutional (or
fully-connected) layer by slicing its channels into multiple groups and
decomposing each group via low-rank decomposition. At the core of our algorithm
is the derivation of layer-wise error bounds from the Eckart-Young-Mirsky
theorem. We then leverage these bounds to frame the compression problem as an
optimization problem where we wish to minimize the maximum compression error
across layers and propose an efficient algorithm towards a solution. Our
experiments indicate that our method outperforms existing low-rank compression
approaches across a wide range of networks and data sets. We believe that our
results open up new avenues for future research into the global
performance-size trade-offs of modern neural networks. Our code is available at
https://github.com/lucaslie/torchprune.
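The two ingredients named in the abstract, exact per-layer truncation errors from the Eckart-Young-Mirsky theorem and a min-max formulation of the rank allocation, can be illustrated with a short sketch. The snippet below is not the paper's algorithm (it omits the channel-grouping step and the optimization actually implemented in torchprune); it only shows, under simplifying assumptions, how the exact Frobenius error of an SVD truncation can drive a binary search over a common per-layer error level subject to a parameter budget. All function names are illustrative, not from the released code.

```python
import numpy as np

def truncated_svd(W, rank):
    """Best rank-`rank` approximation of W in the Frobenius norm
    (Eckart-Young-Mirsky theorem)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    W_hat = (U[:, :rank] * S[:rank]) @ Vt[:rank, :]
    # EYM gives the exact error: the norm of the discarded spectrum.
    err = np.sqrt(np.sum(S[rank:] ** 2))
    return W_hat, err

def min_rank_for_error(S, eps):
    """Smallest rank whose relative Frobenius error is at most eps."""
    total = np.sum(S ** 2)
    # tail_sq[r] = sum of squared singular values discarded when keeping rank r
    tail_sq = np.concatenate([np.cumsum((S ** 2)[::-1])[::-1], [0.0]])
    rel_err = np.sqrt(tail_sq / total)
    return int(np.argmax(rel_err <= eps))  # first rank meeting the target

def allocate_ranks(weights, size_budget, tol=1e-3):
    """Binary-search a common relative error level eps and give every layer
    the smallest rank meeting it, so the maximum per-layer error is
    (approximately) minimized under the parameter budget."""
    svals = [np.linalg.svd(W, compute_uv=False) for W in weights]
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        eps = 0.5 * (lo + hi)
        ranks = [min_rank_for_error(S, eps) for S in svals]
        size = sum(r * (W.shape[0] + W.shape[1]) for r, W in zip(ranks, weights))
        if size <= size_budget:
            hi = eps  # budget met: try to tighten the common error level
        else:
            lo = eps  # factored layers still too large: allow more error
    return [min_rank_for_error(S, hi) for S in svals]

# Example: compress two random "layers" to at most 60% of their original size.
layers = [np.random.randn(256, 512), np.random.randn(512, 1024)]
budget = int(0.6 * sum(W.size for W in layers))
print(allocate_ranks(layers, budget))
```

The binary search works because both quantities are monotone in eps: a larger common error level permits smaller ranks and hence a smaller factored model.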
Related papers
- LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging [20.774060844559838]
Existing depth compression methods remove redundant non-linear activation functions and merge consecutive convolution layers into a single layer.
These methods suffer from a critical drawback: the kernel size of the merged layers becomes larger (see the 1-D sketch after this list).
We show that this problem can be addressed by jointly pruning convolution layers and activation functions.
We propose LayerMerge, a novel depth compression method that selects which activation layers and convolution layers to remove.
arXiv Detail & Related papers (2024-06-18T17:55:15Z) - Network Pruning via Feature Shift Minimization [8.593369249204132]
We propose a novel Feature Shift Minimization (FSM) method to compress CNN models, which evaluates the feature shift by converging the information of both features and filters.
The proposed method yields state-of-the-art performance on various benchmark networks and datasets, verified by extensive experiments.
arXiv Detail & Related papers (2022-07-06T12:50:26Z) - A Theoretical Understanding of Neural Network Compression from Sparse
Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use the sparsity-sensitive $\ell_q$-norm to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z) - Low-rank Tensor Decomposition for Compression of Convolutional Neural
Networks Using Funnel Regularization [1.8579693774597708]
We propose a model reduction method to compress the pre-trained networks using low-rank tensor decomposition.
A new regularization method, called funnel function, is proposed to suppress the unimportant factors during the compression.
For ResNet18 on ImageNet2012, our reduced model achieves more than a two-times speedup in terms of GMACs with merely a 0.7% Top-1 accuracy drop.
arXiv Detail & Related papers (2021-12-07T13:41:51Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - Permute, Quantize, and Fine-tune: Efficient Compression of Neural
Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function; a small numerical check of this equivalence is sketched after this list.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z) - Unfolding Neural Networks for Compressive Multichannel Blind
Deconvolution [71.29848468762789]
We propose a learned-structured unfolding neural network for the problem of compressive sparse multichannel blind-deconvolution.
In this problem, each channel's measurements are given as the convolution of a common source signal with a sparse filter.
We demonstrate that our method is superior to classical structured compressive sparse multichannel blind-deconvolution methods in terms of accuracy and speed of sparse filter recovery.
arXiv Detail & Related papers (2020-10-22T02:34:33Z) - PowerGossip: Practical Low-Rank Communication Compression in
Decentralized Deep Learning [62.440827696638664]
We introduce a simple algorithm that directly compresses the model differences between neighboring workers.
Inspired by the PowerSGD algorithm for centralized deep learning, this algorithm uses power iteration steps to maximize the information transferred per bit.
arXiv Detail & Related papers (2020-08-04T09:14:52Z) - Structured Sparsification with Joint Optimization of Group Convolution
and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
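The kernel-growth drawback mentioned in the LayerMerge entry can be seen in one dimension: once the intermediate non-linearity is removed, two stacked convolutions collapse into a single convolution whose kernel is the convolution of the two kernels, and is therefore larger. A minimal numpy sketch (1-D, single channel; sizes are illustrative, not taken from the paper):

```python
import numpy as np

x  = np.random.randn(64)   # 1-D input signal
k1 = np.random.randn(3)    # first conv kernel (size 3)
k2 = np.random.randn(3)    # second conv kernel (size 3)

# Two stacked convolutions with no activation in between ...
two_layers = np.convolve(np.convolve(x, k1), k2)

# ... collapse into a single convolution whose kernel is k1 * k2,
# of size 3 + 3 - 1 = 5: merging layers enlarges the kernel.
merged = np.convolve(k1, k2)
one_layer = np.convolve(x, merged)

assert merged.size == k1.size + k2.size - 1
assert np.allclose(two_layers, one_layer)
```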
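The observation behind the Permute, Quantize, and Fine-tune entry, that hidden units can be reordered without changing the network's function as long as both adjacent weight matrices are permuted consistently, is easy to verify numerically. A minimal sketch with a two-layer ReLU network; shapes and names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
x  = rng.standard_normal(16)        # input
W1 = rng.standard_normal((32, 16))  # first layer
W2 = rng.standard_normal((8, 32))   # second layer
relu = lambda z: np.maximum(z, 0.0)

y = W2 @ relu(W1 @ x)

# Permute the hidden units: rows of W1 and, consistently, columns of W2.
perm = rng.permutation(32)
y_perm = W2[:, perm] @ relu(W1[perm, :] @ x)

assert np.allclose(y, y_perm)  # same function, different weight layout
```

The check relies only on the activation being applied elementwise, so reordering the hidden units commutes with it.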