Compressing Deep Convolutional Neural Networks by Stacking
Low-dimensional Binary Convolution Filters
- URL: http://arxiv.org/abs/2010.02778v1
- Date: Tue, 6 Oct 2020 14:49:22 GMT
- Title: Compressing Deep Convolutional Neural Networks by Stacking
Low-dimensional Binary Convolution Filters
- Authors: Weichao Lan, Liang Lan
- Abstract summary: Deep Convolutional Neural Networks (CNN) have been successfully applied to many real-life problems.
The huge memory cost of deep CNN models poses a great challenge for deploying them on memory-constrained devices.
We propose a novel method to compress deep CNN models by stacking low-dimensional binary convolution filters.
- Score: 15.66437882635872
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep Convolutional Neural Networks (CNN) have been successfully applied to
many real-life problems. However, the huge memory cost of deep CNN models poses
a great challenge for deploying them on memory-constrained devices (e.g., mobile
phones). One popular way to reduce the memory cost of a deep CNN model is to
train a binary CNN, where the weights in the convolution filters are either 1 or -1
and each weight can therefore be stored efficiently using a single bit.
However, the compression ratio of existing binary CNN models is upper bounded
by around 32. To address this limitation, we propose a novel method that compresses
deep CNN models by stacking low-dimensional binary convolution filters. Our
proposed method approximates a standard convolution filter by selecting and
stacking filters from a set of low-dimensional binary convolution filters. This
set of low-dimensional binary convolution filters is shared across all filters
of a given convolution layer; therefore, our method achieves a much larger
compression ratio than binary CNN models. To train the proposed model, we show
theoretically that it is equivalent to selecting and stacking the intermediate
feature maps generated by the low-dimensional binary filters, so the model can
be trained efficiently using the split-transform-merge strategy. We also provide
a detailed analysis of the memory and computation cost of our model at inference
time. We compared the proposed method with five other popular model compression
techniques on two benchmark datasets. Our experimental results demonstrate that
the proposed method achieves a much higher compression ratio than existing
methods while maintaining comparable accuracy.
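To make the mechanism concrete, the following is a minimal, hypothetical sketch of one such layer in PyTorch, written from the abstract's description only: the input channels are split into low-dimensional slices, every slice is convolved with a small shared codebook of binary filters (transform), and each output channel sums its selected intermediate maps (merge). The class name StackedBinaryConv, the random selection indices, and the per-selection scales are illustrative assumptions; a real training setup would also need a straight-through estimator for the binary weights and a way to learn the selections.

    import torch
    import torch.nn.functional as F

    class StackedBinaryConv(torch.nn.Module):
        """Hypothetical sketch: approximate a C_out x C_in x k x k convolution by
        stacking low-dimensional binary filters drawn from a shared codebook."""
        def __init__(self, c_in, c_out, k=3, groups=4, codebook_size=16):
            super().__init__()
            assert c_in % groups == 0
            self.g, self.k = groups, k
            d = c_in // groups                       # depth of each low-dimensional filter
            # Shared codebook: codebook_size binary (+1/-1) filters of shape d x k x k.
            self.codebook = torch.nn.Parameter(torch.sign(torch.randn(codebook_size, d, k, k)))
            # Which codebook filter each (output channel, input split) stacks.
            # Fixed at random here; the paper learns this selection.
            self.register_buffer("select", torch.randint(0, codebook_size, (c_out, groups)))
            # Optional per-selection scale (real-valued, cheap to store; illustrative).
            self.scale = torch.nn.Parameter(torch.ones(c_out, groups))

        def forward(self, x):                        # x: [B, C_in, H, W]
            b = torch.sign(self.codebook)            # keep the codebook binary
            splits = torch.chunk(x, self.g, dim=1)   # split: G low-dimensional slices
            outs = []
            for g, xs in enumerate(splits):          # transform: convolve each slice with the codebook
                fmap = F.conv2d(xs, b, padding=self.k // 2)       # [B, codebook_size, H, W]
                # merge: every output channel picks one intermediate map per split
                picked = fmap[:, self.select[:, g], :, :]         # [B, C_out, H, W]
                outs.append(picked * self.scale[:, g].view(1, -1, 1, 1))
            return sum(outs)

    layer = StackedBinaryConv(c_in=64, c_out=128)
    y = layer(torch.randn(2, 64, 32, 32))            # -> [2, 128, 32, 32]

Because only the small binary codebook, the selection indices, and a few scales are stored per layer, the memory cost can fall below the one-bit-per-weight floor of plain binary CNNs, which is the point of the paper.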
Related papers
- Memory-efficient particle filter recurrent neural network for object localization [53.68402839500528]
This study proposes a novel memory-efficient recurrent neural network (RNN) architecture designed to solve the object localization problem.
We take the idea of the classical particle filter and combine it with a GRU RNN architecture.
In our experiments, the mePFRNN model provides more precise localization than the considered competitors and requires fewer trained parameters.
arXiv Detail & Related papers (2023-10-02T19:41:19Z)
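The summary above only names the two ingredients; as a rough illustration of how a particle filter can be combined with a GRU, here is a generic PyTorch sketch, not the mePFRNN architecture itself: each particle carries a GRU hidden state, particles are re-weighted by a learned observation score, and the weights are smoothed toward uniform as a stand-in for resampling. All module sizes and the soft-resampling rule are assumptions.

    import torch

    class ParticleFilterGRU(torch.nn.Module):
        """Generic particle-filter RNN sketch: K particles, each a GRU hidden state."""
        def __init__(self, obs_dim, hidden_dim=32, n_particles=30):
            super().__init__()
            self.k = n_particles
            self.cell = torch.nn.GRUCell(obs_dim, hidden_dim)    # shared transition model
            self.score = torch.nn.Linear(hidden_dim, 1)          # observation log-weight
            self.readout = torch.nn.Linear(hidden_dim, 2)        # e.g., a 2-D object location

        def forward(self, obs_seq):                  # obs_seq: [T, B, obs_dim]
            t_len, b, _ = obs_seq.shape
            h = torch.zeros(b * self.k, self.cell.hidden_size)   # particle states
            logw = torch.zeros(b, self.k)                        # particle log-weights
            for t in range(t_len):
                o = obs_seq[t].repeat_interleave(self.k, dim=0)  # same observation for every particle
                h = self.cell(o, h)                              # transition update
                logw = logw + self.score(h).view(b, self.k)      # re-weight particles
                w = torch.softmax(logw, dim=1)                   # normalized weights
                # soft "resampling": shrink weights toward uniform to limit degeneracy
                logw = torch.log(0.5 * w + 0.5 / self.k)
            est = (w.unsqueeze(-1) * self.readout(h).view(b, self.k, 2)).sum(dim=1)
            return est                                           # weighted location estimate

    model = ParticleFilterGRU(obs_dim=8)
    print(model(torch.randn(20, 4, 8)).shape)                    # -> torch.Size([4, 2])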
- Approximating Continuous Convolutions for Deep Network Compression [11.566258236184964]
We present ApproxConv, a novel method for compressing the layers of a convolutional neural network.
We show that our method is able to compress existing deep network models by half whilst losing only 1.86% accuracy.
arXiv Detail & Related papers (2022-10-17T11:41:26Z)
- Compressing Deep CNNs using Basis Representation and Spectral Fine-tuning [2.578242050187029]
We propose an efficient and straightforward method for compressing deep convolutional neural networks (CNNs).
Specifically, any spatial convolution layer of the CNN can be replaced by two successive convolution layers.
We fine-tune both the basis and the filter representation to directly mitigate any performance loss due to the truncation.
arXiv Detail & Related papers (2021-05-21T16:14:26Z)
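As an illustration of the "two successive convolution layers" idea above, here is a minimal sketch assuming the common basis formulation in which every k x k filter is a linear combination of a small set of shared spatial basis filters; the paper's actual basis construction and spectral fine-tuning are not reproduced.

    import torch
    import torch.nn.functional as F

    def basis_factorized_conv(x, basis, coeff):
        """Replace a conv with weight W[c_out, c_in, k, k] by two convolutions,
        assuming W[o, i] ~= sum_m coeff[o, i, m] * basis[m]  (basis: [M, k, k]).

        x:     [B, C_in, H, W]
        basis: [M, k, k]        shared spatial basis filters
        coeff: [C_out, C_in, M] per-filter combination coefficients
        """
        c_out, c_in, m = coeff.shape
        k = basis.shape[-1]
        # 1) spatial convolution with the basis only, applied per input channel
        #    (depth-wise style): produces C_in * M intermediate maps.
        w1 = basis.unsqueeze(1).repeat(c_in, 1, 1, 1)            # [C_in*M, 1, k, k]
        mid = F.conv2d(x, w1, padding=k // 2, groups=c_in)       # [B, C_in*M, H, W]
        # 2) 1x1 convolution that linearly recombines the basis responses.
        w2 = coeff.reshape(c_out, c_in * m, 1, 1)                # [C_out, C_in*M, 1, 1]
        return F.conv2d(mid, w2)                                 # [B, C_out, H, W]

    x = torch.randn(1, 16, 20, 20)
    basis = torch.randn(6, 3, 3)
    coeff = torch.randn(32, 16, 6)
    print(basis_factorized_conv(x, basis, coeff).shape)          # -> [1, 32, 20, 20]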
- Decoupled Dynamic Filter Networks [85.38058820176047]
We propose the Decoupled Dynamic Filter (DDF) that can simultaneously tackle both of these shortcomings.
Inspired by recent advances in attention, DDF decouples a depth-wise dynamic filter into spatial and channel dynamic filters.
We observe a significant boost in performance when replacing standard convolution with DDF in classification networks.
arXiv Detail & Related papers (2021-04-29T04:55:33Z)
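A simplified sketch of the decoupling idea, not the official DDF implementation: the dynamic depth-wise filter at channel c and position (h, w) is taken to be the product of a per-channel filter and a per-position spatial filter, each predicted from the input by a small branch whose design here is an assumption.

    import torch
    import torch.nn.functional as F

    class DecoupledDynamicFilter(torch.nn.Module):
        """Sketch of a decoupled dynamic (depth-wise) filter with kernel size k."""
        def __init__(self, channels, k=3):
            super().__init__()
            self.c, self.k = channels, k
            self.channel_branch = torch.nn.Linear(channels, channels * k * k)      # per-channel filters
            self.spatial_branch = torch.nn.Conv2d(channels, k * k, kernel_size=1)  # per-position filters

        def forward(self, x):                                    # x: [B, C, H, W]
            b, c, h, w = x.shape
            ch = self.channel_branch(x.mean(dim=(2, 3)))         # [B, C*k*k] from global context
            ch = ch.view(b, c, self.k * self.k)                  # per-channel dynamic filter
            sp = self.spatial_branch(x).view(b, self.k * self.k, h * w)  # per-position dynamic filter
            patches = F.unfold(x, self.k, padding=self.k // 2)   # [B, C*k*k, H*W]
            patches = patches.view(b, c, self.k * self.k, h * w)
            # combined filter = channel filter * spatial filter, applied depth-wise
            out = torch.einsum('bcjl,bcj,bjl->bcl', patches, ch, sp)
            return out.view(b, c, h, w)

    ddf = DecoupledDynamicFilter(channels=16)
    print(ddf(torch.randn(2, 16, 14, 14)).shape)                 # -> [2, 16, 14, 14]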
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to the success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
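The observation that two adjacent layers can be permuted without changing the function is easy to check numerically; the snippet below verifies only that observation and says nothing about the paper's rate-distortion-guided search for good permutations.

    import numpy as np

    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((64, 32))      # layer 1: 32 -> 64
    w2 = rng.standard_normal((10, 64))      # layer 2: 64 -> 10
    x = rng.standard_normal(32)

    relu = lambda z: np.maximum(z, 0.0)
    perm = rng.permutation(64)              # permute the hidden units

    y_original = w2 @ relu(w1 @ x)
    y_permuted = w2[:, perm] @ relu(w1[perm] @ x)   # rows of w1 and columns of w2 permuted together

    print(np.allclose(y_original, y_permuted))      # True: same function, different weight layout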
- Binarization Methods for Motor-Imagery Brain-Computer Interface Classification [18.722731794073756]
We propose methods for transforming real-valued weights to binary numbers for efficient inference.
By tuning the dimension of the binary embedding, we achieve almost the same accuracy in 4-class MI ($\leq$1.27% lower) compared to models with float16 weights.
Our method replaces the fully connected layer of CNNs with a binary augmented memory using bipolar random projection.
arXiv Detail & Related papers (2020-10-14T12:28:18Z)
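As a rough sketch of replacing a fully connected classifier with a binary memory built from a bipolar random projection (the training procedure, embedding dimension, and prototype construction below are assumptions, not the paper's exact method):

    import numpy as np

    rng = np.random.default_rng(0)
    feat_dim, embed_dim, n_classes = 128, 1024, 4

    # Fixed bipolar (+1/-1) random projection replacing the dense classifier weights.
    projection = rng.choice([-1.0, 1.0], size=(embed_dim, feat_dim))

    def binarize(features):
        """Project CNN features into a high-dimensional bipolar code."""
        return np.sign(projection @ features)

    # "Augmented memory": one bipolar prototype per class, built from training features.
    train_feats = rng.standard_normal((n_classes, 20, feat_dim))      # toy features, 20 per class
    prototypes = np.sign(np.array([binarize(f.T).sum(axis=1) for f in train_feats]))

    def classify(features):
        code = binarize(features)
        return int(np.argmax(prototypes @ code))   # nearest prototype by dot product (Hamming-like)

    print(classify(rng.standard_normal(feat_dim)))  # predicted class index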
- ACDC: Weight Sharing in Atom-Coefficient Decomposed Convolution [57.635467829558664]
We introduce a structural regularization across convolutional kernels in a CNN.
We show that CNNs maintain performance with a dramatic reduction in parameters and computations.
arXiv Detail & Related papers (2020-09-04T20:41:47Z)
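A minimal sketch of the atom-coefficient decomposition suggested by the title, under the assumption that every k x k kernel of a layer is a linear combination of a small shared set of atoms, so only the atoms and the per-kernel coefficients need to be stored:

    import torch
    import torch.nn.functional as F

    c_out, c_in, k, n_atoms = 64, 32, 3, 4
    atoms = torch.randn(n_atoms, k, k)            # shared across all kernels of the layer
    coeff = torch.randn(c_out, c_in, n_atoms)     # per-kernel mixing coefficients

    # Reconstruct the full convolution weight from atoms and coefficients.
    weight = torch.einsum('oim,mkl->oikl', coeff, atoms)      # [C_out, C_in, k, k]
    y = F.conv2d(torch.randn(1, c_in, 16, 16), weight, padding=1)

    full = c_out * c_in * k * k                   # 18432 parameters in the dense layer
    shared = n_atoms * k * k + c_out * c_in * n_atoms   # 8228 with shared atoms
    print(y.shape, f"params: {full} -> {shared}")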
- Learning Sparse Filters in Deep Convolutional Neural Networks with a l1/l2 Pseudo-Norm [5.3791844634527495]
Deep neural networks (DNNs) have proven to be efficient for numerous tasks, but come at a high memory and computation cost.
Recent research has shown that their structure can be more compact without compromising their performance.
We present a sparsity-inducing regularization term based on the ratio l1/l2 pseudo-norm defined on the filter coefficients.
arXiv Detail & Related papers (2020-07-20T11:56:12Z)
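The regularizer itself is compact; below is a minimal sketch of adding an l1/l2 pseudo-norm penalty, computed per filter, to a training loss. The per-filter granularity and the weighting factor are assumptions.

    import torch

    def l1_over_l2_penalty(conv_weight, eps=1e-8):
        """Sum over filters of ||w||_1 / ||w||_2 for a [C_out, C_in, k, k] weight tensor.
        The ratio is scale-invariant and is small for sparse filters."""
        flat = conv_weight.view(conv_weight.shape[0], -1)        # one row per filter
        return (flat.abs().sum(dim=1) / (flat.norm(dim=1) + eps)).sum()

    # Usage sketch: add the penalty to the task loss during training.
    weight = torch.nn.Parameter(torch.randn(16, 8, 3, 3))
    task_loss = torch.tensor(0.0)                                # placeholder for, e.g., cross-entropy
    lam = 1e-3                                                   # assumed regularization strength
    loss = task_loss + lam * l1_over_l2_penalty(weight)
    loss.backward()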
- Cross-filter compression for CNN inference acceleration [4.324080238456531]
We propose a new cross-filter compression method that can provide $\sim 32\times$ memory savings and $122\times$ speed up in convolution operations.
Our method, based on Binary-Weight and XNOR-Net separately, is evaluated on the CIFAR-10 and ImageNet datasets.
arXiv Detail & Related papers (2020-05-18T19:06:14Z)
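The roughly 32x figure comes from storing each binarized weight in one bit instead of a 32-bit float, as the short check below illustrates; the packing layout is illustrative, not the paper's storage format.

    import numpy as np

    weights = np.random.randn(64, 3, 3, 3).astype(np.float32)   # a small conv layer
    binary = np.sign(weights)                                    # +1 / -1 weights

    packed = np.packbits(binary.ravel() > 0)                     # 1 bit per weight
    print(weights.nbytes, "bytes as float32")                    # 6912 bytes
    print(packed.nbytes, "bytes packed")                         # 216 bytes -> 32x smaller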
- Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we manipulate the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
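A much-simplified sketch of layer-wise fusion by aligning neurons before averaging: here the optimal transport step is replaced by a hard assignment (SciPy's Hungarian solver), so this approximates the idea rather than reproducing the paper's algorithm, and the toy weight matrices are assumptions.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    w_a = rng.standard_normal((64, 32))    # one layer of model A (64 neurons)
    w_b = rng.standard_normal((64, 32))    # the same layer of model B

    # Align model B's neurons to model A's by minimizing pairwise weight distance.
    cost = ((w_a[:, None, :] - w_b[None, :, :]) ** 2).sum(axis=2)   # [64, 64] cost matrix
    _, cols = linear_sum_assignment(cost)                           # hard "transport plan"
    w_b_aligned = w_b[cols]                                         # permute B's neurons

    w_fused = 0.5 * (w_a + w_b_aligned)                             # average after alignment
    # In a full model, the same permutation must also be applied to the
    # incoming weights of the next layer so the fused network stays consistent.
    print(w_fused.shape)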
This list is automatically generated from the titles and abstracts of the papers on this site.