Convolutional neural networks compression with low rank and sparse
tensor decompositions
- URL: http://arxiv.org/abs/2006.06443v1
- Date: Thu, 11 Jun 2020 13:53:18 GMT
- Title: Convolutional neural networks compression with low rank and sparse
tensor decompositions
- Authors: Pavel Kaloshin
- Abstract summary: Convolutional neural networks show outstanding results in a variety of computer vision tasks.
For some real-world applications, it is crucial to develop models that are fast and light enough to run on edge systems and mobile devices.
In this work, we consider a neural network compression method based on tensor decompositions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Convolutional neural networks show outstanding results in a variety of
computer vision tasks. However, neural network architecture design usually
faces a trade-off between model performance and computational/memory
complexity. For some real-world applications, it is crucial to develop models
that are fast and light enough to run on edge systems and mobile devices.
Yet many modern architectures that demonstrate good performance do not
satisfy inference-time and storage constraints. This gives rise to the problem
of neural network compression: obtaining a smaller and faster model that
performs on par with the initial one.
In this work, we consider a neural network compression method based on tensor
decompositions. Namely, we propose to approximate the convolutional layer
weight with a tensor, which can be represented as a sum of low-rank and sparse
components. The motivation for such approximation is based on the assumption
that low-rank and sparse terms allow eliminating two different types of
redundancy and thus yield a better compression rate. An efficient CPU
implementation for the proposed method has been developed. Our algorithm has
demonstrated up to a 3.5x CPU layer speedup and an 11x layer size reduction when
compressing the ResNet50 architecture for the image classification task.
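The summary does not spell out the decomposition procedure itself. As a rough, hypothetical illustration of the low-rank-plus-sparse idea, the sketch below splits a convolutional kernel (flattened to a matrix) into a truncated-SVD low-rank term and a magnitude-thresholded sparse residual; the rank, sparsity level, and flattening scheme are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def lowrank_plus_sparse(weight, rank=8, sparsity=0.05):
    """Approximate a conv kernel W (C_out, C_in, kh, kw) as L + S, where L has
    low matrix rank after flattening and S keeps only the largest-magnitude
    entries of the residual. Illustrative only."""
    c_out, c_in, kh, kw = weight.shape
    mat = weight.reshape(c_out, c_in * kh * kw)          # flatten to 2-D

    # Low-rank term via truncated SVD.
    u, s, vt = np.linalg.svd(mat, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank, :]

    # Sparse term: keep the largest residual entries, zero out the rest.
    residual = mat - low_rank
    k = max(1, int(sparsity * residual.size))
    thresh = np.partition(np.abs(residual).ravel(), -k)[-k]
    sparse = np.where(np.abs(residual) >= thresh, residual, 0.0)

    return low_rank.reshape(weight.shape), sparse.reshape(weight.shape)

# Example on a random "layer" weight; in practice the terms would be refit or
# fine-tuned rather than taken from a one-shot approximation.
W = np.random.randn(64, 32, 3, 3).astype(np.float32)
L, S = lowrank_plus_sparse(W, rank=8, sparsity=0.05)
err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)
print(f"relative approximation error: {err:.3f}")
```

In a real compression pipeline the two terms would be stored and executed separately (e.g., factorized convolutions plus a sparse kernel), which is how such a decomposition would translate into the reported layer speedup and size reduction.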
Related papers
- Tiled Bit Networks: Sub-Bit Neural Network Compression Through Reuse of Learnable Binary Vectors [4.95475852994362]
We propose a new form of quantization to tile neural network layers with sequences of bits to achieve sub-bit compression of binary-weighted neural networks.
We employ the approach to both fully-connected and convolutional layers, which make up the breadth of space in most neural architectures.
arXiv Detail & Related papers (2024-07-16T15:55:38Z)
- A priori compression of convolutional neural networks for wave simulators [0.0]
Current neural network designs include millions of parameters, which makes it difficult to deploy such complex models on devices with limited memory.
We propose a compressed tensor format for convolutional layers, fixed a priori, before the training of the neural network.
We show that the proposed convolutional compression technique achieves performance equivalent to classical convolutional layers with fewer trainable parameters and a lower memory footprint.
arXiv Detail & Related papers (2023-04-11T04:18:59Z)
- Variable Bitrate Neural Fields [75.24672452527795]
We present a dictionary method for compressing feature grids, reducing their memory consumption by up to 100x.
We formulate the dictionary optimization as a vector-quantized auto-decoder problem which lets us learn end-to-end discrete neural representations in a space where no direct supervision is available.
arXiv Detail & Related papers (2022-06-15T17:58:34Z)
- A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation [37.525277809849776]
The goal of model compression is to reduce the size of a large neural network while retaining a comparable performance.
We use the sparsity-sensitive $\ell_q$-norm (written out after this list) to characterize compressibility and provide a relationship between the soft sparsity of the weights in the network and the degree of compression.
We also develop adaptive algorithms for pruning each neuron in the network informed by our theory.
arXiv Detail & Related papers (2022-06-11T20:10:35Z)
- Low-Rank+Sparse Tensor Compression for Neural Networks [11.632913694957868]
We propose to combine low-rank tensor decomposition with sparse pruning in order to take advantage of both coarse and fine structure for compression.
We compress weights in SOTA architectures (MobileNetv3, EfficientNet, Vision Transformer) and compare this approach to sparse pruning and tensor decomposition alone.
arXiv Detail & Related papers (2021-11-02T15:55:07Z)
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
- Content-Aware Convolutional Neural Networks [98.97634685964819]
Convolutional Neural Networks (CNNs) have achieved great success due to the powerful feature learning ability of convolution layers.
We propose a Content-aware Convolution (CAC) that automatically detects the smooth windows and applies a 1x1 convolutional kernel to replace the original large kernel.
arXiv Detail & Related papers (2021-06-30T03:54:35Z)
- Permute, Quantize, and Fine-tune: Efficient Compression of Neural Networks [70.0243910593064]
Key to success of vector quantization is deciding which parameter groups should be compressed together.
In this paper we make the observation that the weights of two adjacent layers can be permuted while expressing the same function; a small numerical check of this invariance follows this list.
We then establish a connection to rate-distortion theory and search for permutations that result in networks that are easier to compress.
arXiv Detail & Related papers (2020-10-29T15:47:26Z)
- Compression strategies and space-conscious representations for deep neural networks [0.3670422696827526]
Recent advances in deep learning have made available powerful convolutional neural networks (CNN) with state-of-the-art performance in several real-world applications.
CNNs have millions of parameters and are therefore not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
arXiv Detail & Related papers (2020-07-15T19:41:19Z) - Neural Sparse Representation for Image Restoration [116.72107034624344]
Inspired by the robustness and efficiency of sparse coding based image restoration models, we investigate the sparsity of neurons in deep networks.
Our method structurally enforces sparsity constraints upon hidden neurons.
Experiments show that sparse representation is crucial in deep neural networks for multiple image restoration tasks.
arXiv Detail & Related papers (2020-06-08T05:15:17Z) - Binarizing MobileNet via Evolution-based Searching [66.94247681870125]
We propose the use of evolutionary search to facilitate the construction and training scheme when binarizing MobileNet.
Inspired by one-shot architecture search frameworks, we adapt the idea of group convolution to design efficient 1-Bit Convolutional Neural Networks (CNNs).
Our objective is to come up with a tiny yet efficient binary neural architecture by exploring the best candidates of the group convolution.
arXiv Detail & Related papers (2020-05-13T13:25:51Z)
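For the "A Theoretical Understanding of Neural Network Compression from Sparse Linear Approximation" entry above, the sparsity-sensitive $\ell_q$-norm is the standard quasi-norm for $0 < q \le 1$; the definition below is the usual one and is given only for reference, since the summary does not restate the paper's exact formulation.

```latex
% Sparsity-sensitive \ell_q quasi-norm of a weight vector w \in \mathbb{R}^d,
% for 0 < q \le 1.  Smaller q penalizes small-but-nonzero entries more, so a
% small \|w\|_q indicates "soft" sparsity; as q \to 0, \|w\|_q^q approaches the
% number of nonzero entries \|w\|_0.
\[
  \|w\|_q \;=\; \Bigl(\sum_{i=1}^{d} |w_i|^{q}\Bigr)^{1/q},
  \qquad 0 < q \le 1 .
\]
```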
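The "Permute, Quantize, and Fine-tune" entry relies on the fact that permuting the output channels of one layer together with the input channels of the next leaves the network's function unchanged. The snippet below checks this for two fully connected layers with an elementwise ReLU; it is an independent illustration of that invariance, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 16, 32, 8

W1 = rng.standard_normal((d_hidden, d_in))
W2 = rng.standard_normal((d_out, d_hidden))
x = rng.standard_normal(d_in)

relu = lambda z: np.maximum(z, 0.0)

# A random permutation of the hidden units.
perm = rng.permutation(d_hidden)

# Permute the rows of W1 (its output channels) and the columns of W2
# (its input channels) consistently.
W1_p = W1[perm, :]
W2_p = W2[:, perm]

y_original = W2 @ relu(W1 @ x)
y_permuted = W2_p @ relu(W1_p @ x)

# The two networks compute the same function, so one can search over
# permutations for the one that is easiest to vector-quantize.
print(np.allclose(y_original, y_permuted))   # True
```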
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.