Compression strategies and space-conscious representations for deep
neural networks
- URL: http://arxiv.org/abs/2007.07967v1
- Date: Wed, 15 Jul 2020 19:41:19 GMT
- Title: Compression strategies and space-conscious representations for deep
neural networks
- Authors: Giosuè Cataldo Marinò, Gregorio Ghidoli, Marco Frasca and Dario
Malchiodi
- Abstract summary: Recent advances in deep learning have made available powerful convolutional neural networks (CNNs) with state-of-the-art performance in several real-world applications.
Such CNNs have millions of parameters and are therefore not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
- Score: 0.3670422696827526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have made available large, powerful
convolutional neural networks (CNNs) with state-of-the-art performance in
several real-world applications. Unfortunately, these large models have
millions of parameters and are therefore not deployable on resource-limited
platforms (e.g. where RAM is limited). Compression of CNNs thus becomes a
critical problem for achieving memory-efficient and possibly computationally
faster model representations. In this paper, we investigate the impact of lossy
compression of CNNs through weight pruning and quantization, and of lossless
weight matrix representations based on source coding. We tested several
combinations of these techniques on four benchmark datasets for classification
and regression problems, achieving compression rates up to $165$ times while
preserving or improving model performance.
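To make the two lossy steps named in the abstract concrete, here is a minimal, self-contained sketch (not the authors' exact pipeline) of magnitude-based weight pruning followed by k-means-style weight quantization on a single dense layer, plus a Shannon-entropy estimate of the bits per weight that a lossless source coder (e.g. Huffman coding) could approach afterwards. The layer shape, sparsity level, and number of quantization levels are illustrative assumptions.

```python
# A minimal sketch (not the paper's exact pipeline) of the two lossy steps the
# abstract mentions: magnitude-based weight pruning and k-means-style weight
# quantization, followed by a Shannon-entropy estimate that bounds what a
# lossless source coder (e.g. Huffman coding) could achieve on the result.
# Layer shape, sparsity, and number of levels below are illustrative assumptions.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_kmeans(w: np.ndarray, n_levels: int, n_iters: int = 20):
    """Cluster the surviving (nonzero) weights onto `n_levels` shared values."""
    nonzero = w[w != 0.0]
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_levels)  # sorted init
    for _ in range(n_iters):                                         # Lloyd's algorithm
        labels = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(labels == k):
                centroids[k] = nonzero[labels == k].mean()
    q = w.copy()
    q[w != 0.0] = centroids[labels]      # every surviving weight -> a codebook value
    return q, centroids

def entropy_bits_per_weight(q: np.ndarray) -> float:
    """Entropy of the symbol distribution: a lower bound for lossless source coding."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(512, 512))     # hypothetical dense layer
    w_pruned = prune_by_magnitude(w, sparsity=0.9)  # keep ~10% of the weights
    w_quant, codebook = quantize_kmeans(w_pruned, n_levels=16)
    bits = entropy_bits_per_weight(w_quant)
    print(f"{codebook.size} shared values, ~{bits:.2f} bits/weight vs 32 for float32")
```

Pruning concentrates the weight distribution on zero and quantization collapses the survivors onto a small shared codebook, which is what makes a subsequent lossless source-coding stage effective.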
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z) - Resource Constrained Model Compression via Minimax Optimization for
Spiking Neural Networks [11.19282454437627]
Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient.
However, they are difficult to deploy directly on resource-limited edge devices.
We propose an improved end-to-end Minimax optimization method for this sparse learning problem.
arXiv Detail & Related papers (2023-08-09T02:50:15Z) - COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy down to 0.6% of the original on fully connected layers and to 5.44% on the whole network, while performing at least as competitively as the baseline (a schematic sketch of such a compact layout is given after this list).
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Joint Matrix Decomposition for Deep Convolutional Neural Networks
Compression [5.083621265568845]
Deep convolutional neural networks (CNNs) with a large number of parameters require huge computational resources.
Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years.
We propose to compress CNNs and alleviate performance degradation via joint matrix decomposition.
arXiv Detail & Related papers (2021-07-09T12:32:10Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - Convolutional neural networks compression with low rank and sparse
tensor decompositions [0.0]
Convolutional neural networks show outstanding results in a variety of computer vision tasks.
For some real-world applications, it is crucial to develop models, which can be fast and light enough to run on edge systems and mobile devices.
In this work, we consider a neural network compression method based on tensor decompositions.
arXiv Detail & Related papers (2020-06-11T13:53:18Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
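As referenced above in the entry on compact representations via weight pruning and quantization, the following is a minimal sketch, under assumed data types and layouts, of how a pruned and quantized weight matrix can be stored compactly: a CSR-like index structure for the surviving weights plus a small codebook of shared values, with each surviving weight replaced by an 8-bit codebook index. The storage formats actually proposed in the papers above rely on source coding and are more elaborate; the function and variable names here are hypothetical.

```python
# A minimal sketch (assumed layout, hypothetical helper names) of a compact
# representation for a pruned + quantized weight matrix: CSR-style indices
# plus a small codebook of shared values, stored with narrow integer types.
# Real source-coding-based formats go further (e.g. Huffman coding of the
# column-index and code-index streams).
import numpy as np

def to_compact(q: np.ndarray, codebook: np.ndarray):
    """Pack a sparse, quantized matrix into (row_ptr, col_idx, code_idx, codebook)."""
    codebook = np.sort(codebook)                              # shared weight values
    rows, cols = np.nonzero(q)
    code_idx = np.searchsorted(codebook, q[rows, cols]).astype(np.uint8)  # <=256 levels
    col_idx = cols.astype(np.uint16)                          # assumes < 65536 columns
    counts = np.bincount(rows, minlength=q.shape[0])
    row_ptr = np.concatenate(([0], np.cumsum(counts))).astype(np.uint32)
    return row_ptr, col_idx, code_idx, codebook

def to_dense(row_ptr, col_idx, code_idx, codebook, shape):
    """Rebuild the dense matrix; useful for checking the packing is lossless."""
    w = np.zeros(shape)
    for r in range(shape[0]):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        w[r, col_idx[lo:hi]] = codebook[code_idx[lo:hi]]
    return w

def compact_bytes(row_ptr, col_idx, code_idx, codebook) -> int:
    return row_ptr.nbytes + col_idx.nbytes + code_idx.nbytes + codebook.nbytes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    codebook = np.sort(rng.normal(scale=0.05, size=16))       # 16 shared values
    q = np.zeros((512, 512))
    mask = rng.random(q.shape) < 0.1                          # ~10% surviving weights
    q[mask] = rng.choice(codebook, size=int(mask.sum()))
    packed = to_compact(q, codebook)
    assert np.array_equal(to_dense(*packed, q.shape), q)      # lossless round trip
    print(f"dense float32: {q.size * 4} bytes, packed: {compact_bytes(*packed)} bytes")
```

With roughly 10% of the weights surviving and 16 shared values, the packed representation occupies only a few percent of the dense float32 layer, even before any entropy coding of the index streams.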