Compression strategies and space-conscious representations for deep
neural networks
- URL: http://arxiv.org/abs/2007.07967v1
- Date: Wed, 15 Jul 2020 19:41:19 GMT
- Title: Compression strategies and space-conscious representations for deep
neural networks
- Authors: Giosuè Cataldo Marinò, Gregorio Ghidoli, Marco Frasca and Dario
Malchiodi
- Abstract summary: Recent advances in deep learning have made available powerful convolutional neural networks (CNNs) with state-of-the-art performance in several real-world applications.
Such CNNs have millions of parameters and are therefore not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
- Score: 0.3670422696827526
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in deep learning have made available large, powerful
convolutional neural networks (CNNs) with state-of-the-art performance in
several real-world applications. Unfortunately, these large models have
millions of parameters and are therefore not deployable on resource-limited
platforms (e.g. where RAM is limited). Compression of CNNs thus becomes a
critical problem for achieving memory-efficient and possibly computationally
faster model representations. In this paper, we investigate the impact of lossy
compression of CNNs through weight pruning and quantization, and of lossless
weight matrix representations based on source coding. We tested several
combinations of these techniques on four benchmark datasets for classification
and regression problems, achieving compression rates up to $165$ times while
preserving or improving model performance.
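To make the two lossy steps named in the abstract concrete, here is a minimal, self-contained sketch (not the authors' exact pipeline) of magnitude-based weight pruning followed by k-means-style weight quantization on a single dense layer, plus a Shannon-entropy estimate of the bits per weight that a lossless source coder (e.g. Huffman coding) could approach afterwards. The layer shape, sparsity level, and number of quantization levels are illustrative assumptions.

```python
# A minimal sketch (not the paper's exact pipeline) of the two lossy steps the
# abstract mentions: magnitude-based weight pruning and k-means-style weight
# quantization, followed by a Shannon-entropy estimate that bounds what a
# lossless source coder (e.g. Huffman coding) could achieve on the result.
# Layer shape, sparsity, and number of levels below are illustrative assumptions.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_kmeans(w: np.ndarray, n_levels: int, n_iters: int = 20):
    """Cluster the surviving (nonzero) weights onto `n_levels` shared values."""
    nonzero = w[w != 0.0]
    centroids = np.linspace(nonzero.min(), nonzero.max(), n_levels)  # sorted init
    for _ in range(n_iters):                                         # Lloyd's algorithm
        labels = np.argmin(np.abs(nonzero[:, None] - centroids[None, :]), axis=1)
        for k in range(n_levels):
            if np.any(labels == k):
                centroids[k] = nonzero[labels == k].mean()
    q = w.copy()
    q[w != 0.0] = centroids[labels]      # every surviving weight -> a codebook value
    return q, centroids

def entropy_bits_per_weight(q: np.ndarray) -> float:
    """Entropy of the symbol distribution: a lower bound for lossless source coding."""
    _, counts = np.unique(q, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.05, size=(512, 512))     # hypothetical dense layer
    w_pruned = prune_by_magnitude(w, sparsity=0.9)  # keep ~10% of the weights
    w_quant, codebook = quantize_kmeans(w_pruned, n_levels=16)
    bits = entropy_bits_per_weight(w_quant)
    print(f"{codebook.size} shared values, ~{bits:.2f} bits/weight vs 32 for float32")
```

Pruning concentrates the weight distribution on zero and quantization collapses the survivors onto a small shared codebook, which is what makes a subsequent lossless source-coding stage effective.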
Related papers
- "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach [49.744093838327615]
We provide a novel compression approach to wide and fully-connected deep neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
arXiv Detail & Related papers (2024-03-01T03:46:28Z) - Dynamic Semantic Compression for CNN Inference in Multi-access Edge
Computing: A Graph Reinforcement Learning-based Autoencoder [82.8833476520429]
We propose a novel semantic compression method, autoencoder-based CNN architecture (AECNN) for effective semantic extraction and compression in partial offloading.
In the semantic encoder, we introduce a feature compression module based on the channel attention mechanism in CNNs, to compress intermediate data by selecting the most informative features.
In the semantic decoder, we design a lightweight decoder to reconstruct the intermediate data through learning from the received compressed data to improve accuracy.
arXiv Detail & Related papers (2024-01-19T15:19:47Z) - Convolutional Neural Network Compression via Dynamic Parameter Rank
Pruning [4.7027290803102675]
We propose an efficient training method for CNN compression via dynamic parameter rank pruning.
Our experiments show that the proposed method can yield substantial storage savings while maintaining or even enhancing classification performance.
arXiv Detail & Related papers (2024-01-15T23:52:35Z) - Resource Constrained Model Compression via Minimax Optimization for
Spiking Neural Networks [11.19282454437627]
Spiking Neural Networks (SNNs) are event-driven and highly energy-efficient.
However, they are difficult to deploy directly on resource-limited edge devices.
We propose an improved end-to-end Minimax optimization method for this sparse learning problem.
arXiv Detail & Related papers (2023-08-09T02:50:15Z) - COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z) - Compact representations of convolutional neural networks via weight
pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We reduce space occupancy down to 0.6% of the original on fully connected layers and to 5.44% on the whole network, while performing at least as competitively as the baseline (a schematic sketch of such a compact layout is given after this list).
arXiv Detail & Related papers (2021-08-28T20:39:54Z) - Joint Matrix Decomposition for Deep Convolutional Neural Networks
Compression [5.083621265568845]
Deep convolutional neural networks (CNNs) with a large number of parameters require huge computational resources.
Decomposition-based methods, therefore, have been utilized to compress CNNs in recent years.
We propose to compress CNNs and alleviate performance degradation via joint matrix decomposition.
arXiv Detail & Related papers (2021-07-09T12:32:10Z) - Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z) - Convolutional neural networks compression with low rank and sparse
tensor decompositions [0.0]
Convolutional neural networks show outstanding results in a variety of computer vision tasks.
For some real-world applications, it is crucial to develop models, which can be fast and light enough to run on edge systems and mobile devices.
In this work, we consider a neural network compression method based on tensor decompositions.
arXiv Detail & Related papers (2020-06-11T13:53:18Z) - Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to industry because of their extremely cheap computation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
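As referenced above in the entry on compact representations via weight pruning and quantization, the following is a minimal sketch, under assumed data types and layouts, of how a pruned and quantized weight matrix can be stored compactly: a CSR-like index structure for the surviving weights plus a small codebook of shared values, with each surviving weight replaced by an 8-bit codebook index. The storage formats actually proposed in the papers above rely on source coding and are more elaborate; the function and variable names here are hypothetical.

```python
# A minimal sketch (assumed layout, hypothetical helper names) of a compact
# representation for a pruned + quantized weight matrix: CSR-style indices
# plus a small codebook of shared values, stored with narrow integer types.
# Real source-coding-based formats go further (e.g. Huffman coding of the
# column-index and code-index streams).
import numpy as np

def to_compact(q: np.ndarray, codebook: np.ndarray):
    """Pack a sparse, quantized matrix into (row_ptr, col_idx, code_idx, codebook)."""
    codebook = np.sort(codebook)                              # shared weight values
    rows, cols = np.nonzero(q)
    code_idx = np.searchsorted(codebook, q[rows, cols]).astype(np.uint8)  # <=256 levels
    col_idx = cols.astype(np.uint16)                          # assumes < 65536 columns
    counts = np.bincount(rows, minlength=q.shape[0])
    row_ptr = np.concatenate(([0], np.cumsum(counts))).astype(np.uint32)
    return row_ptr, col_idx, code_idx, codebook

def to_dense(row_ptr, col_idx, code_idx, codebook, shape):
    """Rebuild the dense matrix; useful for checking the packing is lossless."""
    w = np.zeros(shape)
    for r in range(shape[0]):
        lo, hi = row_ptr[r], row_ptr[r + 1]
        w[r, col_idx[lo:hi]] = codebook[code_idx[lo:hi]]
    return w

def compact_bytes(row_ptr, col_idx, code_idx, codebook) -> int:
    return row_ptr.nbytes + col_idx.nbytes + code_idx.nbytes + codebook.nbytes

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    codebook = np.sort(rng.normal(scale=0.05, size=16))       # 16 shared values
    q = np.zeros((512, 512))
    mask = rng.random(q.shape) < 0.1                          # ~10% surviving weights
    q[mask] = rng.choice(codebook, size=int(mask.sum()))
    packed = to_compact(q, codebook)
    assert np.array_equal(to_dense(*packed, q.shape), q)      # lossless round trip
    print(f"dense float32: {q.size * 4} bytes, packed: {compact_bytes(*packed)} bytes")
```

With roughly 10% of the weights surviving and 16 shared values, the packed representation occupies only a few percent of the dense float32 layer, even before any entropy coding of the index streams.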