"Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach
- URL: http://arxiv.org/abs/2403.00258v1
- Date: Fri, 1 Mar 2024 03:46:28 GMT
- Title: "Lossless" Compression of Deep Neural Networks: A High-dimensional
Neural Tangent Kernel Approach
- Authors: Lingyu Gu, Yongqi Du, Yuan Zhang, Di Xie, Shiliang Pu, Robert C. Qiu,
Zhenyu Liao
- Abstract summary: We provide a novel compression approach to wide and fully-connected \emph{deep} neural nets.
Experiments on both synthetic and real-world data are conducted to support the advantages of the proposed compression scheme.
- Score: 49.744093838327615
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep neural networks (DNNs) are extremely powerful; however, this
comes at the price of increased depth and having more parameters per layer,
making their training and inference more computationally challenging. In an
attempt to address this key limitation, efforts have been devoted to the
compression (e.g., sparsification and/or quantization) of these large-scale
machine learning models, so that they can be deployed on low-power IoT devices.
In this paper, building upon recent advances in neural tangent kernel (NTK) and
random matrix theory (RMT), we provide a novel compression approach to wide and
fully-connected \emph{deep} neural nets. Specifically, we demonstrate that in
the high-dimensional regime where the number of data points $n$ and their
dimension $p$ are both large, and under a Gaussian mixture model for the data,
there exists \emph{asymptotic spectral equivalence} between the NTK matrices
for a large family of DNN models. This theoretical result enables "lossless"
compression of a given DNN to be performed, in the sense that the compressed
network yields asymptotically the same NTK as the original (dense and
unquantized) network, with its weights and activations taking values
\emph{only} in $\{ 0, \pm 1 \}$ up to a scaling. Experiments on both synthetic
and real-world data are conducted to support the advantages of the proposed
compression scheme, with code available at
\url{https://github.com/Model-Compression/Lossless_Compression}.
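To make the abstract concrete, here is a minimal, hedged sketch of the weight-level part of the idea: ternarize a wide layer's Gaussian weights to {0, ±1} with a variance-matching scale and check that the empirical conjugate-kernel Gram matrix on Gaussian-mixture data stays spectrally close to that of the dense layer. This is not the authors' implementation (see the repository linked above); the sample sizes, the 50% sparsity level, the ReLU activation, and the single-hidden-layer setting are illustrative assumptions, and the paper's full scheme also quantizes the activations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative Gaussian mixture: n points in dimension p, two classes.
n, p, width = 512, 256, 4096
means = np.stack([np.ones(p), -np.ones(p)]) / np.sqrt(p)
labels = rng.integers(0, 2, size=n)
X = means[labels] + rng.standard_normal((n, p)) / np.sqrt(p)

def conjugate_kernel(X, W, act):
    """Empirical conjugate-kernel Gram matrix of a one-hidden-layer net."""
    H = act(X @ W / np.sqrt(X.shape[1]))   # post-activations, shape (n, width)
    return H @ H.T / W.shape[1]

relu = lambda z: np.maximum(z, 0.0)

# Dense reference layer: i.i.d. Gaussian weights.
W_dense = rng.standard_normal((p, width))

# "Compressed" layer: ternarize to {0, +1, -1} by magnitude, then rescale so the
# second moment matches the dense weights (the scaling mentioned in the abstract).
threshold = np.quantile(np.abs(W_dense), 0.5)   # hypothetical 50% sparsity level
W_ternary = np.sign(W_dense) * (np.abs(W_dense) > threshold)
W_ternary *= np.sqrt((W_dense**2).mean() / (W_ternary**2).mean())

K_dense = conjugate_kernel(X, W_dense, relu)
K_ternary = conjugate_kernel(X, W_ternary, relu)

# Spectral closeness of the two kernels (relative operator-norm distance).
gap = np.linalg.norm(K_dense - K_ternary, 2) / np.linalg.norm(K_dense, 2)
print(f"relative spectral distance between dense and ternary kernels: {gap:.3f}")
```

In the regime where n, p, and the width all grow large, the two Gram matrices are expected to share essentially the same spectrum, which is the kind of asymptotic spectral equivalence the abstract refers to.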
Related papers
- Pruning Neural Networks via Coresets and Convex Geometry: Towards No
Assumptions [10.635248457021499]
Pruning is one of the predominant approaches for compressing deep neural networks (DNNs).
We propose a novel and robust framework for computing such coresets under mild assumptions on the model's weights and inputs.
Our method outperforms existing coreset based neural pruning approaches across a wide range of networks and datasets.
arXiv Detail & Related papers (2022-09-18T12:45:26Z)
- COIN++: Data Agnostic Neural Compression [55.27113889737545]
COIN++ is a neural compression framework that seamlessly handles a wide range of data modalities.
We demonstrate the effectiveness of our method by compressing various data modalities.
arXiv Detail & Related papers (2022-01-30T20:12:04Z)
- Nonlinear Tensor Ring Network [39.89070144585793]
State-of-the-art deep neural networks (DNNs) have been widely applied to various real-world applications and have achieved significant performance on cognitive problems.
By converting redundant models into compact ones, compression techniques appear to be a practical solution for reducing storage and memory consumption.
In this paper, we develop a nonlinear tensor ring network (NTRN) in which both fully-connected and convolutional layers are compressed.
arXiv Detail & Related papers (2021-11-12T02:02:55Z)
- Low-Rank+Sparse Tensor Compression for Neural Networks [11.632913694957868]
We propose to combine low-rank tensor decomposition with sparse pruning in order to take advantage of both coarse and fine structure for compression.
We compress weights in SOTA architectures (MobileNetv3, EfficientNet, Vision Transformer) and compare this approach to sparse pruning and tensor decomposition alone.
arXiv Detail & Related papers (2021-11-02T15:55:07Z)
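A minimal sketch of the low-rank-plus-sparse idea summarized in the entry above, assuming a plain weight matrix: a truncated SVD captures the coarse structure, and only the largest-magnitude entries of the residual are kept as the fine sparse part. The function name, rank, and keep ratio are illustrative choices, not values from the paper.

```python
import numpy as np

def low_rank_plus_sparse(W, rank, keep_ratio):
    """Approximate W as (low-rank factor pair, sparse residual)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]          # (m, rank) left factor scaled by singular values
    B = Vt[:rank]                       # (rank, n) right factor
    residual = W - A @ B
    # keep only the largest-magnitude residual entries (the fine structure)
    threshold = np.quantile(np.abs(residual), 1.0 - keep_ratio)
    sparse = residual * (np.abs(residual) >= threshold)
    return A, B, sparse

rng = np.random.default_rng(0)
W = rng.standard_normal((512, 256))
A, B, S_sparse = low_rank_plus_sparse(W, rank=32, keep_ratio=0.05)
approx = A @ B + S_sparse
print("relative error:", np.linalg.norm(W - approx) / np.linalg.norm(W))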
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while remaining at least as competitive as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
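As a rough illustration of combining pruning and quantization into a compact storage layout (the paper's actual format is based on source coding and differs in its details), the hypothetical helper below magnitude-prunes a weight matrix, uniformly quantizes the surviving weights to signed 8-bit integers, and packs them in a CSR-style (values, column indices, row pointers) representation.

```python
import numpy as np

def prune_quantize_csr(W, sparsity=0.9, num_bits=8):
    """Magnitude-prune W, uniformly quantize the survivors, pack them CSR-style."""
    threshold = np.quantile(np.abs(W), sparsity)      # drop the smallest `sparsity` fraction
    mask = np.abs(W) >= threshold
    levels = 2 ** (num_bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = np.abs(W[mask]).max() / levels
    q = (np.round(W / scale) * mask).astype(np.int8)  # quantized weights, zeros where pruned

    values, col_idx, row_ptr = [], [], [0]
    for row in q:
        nz = np.flatnonzero(row)
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return (np.array(values, np.int8), np.array(col_idx, np.int32),
            np.array(row_ptr, np.int32), scale)

# Dequantization on load: W_hat[i, col_idx[k]] = values[k] * scale
# for k in range(row_ptr[i], row_ptr[i + 1]).
```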
- Quantized Neural Networks via {-1, +1} Encoding Decomposition and Acceleration [83.84684675841167]
We propose a novel encoding scheme using {-1, +1} to decompose quantized neural networks (QNNs) into multi-branch binary networks.
We validate the effectiveness of our method on large-scale image classification, object detection, and semantic segmentation tasks.
arXiv Detail & Related papers (2021-06-18T03:11:15Z)
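A simplified sketch of the decomposition idea in the entry above (the paper's exact encoding and acceleration scheme differ): quantize each weight to an odd multiple of a step size delta, write that odd integer as a weighted sum of {-1, +1} matrices, and the forward pass becomes a sum of binary-weight matrix products. The function name, num_bits, and delta below are arbitrary illustrative choices.

```python
import numpy as np

def binary_branches(W, num_bits=2, delta=0.05):
    """Quantize W to odd multiples of delta, then split it into {-1,+1} matrices B[i]
    such that the quantized weights equal delta * sum_i 2**i * B[i]."""
    max_level = 2 ** num_bits - 1
    v = 2 * np.round((W / delta - 1) / 2) + 1          # nearest odd integer to W / delta
    v = np.clip(v, -max_level, max_level).astype(int)

    branches, residual = [], v.copy()
    for i in reversed(range(num_bits)):                # greedy, most-significant branch first
        b = np.where(residual >= 0, 1, -1)             # residual stays odd here, so never 0
        branches.append(b)
        residual = residual - b * 2 ** i
    branches = branches[::-1]                          # branches[i] now pairs with weight 2**i
    return branches, delta

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((64, 32))
x = rng.standard_normal((8, 64))
branches, delta = binary_branches(W)
# Multi-branch forward pass: each branch is a cheap binary-weight matmul.
y = delta * sum((2 ** i) * (x @ B) for i, B in enumerate(branches))
```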
- Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch [75.69506249886622]
Sparsity in Deep Neural Networks (DNNs) has been widely studied to compress and accelerate models in resource-constrained environments.
In this paper, we are the first to study training N:M fine-grained structured sparse networks from scratch.
arXiv Detail & Related papers (2021-02-08T05:55:47Z)
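A minimal sketch of what an N:M fine-grained structured sparsity pattern looks like (the paper above is about training such networks from scratch, which this one-shot magnitude mask does not capture): within every group of M consecutive weights along the input dimension, only the N largest-magnitude entries are kept. The 2:4 setting is the common hardware-friendly choice; the helper name is hypothetical.

```python
import numpy as np

def nm_prune(W, n=2, m=4):
    """Zero out all but the n largest-magnitude weights in every group of m
    consecutive weights along the last axis (N:M fine-grained structured sparsity)."""
    rows, cols = W.shape
    assert cols % m == 0, "last dimension must be divisible by the group size m"
    groups = W.reshape(rows, cols // m, m)
    order = np.argsort(np.abs(groups), axis=-1)        # rank entries within each group
    mask = np.zeros_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., -n:], True, axis=-1)
    return (groups * mask).reshape(rows, cols)

rng = np.random.default_rng(0)
W_sparse = nm_prune(rng.standard_normal((128, 256)))   # every group of 4 keeps at most 2 nonzeros
```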
- DeepReduce: A Sparse-tensor Communication Framework for Distributed Deep Learning [79.89085533866071]
This paper introduces DeepReduce, a versatile framework for the compressed communication of sparse tensors.
DeepReduce decomposes tensors into two sets, values and indices, and allows both independent and combined compression of these sets.
Our experiments with large real models demonstrate that DeepReduce transmits fewer data and imposes lower computational overhead than existing methods.
arXiv Detail & Related papers (2021-02-05T11:31:24Z)
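The values/indices split described in the entry above can be illustrated in a few lines of NumPy; DeepReduce itself additionally compresses the two sets with separate (or combined) compressors, which this hypothetical sketch omits.

```python
import numpy as np

def split_sparse(tensor):
    """Split a sparse tensor into flat indices and values, sent/compressed separately."""
    flat = tensor.ravel()
    idx = np.flatnonzero(flat).astype(np.uint32)
    return idx, flat[idx], tensor.shape

def merge_sparse(idx, values, shape):
    """Rebuild the dense tensor from the (indices, values) pair on the receiver side."""
    flat = np.zeros(int(np.prod(shape)), dtype=values.dtype)
    flat[idx] = values
    return flat.reshape(shape)

rng = np.random.default_rng(0)
grad = rng.standard_normal((256, 128)) * (rng.random((256, 128)) < 0.01)   # ~1% nonzero gradient
idx, vals, shape = split_sparse(grad)
assert np.array_equal(merge_sparse(idx, vals, shape), grad)
```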
- Compression strategies and space-conscious representations for deep neural networks [0.3670422696827526]
Recent advances in deep learning have made powerful convolutional neural networks (CNNs) with state-of-the-art performance available for several real-world applications.
However, such CNNs have millions of parameters and are thus not deployable on resource-limited platforms.
In this paper, we investigate the impact of lossy compression of CNNs by weight pruning and quantization.
arXiv Detail & Related papers (2020-07-15T19:41:19Z)
- Hybrid Tensor Decomposition in Neural Network Compression [13.146051056642904]
We introduce the hierarchical Tucker (HT) decomposition method to investigate its capability in neural network compression.
We experimentally discover that the HT format performs better at compressing weight matrices, while the tensor-train (TT) format is better suited to compressing convolutional kernels.
arXiv Detail & Related papers (2020-06-29T11:16:22Z)
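For the tensor-train side of the comparison in the entry above, a compact TT-SVD sketch is given below (the hierarchical Tucker format needs a dimension tree and is longer to write down); the 4-way kernel shape and the rank cap are arbitrary illustrative choices, not settings from the paper.

```python
import numpy as np

def tt_decompose(tensor, max_rank):
    """Tensor-train (TT) decomposition via sequential truncated SVDs (TT-SVD sketch)."""
    dims = tensor.shape
    cores, rank_prev = [], 1
    mat = tensor.reshape(dims[0], -1)
    for k in range(len(dims) - 1):
        U, S, Vt = np.linalg.svd(mat, full_matrices=False)
        rank = min(max_rank, len(S))
        cores.append(U[:, :rank].reshape(rank_prev, dims[k], rank))
        mat = (S[:rank, None] * Vt[:rank]).reshape(rank * dims[k + 1], -1)
        rank_prev = rank
    cores.append(mat.reshape(rank_prev, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    """Contract the TT cores back into a full tensor to check the approximation."""
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.reshape([c.shape[1] for c in cores])

rng = np.random.default_rng(0)
kernel = rng.standard_normal((3, 3, 64, 64))          # a conv kernel viewed as a 4-way tensor
cores = tt_decompose(kernel, max_rank=16)
err = np.linalg.norm(kernel - tt_reconstruct(cores)) / np.linalg.norm(kernel)
print("relative TT approximation error:", err)
```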