Neural Network Compression Via Sparse Optimization
- URL: http://arxiv.org/abs/2011.04868v2
- Date: Wed, 11 Nov 2020 06:03:55 GMT
- Title: Neural Network Compression Via Sparse Optimization
- Authors: Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu
- Abstract summary: We propose a model compression framework based on the recent progress on sparse optimization.
We achieve up to 7.2 and 2.9 times FLOPs reduction, at the same level of evaluation accuracy, on VGG16 for CIFAR10 and ResNet50 for ImageNet.
- Score: 23.184290795230897
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The compression of deep neural networks (DNNs) to reduce inference cost has become increasingly important for meeting the realistic deployment requirements of various applications. There has been a significant amount of work on network compression, but most of it is heuristic and rule-based, or difficult to incorporate into varying scenarios. On the other hand, sparse optimization naturally yields sparse solutions and thus fits the compression requirement, but because sparse optimization has received limited study in stochastic learning, its extension and application to model compression remain largely unexplored. In this work, we propose a model compression framework based on recent progress in sparse stochastic optimization. Compared to existing model compression techniques, our method is effective, requires less extra engineering effort to incorporate into varying applications, and has been numerically demonstrated on benchmark compression tasks. In particular, we achieve up to 7.2 and 2.9 times FLOPs reduction, at the same level of evaluation accuracy, on VGG16 for CIFAR10 and ResNet50 for ImageNet, respectively, compared to the heavy baseline models.
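The abstract leaves the optimizer unspecified. As a minimal sketch of the general recipe it points to (stochastic training under a group-lasso-style regularizer applied through a proximal step, after which exactly-zero groups can be pruned), the following NumPy example is illustrative only; the data, group layout, and hyperparameters are invented, and this is not necessarily the paper's algorithm:

```python
import numpy as np

def prox_group_l2(w, groups, thresh):
    """Proximal step for a group-lasso penalty: shrinks each weight
    group toward zero and sets groups with small norm exactly to zero."""
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        w[g] *= max(0.0, 1.0 - thresh / norm) if norm > 1e-12 else 0.0
    return w

rng = np.random.default_rng(0)
n, d = 256, 12
groups = [np.arange(0, 4), np.arange(4, 8), np.arange(8, 12)]  # e.g., filter groups
w_true = np.array([1.5, -2.0, 0.5, 1.0] + [0.0] * 8)           # only group 0 matters
X = rng.normal(size=(n, d))
y = X @ w_true + 0.01 * rng.normal(size=n)

w, lr, lam = np.zeros(d), 0.05, 0.3
for _ in range(500):
    batch = rng.choice(n, size=32)                      # stochastic mini-batch
    grad = X[batch].T @ (X[batch] @ w - y[batch]) / 32
    w = prox_group_l2(w - lr * grad, groups, lr * lam)  # proximal SGD step

alive = [i for i, g in enumerate(groups) if np.linalg.norm(w[g]) > 0]
print("surviving groups:", alive)  # zeroed groups correspond to prunable structure
```

In a DNN setting, each group would be a channel or filter, so groups driven exactly to zero translate directly into structured FLOPs reductions.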
Related papers
- Order of Compression: A Systematic and Optimal Sequence to Combinationally Compress CNN [5.25545980258284]
We propose a systematic and optimal sequence in which to apply multiple compression techniques.
Our proposed Order of Compression significantly reduces computational costs by up to 859 times on ResNet34, with negligible accuracy loss.
We believe our simple yet effective exploration of the order of compression will shed light on the practice of model compression.
arXiv Detail & Related papers (2024-03-26T07:26:00Z)
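The paper above derives its ordering systematically; as a hedged toy illustration of why order matters at all, the sketch below brute-forces the orderings of three compression steps under a made-up cost model in which any step applied after quantization loses twice the accuracy:

```python
from itertools import permutations

# Toy brute-force search over the order of three compression steps.
# The rule here is invented: compressing already-quantized weights is
# assumed twice as lossy, so the search should place quantization last.
def accuracy_drop(order):
    drop, quantized = 0.0, False
    for step in order:
        base = {"prune": 0.5, "decompose": 0.8, "quantize": 1.0}[step]
        drop += base * (2.0 if quantized else 1.0)
        quantized |= (step == "quantize")
    return drop

best = min(permutations(["prune", "decompose", "quantize"]), key=accuracy_drop)
print("best order:", best, "drop:", accuracy_drop(best))
# ('prune', 'decompose', 'quantize') under this toy model
```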
- Learning Accurate Performance Predictors for Ultrafast Automated Model Compression [86.22294249097203]
We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
arXiv Detail & Related papers (2023-04-13T10:52:49Z)
- Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
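FITCompress's actual joint procedure is not described in the summary above; the following toy sketch only illustrates the two ingredients it combines, unstructured magnitude pruning followed by uniform quantization, on a single weight tensor (names and settings are hypothetical):

```python
import numpy as np

def prune_and_quantize(w, sparsity=0.5, bits=8):
    """Toy combination of unstructured magnitude pruning and uniform
    symmetric quantization; illustrative only, not FITCompress itself."""
    # 1) Unstructured pruning: zero out the smallest-magnitude weights.
    k = int(sparsity * w.size)
    thresh = np.partition(np.abs(w).ravel(), k)[k]
    mask = np.abs(w) >= thresh
    w = w * mask
    # 2) Uniform symmetric quantization of the surviving weights.
    scale = np.abs(w).max() / (2 ** (bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8 if bits <= 8 else np.int32)
    return q, scale, mask

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale, mask = prune_and_quantize(w, sparsity=0.9, bits=8)
w_hat = q.astype(np.float32) * scale  # dequantize for inference
print("kept fraction:", mask.mean(), "max abs error:", np.abs(w - w_hat)[mask].max())
```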
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compressing model updates.
However, the tradeoff between compression and model accuracy in the networked environment remains unclear.
We present a framework that maximizes final model accuracy by strategically adjusting the compression ratio at each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient [88.58536093633167]
Existing model compression approaches require re-compression or fine-tuning across diverse constraints to accommodate various hardware deployments.
We propose a novel approach, YOCO-BERT, to achieve "compress once, deploy everywhere".
Compared with state-of-the-art algorithms, YOCO-BERT provides more compact models, yet achieves a 2.1%-4.5% average accuracy improvement on the GLUE benchmark.
arXiv Detail & Related papers (2021-06-04T12:17:44Z)
- Towards Compact CNNs via Collaborative Compression [166.86915086497433]
We propose a Collaborative Compression scheme, which jointly applies channel pruning and tensor decomposition to compress CNN models.
We achieve 52.9% FLOPs reduction by removing 48.4% parameters on ResNet-50 with only a Top-1 accuracy drop of 0.56% on ImageNet 2012.
arXiv Detail & Related papers (2021-05-24T12:07:38Z)
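As a back-of-the-envelope aid to reading numbers like "52.9% FLOPs reduction from removing 48.4% of parameters", the sketch below counts multiply-accumulates for two stacked convolutions before and after channel pruning; pruning one layer's output channels also shrinks the next layer's input, which is why reported FLOPs and parameter reductions need not match (the layer sizes here are invented):

```python
def conv_flops(c_in, c_out, k, h_out, w_out):
    """Multiply-accumulate count of a k x k convolution producing an
    h_out x w_out feature map (bias and activations ignored)."""
    return c_in * c_out * k * k * h_out * w_out

# Invented two-layer slice: prune 30% of each layer's output channels.
# The second layer's *input* shrinks too, so its savings compound.
before = conv_flops(64, 64, 3, 56, 56) + conv_flops(64, 128, 3, 28, 28)
after  = conv_flops(64, 45, 3, 56, 56) + conv_flops(45, 90, 3, 28, 28)
print("FLOPs reduction: %.1f%%" % (100 * (1 - after / before)))  # ~36.6%
```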
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys similar threshold-estimation quality to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
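SIDCo's actual estimator fits the gradient distribution in stages; the sketch below is only a minimal, assumption-laden version of the underlying idea: model gradient magnitudes with a simple distribution (here exponential) and derive the sparsification threshold analytically instead of sorting for an exact top-k:

```python
import numpy as np

def threshold_sparsify(grad, ratio=0.01):
    """Keep roughly the top `ratio` fraction of gradient entries by
    magnitude, using an analytic threshold instead of a full sort.
    Assumes |grad| is roughly exponentially distributed (a modeling
    choice for this sketch; the actual SIDCo estimator differs)."""
    mag = np.abs(grad)
    mu = mag.mean()               # MLE of the exponential scale
    t = mu * np.log(1.0 / ratio)  # P(|g| > t) = ratio under the model
    idx = np.flatnonzero(mag > t) # indices of surviving entries
    return idx, grad[idx]

rng = np.random.default_rng(2)
grad = rng.laplace(scale=0.1, size=1_000_000)  # |Laplace| is exponential
idx, vals = threshold_sparsify(grad, ratio=0.01)
print("kept fraction:", idx.size / grad.size)  # close to 0.01, no sort needed
```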
- ALF: Autoencoder-based Low-rank Filter-sharing for Efficient Convolutional Neural Networks [63.91384986073851]
We propose the autoencoder-based low-rank filter-sharing technique (ALF).
ALF shows a reduction of 70% in network parameters, 61% in operations and 41% in execution time, with minimal loss in accuracy.
arXiv Detail & Related papers (2020-07-27T09:01:22Z)
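ALF learns its shared low-rank filters with an autoencoder during training; as a simpler stand-in that shows why filter sharing reduces parameters, the sketch below factorizes a synthetic filter bank post hoc with a truncated SVD (not the ALF procedure):

```python
import numpy as np

# Hypothetical filter bank: 64 filters, each 3x3x32, flattened to rows,
# constructed to secretly share only 8 basis directions.
rng = np.random.default_rng(3)
basis = rng.normal(size=(8, 3 * 3 * 32))
filters = rng.normal(size=(64, 8)) @ basis     # 64 x 288 filter matrix

U, s, Vt = np.linalg.svd(filters, full_matrices=False)
rank = 8
shared = Vt[:rank]                  # 8 shared basis filters
coeffs = U[:, :rank] * s[:rank]     # per-filter mixing coefficients

approx = coeffs @ shared
orig_params = filters.size                  # 64 * 288 = 18432
lowrank_params = shared.size + coeffs.size  # 8*288 + 64*8 = 2816 (~85% fewer)
print("reconstruction error:", np.abs(filters - approx).max())
print("params: %d -> %d" % (orig_params, lowrank_params))
```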
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
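The channel shuffle in the paper above is learnable; the fixed ShuffleNet-style shuffle below only demonstrates the inter-group communication mechanism the summary refers to, a reshape and transpose over the channel axis so that each output group mixes channels from every input group:

```python
import numpy as np

def channel_shuffle(x, groups):
    """Fixed ShuffleNet-style channel shuffle (the paper's version is
    learnable). x has shape (N, C, H, W)."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)  # swap the group and per-group axes
    return x.reshape(n, c, h, w)

x = np.arange(8, dtype=np.float32).reshape(1, 8, 1, 1)
print(channel_shuffle(x, groups=2).ravel())
# [0. 4. 1. 5. 2. 6. 3. 7.]  -- channels from the two groups interleave
```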
This list is automatically generated from the titles and abstracts of the papers on this site.