Learning Accurate Performance Predictors for Ultrafast Automated Model
Compression
- URL: http://arxiv.org/abs/2304.06393v1
- Date: Thu, 13 Apr 2023 10:52:49 GMT
- Title: Learning Accurate Performance Predictors for Ultrafast Automated Model
Compression
- Authors: Ziwei Wang, Jiwen Lu, Han Xiao, Shengyu Liu, Jie Zhou
- Abstract summary: We propose an ultrafast automated model compression framework called SeerNet for flexible network deployment.
Our method achieves competitive accuracy-complexity trade-offs with significant reduction of the search cost.
- Score: 86.22294249097203
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we propose an ultrafast automated model compression framework
called SeerNet for flexible network deployment. Conventional
non-differentiable methods discretely search for the desirable compression policy
based on the accuracy from exhaustively trained lightweight models, and
existing differentiable methods optimize an extremely large supernet to obtain
the required compressed model for deployment. They both cause heavy
computational cost due to the complex compression policy search and evaluation
process. In contrast, we obtain optimal efficient networks by directly
optimizing the compression policy with an accurate performance predictor, where
the ultrafast automated model compression for various computational cost
constraints is achieved without complex compression policy search and
evaluation. Specifically, we first train the performance predictor based on the
accuracy from uncertain compression policies actively selected by efficient
evolutionary search, so that informative supervision is provided to learn the
accurate performance predictor with acceptable cost. Then we leverage the
gradient that maximizes the predicted performance under the barrier complexity
constraint for ultrafast acquisition of the desirable compression policy, where
adaptive update stepsizes with momentum are employed to enhance the optimality of
the acquired pruning and quantization strategy. Compared with the
state-of-the-art automated model compression methods, experimental results on
image classification and object detection show that our method achieves
competitive accuracy-complexity trade-offs with significant reduction of the
search cost.
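To make the pipeline concrete, below is a minimal, hypothetical sketch of the two stages in PyTorch: a small MLP predictor is fit to (policy, accuracy) pairs, and the compression policy is then optimized by gradient ascent on the predicted accuracy under a log-barrier complexity constraint with momentum updates. The predictor architecture, the cost model, the barrier weight, and the step sizes are illustrative assumptions rather than the authors' implementation, and random stand-in pairs replace the evolutionary-search-selected policies the paper trains on.

```python
import torch
import torch.nn as nn

N_LAYERS = 10            # assumed: one pruning ratio + one normalized bit-width per layer
POLICY_DIM = 2 * N_LAYERS

# Stage 1: train the performance predictor on (policy, accuracy) pairs.
# Random stand-ins are used so the sketch runs; the paper instead evaluates
# uncertain policies actively selected by an evolutionary search.
predictor = nn.Sequential(
    nn.Linear(POLICY_DIM, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),
)
policies = torch.rand(256, POLICY_DIM)     # stand-in evaluated policies
accuracies = torch.rand(256, 1)            # stand-in measured accuracies
opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)
for _ in range(200):
    opt.zero_grad()
    nn.functional.mse_loss(predictor(policies), accuracies).backward()
    opt.step()

def complexity(p: torch.Tensor) -> torch.Tensor:
    """Hypothetical differentiable cost proxy: kept channels times bit-width."""
    prune, bits = p[:N_LAYERS], p[N_LAYERS:]
    return ((1.0 - prune) * bits).sum()

# Stage 2: gradient ascent on predicted accuracy, with a log-barrier keeping the
# complexity under budget and momentum on the policy update.
policy = torch.full((POLICY_DIM,), 0.5, requires_grad=True)
budget = 1.2 * complexity(policy).item()   # assumed budget; the start is feasible
velocity = torch.zeros_like(policy)
lr, momentum = 0.01, 0.9
for _ in range(500):
    slack = torch.clamp(budget - complexity(policy), min=1e-6)
    objective = predictor(policy.unsqueeze(0)).squeeze() + 0.01 * torch.log(slack)
    grad, = torch.autograd.grad(objective, policy)
    velocity = momentum * velocity + lr * grad
    with torch.no_grad():
        policy += velocity
        policy.clamp_(0.01, 0.99)          # keep ratios and bit-widths in range
print("optimized policy:", policy.detach())
```

Note that SeerNet adapts the update step sizes rather than fixing them, and a real cost model would count the BitOPs or FLOPs of the pruned, quantized network; both are simplified here so the sketch stays self-contained.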
Related papers
- Unified Framework for Neural Network Compression via Decomposition and Optimal Rank Selection [3.3454373538792552]
We present a unified framework that applies decomposition and optimal rank selection, employing a composite compression loss within defined rank constraints.
Our approach includes an automatic rank search in a continuous space, efficiently identifying optimal rank configurations without the use of training data.
Using various benchmark datasets, we demonstrate the efficacy of our method through a comprehensive analysis.
arXiv Detail & Related papers (2024-09-05T14:15:54Z)
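As a loose, data-free illustration of rank selection (a generic heuristic, not the composite compression loss or continuous rank search described above), one can keep the smallest truncated-SVD rank whose reconstruction error falls under a tolerance:

```python
import numpy as np

def select_rank(weight: np.ndarray, rel_tol: float = 0.05) -> int:
    """Smallest rank r with ||W - W_r||_F <= rel_tol * ||W||_F."""
    s = np.linalg.svd(weight, compute_uv=False)
    energy = s ** 2
    tail = energy.sum() - np.cumsum(energy)    # residual energy after rank r
    ok = np.sqrt(np.maximum(tail, 0.0) / energy.sum()) <= rel_tol
    return int(np.argmax(ok)) + 1              # first rank meeting the tolerance

def factorize(weight: np.ndarray, rank: int):
    """Return (A, B) with weight ~= A @ B, replacing one layer by two thin ones."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

w = np.random.randn(256, 32) @ np.random.randn(32, 512)   # approximately low rank
r = select_rank(w)
a, b = factorize(w, r)
print(r, np.linalg.norm(w - a @ b) / np.linalg.norm(w))
```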
- Towards Optimal Compression: Joint Pruning and Quantization [1.191194620421783]
This paper introduces FITCompress, a novel method integrating layer-wise mixed-precision quantization and unstructured pruning.
Experiments on computer vision and natural language processing benchmarks demonstrate that our proposed approach achieves a superior compression-performance trade-off.
arXiv Detail & Related papers (2023-02-15T12:02:30Z)
- L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning [24.712888488317816]
We provide a framework for adapting the degree of compression across the model's layers dynamically during training.
Our framework, called L-GreCo, is based on an adaptive algorithm, which automatically picks the optimal compression parameters for model layers.
arXiv Detail & Related papers (2022-10-31T14:37:41Z)
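A hedged sketch of the layer-wise idea, assuming top-k gradient sparsification: per layer, greedily pick the most aggressive keep-ratio from a candidate set while a summed error proxy stays under a global budget. The greedy rule, the error proxy, and all constants are illustrative assumptions, not L-GreCo's actual algorithm:

```python
import numpy as np

def topk_error(grad: np.ndarray, ratio: float) -> float:
    """Relative L2 error left after keeping the top `ratio` fraction by magnitude."""
    k = max(1, int(ratio * grad.size))
    kept = np.sort(np.abs(grad))[-k:]
    total = float(np.sum(grad ** 2))
    return float(np.sqrt(max(total - np.sum(kept ** 2), 0.0) / total))

def pick_ratios(layer_grads, candidates=(0.5, 0.1, 0.05, 0.01), err_budget=1.5):
    ratios = [max(candidates)] * len(layer_grads)    # start with the mildest ratio
    errs = [topk_error(g, r) for g, r in zip(layer_grads, ratios)]
    improved = True
    while improved:                                  # greedily tighten one layer at a time
        improved = False
        for i, g in enumerate(layer_grads):
            lower = [c for c in candidates if c < ratios[i]]
            if not lower:
                continue
            cand = max(lower)                        # next more aggressive ratio
            new_err = topk_error(g, cand)
            if sum(errs) - errs[i] + new_err <= err_budget:
                ratios[i], errs[i] = cand, new_err
                improved = True
    return ratios

grads = [np.random.randn(n) for n in (1_000, 5_000, 20_000)]
print(pick_ratios(grads))   # a (possibly different) keep-ratio per layer
```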
- Optimal Rate Adaption in Federated Learning with Compressed Communications [28.16239232265479]
Federated Learning incurs high communication overhead, which can be greatly alleviated by compressing model updates.
The trade-off between compression and model accuracy in the networked environment remains unclear.
We present a framework to maximize the final model accuracy by strategically adjusting the compression rate in each iteration.
arXiv Detail & Related papers (2021-12-13T14:26:15Z)
- Generalizable Mixed-Precision Quantization via Attribution Rank Preservation [90.26603048354575]
We propose a generalizable mixed-precision quantization (GMPQ) method for efficient inference.
Our method obtains a competitive accuracy-complexity trade-off compared with state-of-the-art mixed-precision networks.
arXiv Detail & Related papers (2021-08-05T16:41:57Z)
- You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient [88.58536093633167]
Existing model compression approaches require re-compression or fine-tuning across diverse constraints to accommodate various hardware deployments.
We propose a novel approach, YOCO-BERT, to achieve "compress once, deploy everywhere".
Compared with state-of-the-art algorithms, YOCO-BERT provides more compact models while achieving a 2.1%-4.5% average accuracy improvement on the GLUE benchmark.
arXiv Detail & Related papers (2021-06-04T12:17:44Z)
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems [77.88178159830905]
Sparsity-Inducing Distribution-based Compression (SIDCo) is a threshold-based sparsification scheme that enjoys threshold-estimation quality similar to deep gradient compression (DGC).
Our evaluation shows SIDCo speeds up training by up to 41.7%, 7.6%, and 1.9% compared to the no-compression baseline, Top-k, and DGC compressors, respectively.
arXiv Detail & Related papers (2021-01-26T13:06:00Z)
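A minimal sketch of the statistical thresholding idea, assuming a single exponential fit to the gradient magnitudes (SIDCo's actual multi-stage fitting differs): solve analytically for the threshold that keeps a target fraction of entries, instead of running an exact but expensive top-k:

```python
import numpy as np

def estimate_threshold(grad: np.ndarray, keep_ratio: float) -> float:
    mags = np.abs(grad)
    lam = 1.0 / mags.mean()                 # MLE rate of an exponential fit to |g|
    # For Exp(lam), P(|g| > t) = exp(-lam * t); solve exp(-lam * t) = keep_ratio.
    return -np.log(keep_ratio) / lam

def sparsify(grad: np.ndarray, keep_ratio: float = 0.01):
    t = estimate_threshold(grad, keep_ratio)
    idx = np.nonzero(np.abs(grad) > t)[0]
    return idx, grad[idx]                   # transmit indices plus surviving values

g = np.random.laplace(scale=0.1, size=1_000_000)   # stand-in gradient vector
idx, vals = sparsify(g, keep_ratio=0.01)
print(len(idx) / g.size)                    # ~0.01 when the fit is adequate
```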
- Neural Network Compression Via Sparse Optimization [23.184290795230897]
We propose a model compression framework based on recent progress in sparse optimization.
We achieve up to 7.2x and 2.9x FLOPs reduction with the same level of accuracy for VGG16 on CIFAR-10 and ResNet50 on ImageNet, respectively.
arXiv Detail & Related papers (2020-11-10T03:03:55Z)
- Structured Sparsification with Joint Optimization of Group Convolution and Channel Shuffle [117.95823660228537]
We propose a novel structured sparsification method for efficient network compression.
The proposed method automatically induces structured sparsity on the convolutional weights.
We also address the problem of inter-group communication with a learnable channel shuffle mechanism.
arXiv Detail & Related papers (2020-02-19T12:03:10Z)
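For context, the fixed channel shuffle popularized by ShuffleNet is a reshape-transpose-reshape over grouped channels; the learnable permutation this paper proposes is not reproduced in this sketch:

```python
import numpy as np

def channel_shuffle(x: np.ndarray, groups: int) -> np.ndarray:
    """x: (N, C, H, W) feature map, C divisible by `groups`."""
    n, c, h, w = x.shape
    assert c % groups == 0
    x = x.reshape(n, groups, c // groups, h, w)
    x = x.transpose(0, 2, 1, 3, 4)          # interleave channels across groups
    return x.reshape(n, c, h, w)

x = np.arange(2 * 8 * 1 * 1, dtype=float).reshape(2, 8, 1, 1)
print(channel_shuffle(x, groups=2)[0, :, 0, 0])   # [0, 4, 1, 5, 2, 6, 3, 7]
```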
- End-to-End Facial Deep Learning Feature Compression with Teacher-Student Enhancement [57.18801093608717]
We propose a novel end-to-end feature compression scheme by leveraging the representation and learning capability of deep neural networks.
In particular, the extracted features are compactly coded in an end-to-end manner by optimizing the rate-distortion cost.
We verify the effectiveness of the proposed model on facial features, and experimental results reveal better compression performance in terms of the rate-accuracy trade-off.
arXiv Detail & Related papers (2020-02-10T10:08:44Z)