Towards Efficient Tensor Decomposition-Based DNN Model Compression with
Optimization Framework
- URL: http://arxiv.org/abs/2107.12422v1
- Date: Mon, 26 Jul 2021 18:31:33 GMT
- Title: Towards Efficient Tensor Decomposition-Based DNN Model Compression with
Optimization Framework
- Authors: Miao Yin, Yang Sui, Siyu Liao and Bo Yuan
- Abstract summary: We propose a systematic framework for tensor decomposition-based model compression using the Alternating Direction Method of Multipliers (ADMM).
Our framework is very general, and it works for both CNNs and RNNs.
Experimental results show that our ADMM-based TT-format models achieve high compression ratios while maintaining high accuracy.
- Score: 14.27609385208807
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Advanced tensor decompositions, such as tensor train (TT) and tensor ring (TR), have been widely studied for deep neural network (DNN) model compression, especially for recurrent neural networks (RNNs). However, compressing convolutional neural networks (CNNs) with TT/TR always suffers significant accuracy loss. In this paper, we propose a systematic framework for tensor decomposition-based model compression using the Alternating Direction Method of Multipliers (ADMM). By formulating TT decomposition-based model compression as an optimization problem with constraints on the tensor ranks, we leverage ADMM to solve this optimization problem systematically in an iterative way. During this procedure, the entire DNN model is trained in its original structure rather than in TT format, but gradually acquires the desired low-tensor-rank characteristics. We then decompose this uncompressed model into TT format and fine-tune it to obtain the final high-accuracy TT-format DNN model. Our framework is general: it works for both CNNs and RNNs and can be easily adapted to other tensor decomposition approaches. We evaluate the proposed framework on different DNN models for image classification and video recognition tasks. Experimental results show that our ADMM-based TT-format models achieve high compression ratios while maintaining high accuracy. Notably, on CIFAR-100, with 2.3X and 2.4X compression ratios, our models achieve 1.96% and 2.21% higher top-1 accuracy than the original ResNet-20 and ResNet-32, respectively. For compressing ResNet-18 on ImageNet, our model achieves a 2.47X FLOPs reduction without accuracy loss.
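To make the formulation concrete, the following is a minimal NumPy sketch of the kind of rank-constrained ADMM iteration the abstract describes: the weights W are updated on the task loss plus an augmented-Lagrangian penalty, the auxiliary variable Z is obtained by projecting W + U onto the TT-rank constraint set via truncated TT-SVD, and the scaled dual variable U accumulates the constraint violation. The function names, the plain gradient step, and the toy hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def project_tt_rank(w, dims, max_rank):
    """Project a weight tensor onto the set of tensors whose TT-ranks are
    at most `max_rank`, using TT-SVD (sequential truncated SVDs), then
    contract the cores back to a dense tensor of the original shape.
    Illustrative projection, not the authors' code."""
    cores = []
    t = w.reshape(dims)
    rank_prev = 1
    for k in range(len(dims) - 1):
        mat = t.reshape(rank_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank_prev, dims[k], r))
        t = (np.diag(s[:r]) @ vt[:r]).reshape(r, *dims[k + 1:])
        rank_prev = r
    cores.append(t.reshape(rank_prev, dims[-1], 1))
    # Contract the TT cores back into a dense tensor of the original shape.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([out.ndim - 1], [0]))
    return out.reshape(w.shape)

def admm_step(w, z, u, grad_loss, dims, max_rank, rho=0.01, lr=0.1):
    """One schematic ADMM iteration: W keeps its original dense structure,
    Z is its low-TT-rank projection, U is the scaled dual variable."""
    # W-update: gradient step on loss(W) + (rho/2) * ||W - Z + U||^2.
    w = w - lr * (grad_loss(w) + rho * (w - z + u))
    # Z-update: Euclidean projection onto the TT-rank constraint set.
    z = project_tt_rank(w + u, dims, max_rank)
    # Dual update: accumulate the constraint violation W - Z.
    u = u + w - z
    return w, z, u

# Toy usage with a random 16x16 weight viewed as a (4, 4, 4, 4) tensor.
w = np.random.randn(16, 16)
z = project_tt_rank(w, (4, 4, 4, 4), max_rank=2)
u = np.zeros_like(w)
grad = lambda x: x  # placeholder gradient of a dummy quadratic loss
for _ in range(10):
    w, z, u = admm_step(w, z, u, grad, (4, 4, 4, 4), max_rank=2)
```

In the pipeline described in the abstract, such a loop would run while the model is still kept in its original uncompressed structure; only after it has acquired low TT-rank behavior is the model actually decomposed into TT cores and fine-tuned.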
Related papers
- Graph Neural Network for Accurate and Low-complexity SAR ATR [2.9766397696234996]
We propose a graph neural network (GNN) model to achieve accurate and low-latency SAR ATR.
The proposed GNN model has low computational complexity and achieves comparably high accuracy.
Compared with state-of-the-art CNNs, the proposed GNN model has only 1/3000 of the computation cost and 1/80 of the model size.
arXiv Detail & Related papers (2023-05-11T20:17:41Z)
- Towards Robust k-Nearest-Neighbor Machine Translation [72.9252395037097]
k-Nearest-Neighbor Machine Translation (kNN-MT) has become an important research direction in NMT in recent years.
Its main idea is to retrieve useful key-value pairs from an additional datastore to modify translations without updating the NMT model.
However, noisy retrieved pairs can dramatically degrade model performance.
We propose a confidence-enhanced kNN-MT model with robust training to alleviate the impact of noise.
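To illustrate the retrieve-and-interpolate idea summarized above, here is a hypothetical NumPy sketch of vanilla kNN-MT scoring: the k datastore entries nearest to the current decoder hidden state are retrieved, converted into a distribution over their stored target tokens, and interpolated with the NMT model's own distribution. The function name, the distance-to-weight conversion, and the default hyperparameters are assumptions for illustration and are not taken from the cited paper.

```python
import numpy as np

def knn_mt_probability(query, datastore_keys, datastore_values, p_nmt,
                       vocab_size, k=8, temperature=10.0, lam=0.5):
    """Hypothetical sketch of vanilla kNN-MT: retrieve the k nearest
    (hidden-state, target-token) pairs from a datastore and interpolate
    their implied token distribution with the NMT model's distribution."""
    # Retrieve the k entries whose keys are closest to the decoder hidden state.
    dists = np.linalg.norm(datastore_keys - query, axis=1)
    nearest = np.argsort(dists)[:k]
    # Turn negative distances into weights over the retrieved entries.
    weights = np.exp(-dists[nearest] / temperature)
    weights /= weights.sum()
    # Scatter the weights onto the stored target tokens.
    p_knn = np.zeros(vocab_size)
    for weight, idx in zip(weights, nearest):
        p_knn[datastore_values[idx]] += weight
    # Interpolate with the base NMT distribution; the NMT model stays untouched.
    return lam * p_knn + (1.0 - lam) * p_nmt
```

The sketch also makes the noise issue visible: any retrieved pair whose stored token is wrong injects probability mass directly onto that wrong token, which is the failure mode the confidence-enhanced training above targets.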
arXiv Detail & Related papers (2022-10-17T07:43:39Z)
- Exploiting Low-Rank Tensor-Train Deep Neural Networks Based on
Riemannian Gradient Descent With Illustrations of Speech Processing [74.31472195046099]
We exploit a low-rank tensor-train deep neural network (TT-DNN) to build an end-to-end deep learning pipeline, namely LR-TT-DNN.
A hybrid model combining LR-TT-DNN with a convolutional neural network (CNN) is set up to boost the performance.
Our empirical evidence demonstrates that the LR-TT-DNN and CNN+(LR-TT-DNN) models, with fewer parameters, can outperform their TT-DNN and CNN+(TT-DNN) counterparts.
arXiv Detail & Related papers (2022-03-11T15:55:34Z)
- Effective Model Sparsification by Scheduled Grow-and-Prune Methods [73.03533268740605]
We propose a novel scheduled grow-and-prune (GaP) methodology that does not require pre-training dense models.
Experiments have shown that such models can match or beat the quality of highly optimized dense models at 80% sparsity on a variety of tasks.
arXiv Detail & Related papers (2021-06-18T01:03:13Z)
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked
Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- Towards Extremely Compact RNNs for Video Recognition with Fully
Decomposed Hierarchical Tucker Structure [41.41516453160845]
We propose to develop extremely compact RNN models with fully decomposed hierarchical Tucker (FDHT) structure.
Our experimental results on several popular video recognition datasets show that the proposed fully decomposed hierarchical Tucker-based LSTM is extremely compact and highly efficient.
arXiv Detail & Related papers (2021-04-12T18:40:44Z)
- A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of
DNNs [8.597091257152567]
We present a dynamic network rewiring (DNR) method to generate pruned deep neural network (DNN) models that are robust against adversarial attacks.
Our experiments show that DNR consistently finds compressed models with better clean and adversarial image classification performance than what is achievable through state-of-the-art alternatives.
arXiv Detail & Related papers (2020-11-03T19:49:00Z)
- Tensor Reordering for CNN Compression [7.228285747845778]
We show how parameter redundancy in Convolutional Neural Network (CNN) filters can be effectively reduced by pruning in the spectral domain.
Our approach is applied to pretrained CNNs and we show that minor additional fine-tuning allows our method to recover the original model performance.
arXiv Detail & Related papers (2020-10-22T23:45:34Z)
- A Fully Tensorized Recurrent Neural Network [48.50376453324581]
We introduce a "fully tensorized" RNN architecture which jointly encodes the separate weight matrices within each recurrent cell.
This approach reduces model size by several orders of magnitude, while still maintaining similar or better performance compared to standard RNNs.
arXiv Detail & Related papers (2020-10-08T18:24:12Z)
- Hybrid Tensor Decomposition in Neural Network Compression [13.146051056642904]
We introduce the hierarchical Tucker (HT) decomposition method to investigate its capability in neural network compression.
We experimentally discover that the HT format has better performance on compressing weight matrices, while the TT format is more suited for compressing convolutional kernels.
arXiv Detail & Related papers (2020-06-29T11:16:22Z)
- Model Fusion via Optimal Transport [64.13185244219353]
We present a layer-wise model fusion algorithm for neural networks.
We show that this can successfully yield "one-shot" knowledge transfer between neural networks trained on heterogeneous non-i.i.d. data.
arXiv Detail & Related papers (2019-10-12T22:07:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site.