Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
- URL: http://arxiv.org/abs/2010.04879v3
- Date: Tue, 15 Jun 2021 13:02:09 GMT
- Title: Accelerate CNNs from Three Dimensions: A Comprehensive Pruning Framework
- Authors: Wenxiao Wang, Minghao Chen, Shuai Zhao, Long Chen, Jinming Hu, Haifeng
Liu, Deng Cai, Xiaofei He, Wei Liu
- Abstract summary: Most neural network pruning methods prune the network model along one dimension (depth, width, or resolution) solely to meet a computational budget.
We argue that pruning should be conducted along three dimensions comprehensively.
Our proposed algorithm surpasses state-of-the-art pruning algorithms and even neural architecture search-based algorithms.
- Score: 40.635566748735386
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Most neural network pruning methods, such as filter-level and layer-level
prunings, prune the network model along one dimension (depth, width, or
resolution) solely to meet a computational budget. However, such a pruning
policy often leads to excessive reduction of that dimension, thus inducing a
huge accuracy loss. To alleviate this issue, we argue that pruning should be
conducted along three dimensions comprehensively. For this purpose, our pruning
framework formulates pruning as an optimization problem. Specifically, it first
casts the relationships between a certain model's accuracy and
depth/width/resolution into a polynomial regression and then maximizes the
polynomial to acquire the optimal values for the three dimensions. Finally, the
model is pruned along the three optimal dimensions accordingly. In this
framework, since collecting enough data to train the regression is very
time-consuming, we propose two approaches to lower the cost: 1) specializing the
polynomial to ensure an accurate regression even with less training data; 2)
employing iterative pruning and fine-tuning to collect the data faster.
Extensive experiments show that our proposed algorithm surpasses
state-of-the-art pruning algorithms and even neural architecture search-based
algorithms.
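As a concrete illustration of the three steps, here is a minimal sketch under stated assumptions: a generic quadratic polynomial stands in for the paper's specialized form, the budget uses the common approximation FLOPs ∝ depth · width² · resolution², and the sampled configurations and accuracies are illustrative, not the paper's data.
```python
# A minimal sketch of the framework, not the authors' implementation.
import numpy as np
from scipy.optimize import minimize

# 1) Data collected via iterative pruning + fine-tuning: each row holds
#    (depth, width, resolution) scales and the resulting accuracy.
configs = np.array([
    [1.0, 1.0, 1.0], [0.8, 1.0, 1.0], [1.0, 0.7, 1.0], [1.0, 1.0, 0.75],
    [0.8, 0.8, 0.9], [0.9, 0.7, 0.8], [0.7, 0.9, 0.9], [0.6, 0.8, 1.0],
    [1.0, 0.6, 0.9], [0.9, 0.9, 0.7],
])
accs = np.array([0.764, 0.752, 0.741, 0.745, 0.738,
                 0.729, 0.733, 0.726, 0.731, 0.735])

def features(x):
    # Quadratic polynomial basis (a stand-in for the specialized form).
    d, w, r = x
    return np.array([1, d, w, r, d*w, d*r, w*r, d*d, w*w, r*r])

# 2) Fit the accuracy polynomial by least-squares regression.
Phi = np.vstack([features(c) for c in configs])
coef, *_ = np.linalg.lstsq(Phi, accs, rcond=None)

# 3) Maximize predicted accuracy subject to the FLOPs budget
#    d * w^2 * r^2 <= B, with B the target fraction of original FLOPs.
B = 0.5
res = minimize(lambda x: -(features(x) @ coef),
               x0=np.array([0.9, 0.8, 0.9]),
               bounds=[(0.1, 1.0)] * 3,
               constraints=[{'type': 'ineq',
                             'fun': lambda x: B - x[0] * x[1]**2 * x[2]**2}])
print("optimal (depth, width, resolution) scales:", res.x)
```
The model would then be pruned to the three scales returned in the final step.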
Related papers
- Polynomial-Time Solutions for ReLU Network Training: A Complexity
Classification via Max-Cut and Zonotopes [70.52097560486683]
We prove that the hardness of approximation of ReLU networks not only mirrors the complexity of the Max-Cut problem but also, in certain special cases, exactly corresponds to it.
In particular, when $\epsilon \leq \sqrt{84/83} - 1 \approx 0.006$, we show that it is NP-hard to find an approximate global optimum of the ReLU network objective with relative error $\epsilon$ with respect to the objective value.
arXiv Detail & Related papers (2023-11-18T04:41:07Z) - Fast Optimization of Weighted Sparse Decision Trees for use in Optimal
Treatment Regimes and Optimal Policy Design [16.512942230284576]
We present three algorithms for efficient sparse weighted decision tree optimization.
The first approach directly optimizes the weighted loss function; however, it tends to be computationally inefficient for large datasets.
The second approach, which scales more efficiently, transforms the weights into integer values and uses data duplication to convert the weighted decision tree optimization problem into an unweighted (but larger) counterpart.
The third algorithm, which scales to much larger datasets, uses a randomized procedure that samples each data point with a probability proportional to its weight, as sketched below.
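A minimal sketch of that sampling-based reduction, assuming an unweighted tree learner is applied afterwards; the function names and toy data are illustrative, not the authors' code.
```python
# Draw an unweighted dataset in which each point appears with probability
# proportional to its weight (an illustrative sketch).
import numpy as np

def weighted_to_unweighted(X, y, weights, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                 # normalize weights to probabilities
    idx = rng.choice(len(X), size=n_samples, replace=True, p=p)
    return X[idx], y[idx]

X = np.random.randn(1000, 5)
y = (X[:, 0] > 0).astype(int)
w = np.random.rand(1000)
X_s, y_s = weighted_to_unweighted(X, y, w, n_samples=2000)
```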
arXiv Detail & Related papers (2022-10-13T08:16:03Z) - MLPruning: A Multilevel Structured Pruning Framework for
Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models.
We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
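A minimal sketch of the head-pruning level only; the mask shape and the choice of pruned heads are illustrative assumptions, not MLPruning's code.
```python
# Zero out pruned attention heads with a binary mask (illustrative).
import torch

def apply_head_mask(attn_output, head_mask):
    """attn_output: (batch, num_heads, seq_len, head_dim)
    head_mask:   (num_heads,) with 0 marking a pruned head."""
    return attn_output * head_mask.view(1, -1, 1, 1)

out = torch.randn(2, 12, 16, 64)   # output of a 12-head attention layer
mask = torch.ones(12)
mask[[3, 7]] = 0.0                 # prune heads 3 and 7
pruned = apply_head_mask(out, mask)
```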
arXiv Detail & Related papers (2021-05-30T22:00:44Z) - A Partial Regularization Method for Network Compression [0.0]
We propose partial regularization, which penalizes only a subset of parameters rather than all of them (the latter being full regularization), to conduct model compression at a higher speed.
Experimental results show that, as expected, computational complexity is reduced, as evidenced by shorter running times in almost all situations.
Surprisingly, it helps to improve some important metrics such as regression fitting results and classification accuracy in both training and test phases on multiple datasets.
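A minimal sketch contrasting the two penalties; which subset of parameters to penalize is an illustrative assumption.
```python
# Full vs. partial L2 regularization on a toy model (illustrative).
import torch.nn as nn

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))
lam = 1e-4

def full_l2(model):
    # Full regularization: every parameter contributes to the penalty.
    return lam * sum(p.pow(2).sum() for p in model.parameters())

def partial_l2(model):
    # Partial regularization: only the last layer's weights are penalized,
    # so the per-step penalty term is cheaper to compute.
    return lam * model[2].weight.pow(2).sum()
```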
arXiv Detail & Related papers (2020-09-03T00:38:27Z) - Human Body Model Fitting by Learned Gradient Descent [48.79414884222403]
We propose a novel algorithm for the fitting of 3D human shape to images.
We show that this algorithm is fast (avg. 120ms convergence), robust to dataset variation, and achieves state-of-the-art results on public evaluation datasets.
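A minimal sketch of the learned-gradient-descent idea: a small network, rather than an analytic gradient step, maps the current parameters and image features to a parameter update. The dimensions and the update network are illustrative assumptions.
```python
# Iterative fitting with learned update steps (illustrative sketch).
import torch
import torch.nn as nn

param_dim, feat_dim = 85, 128   # e.g. body-model parameters, image features
update_net = nn.Sequential(nn.Linear(param_dim + feat_dim, 256),
                           nn.ReLU(),
                           nn.Linear(256, param_dim))

def fit(params, feats, n_iters=5):
    # Each iteration refines the estimate with a learned update step.
    for _ in range(n_iters):
        params = params + update_net(torch.cat([params, feats], dim=-1))
    return params

est = fit(torch.zeros(1, param_dim), torch.randn(1, feat_dim))
```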
arXiv Detail & Related papers (2020-08-19T14:26:47Z) - Joint Multi-Dimension Pruning via Numerical Gradient Update [120.59697866489668]
We present joint multi-dimension pruning (abbreviated as JointPruning), an effective method of pruning a network on three crucial aspects: spatial, depth and channel simultaneously.
We show that our method is optimized collaboratively across the three dimensions in a single end-to-end training and it is more efficient than the previous exhaustive methods.
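A minimal sketch of updating a pruning configuration with numerical gradients; here evaluate() stands in for measuring the accuracy of a network pruned to the given (spatial, depth, channel) ratios, and the toy surrogate is an illustrative assumption.
```python
# Finite-difference gradient ascent over pruning ratios (illustrative).
import numpy as np

def numerical_grad(evaluate, cfg, eps=1e-2):
    # Central finite differences over each pruning ratio.
    g = np.zeros_like(cfg)
    for i in range(len(cfg)):
        step = np.zeros_like(cfg)
        step[i] = eps
        g[i] = (evaluate(cfg + step) - evaluate(cfg - step)) / (2 * eps)
    return g

cfg = np.array([0.9, 0.9, 0.9])               # spatial, depth, channel
evaluate = lambda c: -((c - 0.7) ** 2).sum()  # toy accuracy surrogate
for _ in range(50):                           # ascend toward higher accuracy
    cfg = np.clip(cfg + 0.1 * numerical_grad(evaluate, cfg), 0.1, 1.0)
```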
arXiv Detail & Related papers (2020-05-18T17:57:09Z) - Pixel-in-Pixel Net: Towards Efficient Facial Landmark Detection in the
Wild [104.61677518999976]
We propose Pixel-in-Pixel Net (PIPNet) for facial landmark detection.
The proposed model is equipped with a novel detection head based on heatmap regression.
To further improve the cross-domain generalization capability of PIPNet, we propose self-training with curriculum.
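A minimal sketch of a heatmap-regression landmark head; the 1x1 convolution and channel sizes are illustrative assumptions rather than PIPNet's exact architecture.
```python
# One output heatmap per facial landmark (illustrative sketch).
import torch
import torch.nn as nn

num_landmarks, feat_channels = 68, 256
head = nn.Conv2d(feat_channels, num_landmarks, kernel_size=1)

feats = torch.randn(1, feat_channels, 8, 8)   # backbone feature map
heatmaps = head(feats)                        # one heatmap per landmark
# Read each landmark location off as the argmax of its heatmap.
coords = [divmod(h.argmax().item(), heatmaps.shape[-1]) for h in heatmaps[0]]
```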
arXiv Detail & Related papers (2020-03-08T12:23:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.