Structural Pruning via Latency-Saliency Knapsack
- URL: http://arxiv.org/abs/2210.06659v1
- Date: Thu, 13 Oct 2022 01:41:59 GMT
- Title: Structural Pruning via Latency-Saliency Knapsack
- Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M.
Alvarez
- Abstract summary: Hardware-Aware Latency Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
It uses a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop.
- Score: 40.562285600570924
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structural pruning can simplify network architecture and improve inference
speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates
structural pruning as a global resource allocation optimization problem, aiming
at maximizing the accuracy while constraining latency under a predefined budget
on the target device. For filter importance ranking, HALP leverages a latency
lookup table to track latency reduction potential and a global saliency score to
gauge accuracy drop. Both metrics can be evaluated very efficiently during
pruning, allowing us to reformulate global structural pruning as a reward
maximization problem given the target constraint. This makes the problem solvable
via our augmented knapsack solver, enabling HALP to surpass prior work in
pruning efficacy and accuracy-efficiency trade-off. We examine HALP on both
classification and detection tasks, over varying networks, on ImageNet and VOC
datasets, on different platforms. In particular, for ResNet-50/-101 pruning on
ImageNet, HALP improves network throughput by $1.60\times$/$1.90\times$ with
$+0.3\%$/$-0.2\%$ top-1 accuracy changes, respectively. For SSD pruning on VOC,
HALP improves throughput by $1.94\times$ with only a $0.56$ mAP drop. HALP
consistently outperforms prior art, sometimes by large margins. Project page at
https://halp-neurips.github.io/.
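The knapsack view in the abstract can be made concrete with a small sketch: treat each filter as a knapsack item whose value is its saliency and whose weight is the latency reduction read from a lookup table, then keep the subset that maximizes saliency within the latency budget. This is only a minimal illustration of the formulation, not the paper's augmented solver (which additionally handles per-layer neuron ordering); the variable names and the integer quantization of latency are assumptions for the example.

```python
import numpy as np

def knapsack_prune(saliency, latency_cost, latency_budget):
    """Toy 0/1 knapsack over filters: value = saliency, weight = latency cost
    from a lookup table, capacity = latency budget (integer units assumed).
    Returns indices of filters to KEEP."""
    n = len(saliency)
    W = int(latency_budget)
    # classic dynamic program over latency units
    best = np.zeros(W + 1)
    keep = np.zeros((n, W + 1), dtype=bool)
    for i in range(n):
        w, v = int(latency_cost[i]), saliency[i]
        for b in range(W, w - 1, -1):
            if best[b - w] + v > best[b]:
                best[b] = best[b - w] + v
                keep[i, b] = True
    # backtrack to recover the kept filters
    kept, b = [], W
    for i in range(n - 1, -1, -1):
        if keep[i, b]:
            kept.append(i)
            b -= int(latency_cost[i])
    return sorted(kept)

# toy usage: 6 filters, latency budget of 10 units
saliency = [0.9, 0.2, 0.5, 0.7, 0.1, 0.4]
latency = [4, 3, 2, 5, 1, 3]
print(knapsack_prune(saliency, latency, latency_budget=10))
```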
Related papers
- Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
We are able to drastically compress latent feature maps without sacrificing much accuracy by using locality-sensitive hashing (LSH)
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
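As a rough, hedged illustration of how locality-sensitive hashing can compress a latent feature map (not the exact HASTE module), the sketch below hashes channels with random hyperplanes and averages channels whose codes collide; similar channels collide with high probability, so the channel dimension shrinks.

```python
import numpy as np

def lsh_merge_channels(fmap, n_bits=8, seed=0):
    """Illustrative sketch: random-hyperplane LSH over the channels of a
    (C, H, W) feature map; channels sharing a hash code are merged by
    averaging, reducing the channel count without touching dissimilar ones."""
    C, H, W = fmap.shape
    flat = fmap.reshape(C, H * W)
    rng = np.random.default_rng(seed)
    hyperplanes = rng.standard_normal((n_bits, H * W))
    # sign pattern against random hyperplanes -> an n_bits binary code per channel
    codes = (flat @ hyperplanes.T > 0).astype(np.uint8)
    buckets = {}
    for c in range(C):
        buckets.setdefault(codes[c].tobytes(), []).append(c)
    # one averaged representative per bucket
    merged = np.stack([flat[idx].mean(axis=0) for idx in buckets.values()])
    return merged.reshape(-1, H, W)

# toy usage: 16 channels, several of them near-duplicates
x = np.random.rand(4, 8, 8)
fmap = np.concatenate([x, x + 0.01 * np.random.rand(4, 8, 8), np.random.rand(8, 8, 8)])
print(lsh_merge_channels(fmap).shape)  # typically fewer than 16 channels remain
```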
arXiv Detail & Related papers (2023-09-29T13:09:40Z)
- Accelerating Deep Neural Networks via Semi-Structured Activation Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
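The summary does not specify the sparsity pattern; as one hedged illustration of what "semi-structured" activation sparsity can look like (not necessarily the paper's scheme), the sketch below applies an N:M mask, keeping the n largest-magnitude values in every group of m consecutive activations, which is the kind of fixed structure a runtime kernel can exploit.

```python
import numpy as np

def nm_activation_sparsity(x, n=1, m=2):
    """Illustrative N:M sparsity on an activation tensor: in every group of m
    consecutive values, only the n largest-magnitude entries survive.
    Assumes x.size is divisible by m."""
    groups = x.reshape(-1, m)
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]  # smallest entries per group
    mask = np.ones_like(groups)
    np.put_along_axis(mask, drop, 0.0, axis=1)
    return (groups * mask).reshape(x.shape)

# toy usage: 1:2 sparsity on a small activation map
act = np.random.randn(2, 4, 4)
sparse_act = nm_activation_sparsity(act, n=1, m=2)
print((sparse_act == 0).mean())  # about half of the activations are zeroed
```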
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Layer-adaptive Structured Pruning Guided by Latency [7.193554978191659]
Structured pruning can simplify network architecture and improve inference speed.
We propose SP-LAMP, a global importance score derived by extending the LAMP score from unstructured pruning to structured pruning.
Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches.
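For context, the unstructured LAMP score this work builds on normalizes each weight's squared magnitude by the total squared magnitude of all weights in the layer that are at least as large. The sketch below computes that score and shows one plausible way to lift it to filter granularity; the filter-level extension is an assumption for illustration, not necessarily the paper's exact SP-LAMP derivation.

```python
import numpy as np

def lamp_scores(weights):
    """LAMP score (unstructured): sort weights by squared magnitude ascending;
    each weight's score is its squared magnitude divided by the sum of squared
    magnitudes of all weights at least as large. Returned in original order."""
    w2 = weights.flatten() ** 2
    order = np.argsort(w2)                       # ascending
    sorted_w2 = w2[order]
    suffix = np.cumsum(sorted_w2[::-1])[::-1]    # mass of weights not yet pruned
    scores = np.empty_like(w2)
    scores[order] = sorted_w2 / suffix
    return scores.reshape(weights.shape)

def structured_lamp_filter_scores(conv_weight):
    """Hedged structured variant: collapse each filter (output channel) to its
    squared L2 norm, then apply the same suffix normalization at filter level."""
    filter_mass = (conv_weight.reshape(conv_weight.shape[0], -1) ** 2).sum(axis=1)
    return lamp_scores(filter_mass)

# toy usage: a conv layer with 8 filters of shape (3, 3, 3)
w = np.random.randn(8, 3, 3, 3)
print(structured_lamp_filter_scores(w))  # prune filters with the smallest scores first
```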
arXiv Detail & Related papers (2023-05-23T11:18:37Z)
- DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for private inference (PI) can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z)
- Neural Network Pruning by Cooperative Coevolution [16.0753044050118]
We propose CCEP, a new filter pruning algorithm based on cooperative coevolution.
CCEP reduces the pruning space by a divide-and-conquer strategy.
Experiments show that CCEP can achieve a competitive performance with the state-of-the-art pruning methods.
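A toy sketch of the divide-and-conquer, cooperative-coevolution idea follows; it is heavily simplified relative to the paper (one individual per layer and a plain (1+1) mutation step), and `fitness` is a hypothetical user-supplied accuracy proxy, not part of the original method.

```python
import random

def ccep_sketch(filters_per_layer, fitness, generations=20, seed=0):
    """Toy cooperative coevolution: each layer keeps its own binary keep-mask
    (the divide step); masks are mutated layer by layer and a candidate is
    accepted only if the jointly evaluated network does not get worse
    (the cooperate step)."""
    random.seed(seed)
    masks = [[1] * n for n in filters_per_layer]    # start from the full network
    best = fitness(masks)
    for _ in range(generations):
        for layer in range(len(masks)):
            cand = masks[layer][:]
            cand[random.randrange(len(cand))] ^= 1  # flip one filter on/off
            trial = masks[:layer] + [cand] + masks[layer + 1:]
            score = fitness(trial)                  # joint (cooperative) evaluation
            if score >= best:
                masks[layer], best = cand, score
    return masks

# toy usage: reward keeping the first filter of each layer, penalize network size
toy_fitness = lambda ms: sum(m[0] for m in ms) - 0.1 * sum(sum(m) for m in ms)
print(ccep_sketch([4, 6], toy_fitness))
```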
arXiv Detail & Related papers (2022-04-12T09:06:38Z)
- Interspace Pruning: Using Adaptive Filter Representations to Improve Training of Sparse CNNs [69.3939291118954]
Unstructured pruning is well suited to reduce the memory footprint of convolutional neural networks (CNNs)
Standard unstructured pruning (SP) reduces the memory footprint of CNNs by setting filter elements to zero.
We introduce interspace pruning (IP), a general tool to improve existing pruning methods.
arXiv Detail & Related papers (2022-03-15T11:50:45Z)
- Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
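One hedged way to picture selective ReLU linearization is a per-position gate that interpolates between ReLU and the identity; the gate parameterization below is an assumption for illustration, not the paper's exact algorithm, which learns the gates with a gradient-based procedure and a budget on the remaining ReLU count.

```python
import numpy as np

def gated_relu(x, gate):
    """Sketch: gate = 1 keeps the ReLU at that position, gate = 0 replaces it
    with the identity (a linear op, cheap under private inference). During
    training the gates would be relaxed to [0, 1] and penalized so that
    gradient descent learns which ReLUs can be linearized; at inference
    they are rounded to {0, 1}."""
    return gate * np.maximum(x, 0.0) + (1.0 - gate) * x

# toy usage: half the positions keep their ReLU, half become linear
x = np.random.randn(4, 4)
gate = (np.random.rand(4, 4) > 0.5).astype(float)
print(gated_relu(x, gate))
```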
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
- HALP: Hardware-Aware Latency Pruning [25.071902504529465]
Hardware-Aware Latency Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets.
arXiv Detail & Related papers (2021-10-20T22:34:51Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT)
HANT replaces inefficient operations with more efficient alternatives using a neural architecture search like approach.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning [82.54669314604097]
EagleEye is a simple yet efficient evaluation component based on adaptive batch normalization.
It unveils a strong correlation between different pruned structures and their final settled accuracy.
This module is also general to plug-in and improve some existing pruning algorithms.
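The adaptive-batch-normalization evaluation can be sketched as re-estimating the per-channel BN statistics of a candidate pruned sub-network from a few batches before scoring it, instead of reusing the parent network's stale statistics. In the sketch below, `subnet_features` is a hypothetical iterable of pre-BN feature batches produced by the candidate sub-network.

```python
import numpy as np

def adaptive_bn_stats(subnet_features, n_batches=10):
    """Re-estimate per-channel mean and variance from a handful of batches of
    shape (N, C, H, W); these refreshed statistics are then used to evaluate
    the pruned sub-network."""
    means, vars_ = [], []
    for i, feat in enumerate(subnet_features):
        if i >= n_batches:
            break
        means.append(feat.mean(axis=(0, 2, 3)))   # per-channel batch mean
        vars_.append(feat.var(axis=(0, 2, 3)))    # per-channel batch variance
    return np.mean(means, axis=0), np.mean(vars_, axis=0)

def batchnorm(feat, mean, var, gamma=1.0, beta=0.0, eps=1e-5):
    """Apply batch norm with the recalibrated statistics."""
    return gamma * (feat - mean[None, :, None, None]) / np.sqrt(
        var[None, :, None, None] + eps) + beta

# toy usage: recalibrate on random "batches", then normalize one of them
batches = [np.random.randn(8, 16, 4, 4) for _ in range(10)]
mu, var = adaptive_bn_stats(iter(batches))
print(batchnorm(batches[0], mu, var).mean())  # roughly zero-mean after recalibration
```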
arXiv Detail & Related papers (2020-07-06T01:32:31Z)