HALP: Hardware-Aware Latency Pruning
- URL: http://arxiv.org/abs/2110.10811v1
- Date: Wed, 20 Oct 2021 22:34:51 GMT
- Title: HALP: Hardware-Aware Latency Pruning
- Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M.
Alvarez
- Abstract summary: Hardware-Aware Latency Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets.
- Score: 25.071902504529465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structural pruning can simplify network architecture and improve inference
speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates
structural pruning as a global resource allocation optimization problem, aiming
at maximizing the accuracy while constraining latency under a predefined
budget. For filter importance ranking, HALP leverages a latency lookup table to
track latency reduction potential and a global saliency score to gauge the
accuracy drop. Both metrics can be evaluated very efficiently during pruning,
allowing us to reformulate global structural pruning as a reward maximization
problem under the given target constraint. This makes the problem solvable via our augmented
knapsack solver, enabling HALP to surpass prior work in pruning efficacy and
accuracy-efficiency trade-off. We examine HALP on both classification and
detection tasks, over varying networks, on ImageNet and VOC datasets. In
particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network
throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy
changes, respectively. For SSD pruning on VOC, HALP improves throughput by
$1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior
art, sometimes by large margins.
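To make the formulation concrete, below is a minimal, simplified sketch of latency-budgeted filter selection in the spirit of the abstract: per-filter saliency scores gauge accuracy impact, a per-layer latency lookup table gives the layer latency for a given number of kept filters, and filters are dropped until a target latency budget is met. This is not the paper's augmented knapsack solver (a greedy saliency-per-latency rule stands in for it here), and all names (`prune_to_latency_budget`, `latency_lut`, `budget_ms`) are hypothetical.

```python
# Simplified sketch (not the HALP implementation): greedily drop the filter
# whose saliency per millisecond saved is lowest until the latency budget holds.
from typing import Dict, List


def prune_to_latency_budget(
    saliency: Dict[str, List[float]],     # per-layer importance score per filter
    latency_lut: Dict[str, List[float]],  # latency_lut[layer][c] = layer latency (ms) when keeping c filters
    budget_ms: float,                     # target latency budget for the whole network
) -> Dict[str, int]:
    """Return how many filters to keep per layer under the latency budget."""
    # Rank filters within each layer from most to least important.
    order = {layer: sorted(range(len(s)), key=lambda i: s[i], reverse=True)
             for layer, s in saliency.items()}
    keep = {layer: len(s) for layer, s in saliency.items()}   # start from the full model
    total = sum(latency_lut[layer][keep[layer]] for layer in keep)

    while total > budget_ms:
        best_layer, best_ratio = None, float("inf")
        for layer in keep:
            if keep[layer] <= 1:          # always retain at least one filter per layer
                continue
            weakest = order[layer][keep[layer] - 1]
            # Latency saved by dropping one more filter, read from the lookup table.
            saved = latency_lut[layer][keep[layer]] - latency_lut[layer][keep[layer] - 1]
            ratio = saliency[layer][weakest] / max(saved, 1e-9)
            if ratio < best_ratio:
                best_layer, best_ratio = layer, ratio
        if best_layer is None:            # nothing left to drop; budget unreachable
            break
        total -= latency_lut[best_layer][keep[best_layer]] - latency_lut[best_layer][keep[best_layer] - 1]
        keep[best_layer] -= 1
    return keep
```

In the paper, the lookup table is pre-measured per layer and per channel count on the target hardware, and the augmented knapsack solver reasons globally over the grouped, layer-wise structure of the problem rather than using this local greedy rule.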
Related papers
- Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Layer-adaptive Structured Pruning Guided by Latency [7.193554978191659]
Structured pruning can simplify network architecture and improve inference speed.
We propose SP-LAMP, a global importance score obtained by extending the LAMP score from unstructured pruning to structured pruning (a brief sketch of the underlying LAMP score appears after this list).
Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches.
arXiv Detail & Related papers (2023-05-23T11:18:37Z)
- DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for private inference (PI) can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z)
- Structural Pruning via Latency-Saliency Knapsack [40.562285600570924]
Hardware-Aware Structural Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
Uses a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop.
arXiv Detail & Related papers (2022-10-13T01:41:59Z)
- Neural Network Pruning by Cooperative Coevolution [16.0753044050118]
We propose CCEP, a new filter pruning algorithm based on cooperative coevolution.
CCEP reduces the pruning space with a divide-and-conquer strategy.
Experiments show that CCEP achieves performance competitive with state-of-the-art pruning methods.
arXiv Detail & Related papers (2022-04-12T09:06:38Z)
- Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using an approach similar to neural architecture search.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning [82.54669314604097]
EagleEye is a simple yet efficient evaluation component based on adaptive batch normalization.
It unveils a strong correlation between different pruned structures and their final settled accuracy.
This module is also general enough to plug into and improve some existing pruning algorithms.
arXiv Detail & Related papers (2020-07-06T01:32:31Z)
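For reference to the SP-LAMP entry above, here is a minimal sketch of the underlying LAMP score from unstructured magnitude pruning, assuming its usual definition: the squared magnitude of a weight divided by the sum of squared magnitudes of all weights in the same layer that are at least as large. How the cited paper lifts this score to structured, latency-guided pruning (SP-LAMP) is its own contribution and is not reproduced here.

```python
# Sketch of the (unstructured) LAMP score; SP-LAMP's structured extension is not shown.
import numpy as np


def lamp_scores(layer_weights: np.ndarray) -> np.ndarray:
    """LAMP score per weight for one layer: w_u^2 / sum over |w_v| >= |w_u| of w_v^2."""
    flat_sq = layer_weights.reshape(-1) ** 2
    order = np.argsort(flat_sq)                      # ascending by magnitude
    sorted_sq = flat_sq[order]
    # tail_sums[i] = sum of sorted_sq[i:], i.e. this weight plus all larger ones.
    tail_sums = np.cumsum(sorted_sq[::-1])[::-1]
    scores = np.empty_like(flat_sq)
    scores[order] = sorted_sq / tail_sums
    return scores.reshape(layer_weights.shape)
```

With LAMP, global pruning removes the weights with the smallest scores across all layers; the structured, latency-aware variant is described in the cited paper.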