HALP: Hardware-Aware Latency Pruning
- URL: http://arxiv.org/abs/2110.10811v1
- Date: Wed, 20 Oct 2021 22:34:51 GMT
- Title: HALP: Hardware-Aware Latency Pruning
- Authors: Maying Shen, Hongxu Yin, Pavlo Molchanov, Lei Mao, Jianna Liu, Jose M.
Alvarez
- Abstract summary: Hardware-Aware Latency Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
We examine HALP on both classification and detection tasks, over varying networks, on ImageNet and VOC datasets.
- Score: 25.071902504529465
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Structural pruning can simplify network architecture and improve inference
speed. We propose Hardware-Aware Latency Pruning (HALP) that formulates
structural pruning as a global resource allocation optimization problem, aiming
at maximizing the accuracy while constraining latency under a predefined
budget. For filter importance ranking, HALP leverages a latency lookup table to
track latency reduction potential and a global saliency score to gauge the
accuracy drop. Both metrics can be evaluated very efficiently during pruning,
allowing us to reformulate global structural pruning as a reward maximization
problem under the given target constraint. This makes the problem solvable via our augmented
knapsack solver, enabling HALP to surpass prior work in pruning efficacy and
accuracy-efficiency trade-off. We examine HALP on both classification and
detection tasks, over varying networks, on ImageNet and VOC datasets. In
particular, for ResNet-50/-101 pruning on ImageNet, HALP improves network
throughput by $1.60\times$/$1.90\times$ with $+0.3\%$/$-0.2\%$ top-1 accuracy
changes, respectively. For SSD pruning on VOC, HALP improves throughput by
$1.94\times$ with only a $0.56$ mAP drop. HALP consistently outperforms prior
art, sometimes by large margins.
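To make the formulation concrete, below is a minimal, simplified sketch of latency-budgeted filter selection in the spirit of the abstract: per-filter saliency scores gauge accuracy impact, a per-layer latency lookup table gives the layer latency for a given number of kept filters, and filters are dropped until a target latency budget is met. This is not the paper's augmented knapsack solver (a greedy saliency-per-latency rule stands in for it here), and all names (`prune_to_latency_budget`, `latency_lut`, `budget_ms`) are hypothetical.

```python
# Simplified sketch (not the HALP implementation): greedily drop the filter
# whose saliency per millisecond saved is lowest until the latency budget holds.
from typing import Dict, List


def prune_to_latency_budget(
    saliency: Dict[str, List[float]],     # per-layer importance score per filter
    latency_lut: Dict[str, List[float]],  # latency_lut[layer][c] = layer latency (ms) when keeping c filters
    budget_ms: float,                     # target latency budget for the whole network
) -> Dict[str, int]:
    """Return how many filters to keep per layer under the latency budget."""
    # Rank filters within each layer from most to least important.
    order = {layer: sorted(range(len(s)), key=lambda i: s[i], reverse=True)
             for layer, s in saliency.items()}
    keep = {layer: len(s) for layer, s in saliency.items()}   # start from the full model
    total = sum(latency_lut[layer][keep[layer]] for layer in keep)

    while total > budget_ms:
        best_layer, best_ratio = None, float("inf")
        for layer in keep:
            if keep[layer] <= 1:          # always retain at least one filter per layer
                continue
            weakest = order[layer][keep[layer] - 1]
            # Latency saved by dropping one more filter, read from the lookup table.
            saved = latency_lut[layer][keep[layer]] - latency_lut[layer][keep[layer] - 1]
            ratio = saliency[layer][weakest] / max(saved, 1e-9)
            if ratio < best_ratio:
                best_layer, best_ratio = layer, ratio
        if best_layer is None:            # nothing left to drop; budget unreachable
            break
        total -= latency_lut[best_layer][keep[best_layer]] - latency_lut[best_layer][keep[best_layer] - 1]
        keep[best_layer] -= 1
    return keep
```

In the paper, the lookup table is pre-measured per layer and per channel count on the target hardware, and the augmented knapsack solver reasons globally over the grouped, layer-wise structure of the problem rather than using this local greedy rule.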
Related papers
- Accelerating Deep Neural Networks via Semi-Structured Activation
Sparsity [0.0]
Exploiting sparsity in the network's feature maps is one of the ways to reduce its inference latency.
We propose a solution to induce semi-structured activation sparsity exploitable through minor runtime modifications.
Our approach yields a speed improvement of $1.25\times$ with a minimal accuracy drop of $1.1\%$ for the ResNet18 model on the ImageNet dataset.
arXiv Detail & Related papers (2023-09-12T22:28:53Z)
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- Layer-adaptive Structured Pruning Guided by Latency [7.193554978191659]
Structured pruning can simplify network architecture and improve inference speed.
We propose SP-LAMP, a global importance score obtained by extending the LAMP score from unstructured pruning to structured pruning (a brief sketch of the underlying LAMP score appears after this list).
Experimental results with ResNet56 on CIFAR10 demonstrate that our algorithm achieves lower latency than alternative approaches.
arXiv Detail & Related papers (2023-05-23T11:18:37Z)
- DeepReShape: Redesigning Neural Networks for Efficient Private Inference [3.7802450241986945]
Recent work has shown that FLOPs for private inference (PI) can no longer be ignored and incur high latency penalties.
We develop DeepReShape, a technique that optimizes neural network architectures under PI's constraints.
arXiv Detail & Related papers (2023-04-20T18:27:02Z)
- Structural Pruning via Latency-Saliency Knapsack [40.562285600570924]
Hardware-Aware Structural Pruning (HALP)
HALP formulates structural pruning as a global resource allocation optimization problem.
Uses a latency lookup table to track latency reduction potential and a global saliency score to gauge accuracy drop.
arXiv Detail & Related papers (2022-10-13T01:41:59Z)
- Neural Network Pruning by Cooperative Coevolution [16.0753044050118]
We propose CCEP, a new filter pruning algorithm based on cooperative coevolution.
CCEP reduces the pruning space with a divide-and-conquer strategy.
Experiments show that CCEP achieves performance competitive with state-of-the-art pruning methods.
arXiv Detail & Related papers (2022-04-12T09:06:38Z)
- Selective Network Linearization for Efficient Private Inference [49.937470642033155]
We propose a gradient-based algorithm that selectively linearizes ReLUs while maintaining prediction accuracy.
The results demonstrate up to $4.25\%$ more accuracy (iso-ReLU count at 50K) or $2.2\times$ less latency (iso-accuracy at 70%) than the current state of the art.
arXiv Detail & Related papers (2022-02-04T19:00:24Z)
- Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration [71.80326738527734]
We propose a general, fine-grained structured pruning scheme and corresponding compiler optimizations.
We show that our pruning scheme mapping methods, together with the general fine-grained structured pruning scheme, outperform the state-of-the-art DNN optimization framework.
arXiv Detail & Related papers (2021-11-22T23:53:14Z)
- HANT: Hardware-Aware Network Transformation [82.54824188745887]
We propose hardware-aware network transformation (HANT).
HANT replaces inefficient operations with more efficient alternatives using an approach similar to neural architecture search.
Our results on accelerating the EfficientNet family show that HANT can accelerate them by up to 3.6x with a 0.4% drop in top-1 accuracy on the ImageNet dataset.
arXiv Detail & Related papers (2021-07-12T18:46:34Z)
- EagleEye: Fast Sub-net Evaluation for Efficient Neural Network Pruning [82.54669314604097]
EagleEye is a simple yet efficient evaluation component based on adaptive batch normalization.
It unveils a strong correlation between different pruned structures and their final settled accuracy.
This module is also general enough to plug into and improve some existing pruning algorithms.
arXiv Detail & Related papers (2020-07-06T01:32:31Z)
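For reference to the SP-LAMP entry above, here is a minimal sketch of the underlying LAMP score from unstructured magnitude pruning, assuming its usual definition: the squared magnitude of a weight divided by the sum of squared magnitudes of all weights in the same layer that are at least as large. How the cited paper lifts this score to structured, latency-guided pruning (SP-LAMP) is its own contribution and is not reproduced here.

```python
# Sketch of the (unstructured) LAMP score; SP-LAMP's structured extension is not shown.
import numpy as np


def lamp_scores(layer_weights: np.ndarray) -> np.ndarray:
    """LAMP score per weight for one layer: w_u^2 / sum over |w_v| >= |w_u| of w_v^2."""
    flat_sq = layer_weights.reshape(-1) ** 2
    order = np.argsort(flat_sq)                      # ascending by magnitude
    sorted_sq = flat_sq[order]
    # tail_sums[i] = sum of sorted_sq[i:], i.e. this weight plus all larger ones.
    tail_sums = np.cumsum(sorted_sq[::-1])[::-1]
    scores = np.empty_like(flat_sq)
    scores[order] = sorted_sq / tail_sums
    return scores.reshape(layer_weights.shape)
```

With LAMP, global pruning removes the weights with the smallest scores across all layers; the structured, latency-aware variant is described in the cited paper.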