Related papers: Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint

URL: http://arxiv.org/abs/2406.12079v1
Date: Mon, 17 Jun 2024 20:40:09 GMT
Title: Multi-Dimensional Pruning: Joint Channel, Layer and Block Pruning with Latency Constraint
Authors: Xinglong Sun, Barath Lakshmanan, Maying Shen, Shiyi Lan, Jingde Chen, Jose Alvarez,
Abstract summary: Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. We propose a novel multi-dimensional pruning framework that jointly optimize pruning across channels, layers, and blocks. In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.
Score: 7.757464614718271
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: As we push the boundaries of performance in various vision tasks, the models grow in size correspondingly. To keep up with this growth, we need very aggressive pruning techniques for efficient inference and deployment on edge devices. Existing pruning approaches are limited to channel pruning and struggle with aggressive parameter reductions. In this paper, we propose a novel multi-dimensional pruning framework that jointly optimizes pruning across channels, layers, and blocks while adhering to latency constraints. We develop a latency modeling technique that accurately captures model-wide latency variations during pruning, which is crucial for achieving an optimal latency-accuracy trade-offs at high pruning ratio. We reformulate pruning as a Mixed-Integer Nonlinear Program (MINLP) to efficiently determine the optimal pruned structure with only a single pass. Our extensive results demonstrate substantial improvements over previous methods, particularly at large pruning ratios. In classification, our method significantly outperforms prior art HALP with a Top-1 accuracy of 70.0(v.s. 68.6) and an FPS of 5262 im/s(v.s. 4101 im/s). In 3D object detection, we establish a new state-of-the-art by pruning StreamPETR at a 45% pruning ratio, achieving higher FPS (37.3 vs. 31.7) and mAP (0.451 vs. 0.449) than the dense baseline.

Related papers

MDP: Multidimensional Vision Model Pruning with Latency Constraint [17.256693658926405]
We introduce Multi-Dimensional Pruning (MDP), a novel paradigm that jointly optimize across a variety of pruning granularities. Extensive experiments demonstrate that MDP significantly outperforms previous methods, especially at high pruning ratios.
arXiv Detail & Related papers (2025-04-02T23:00:10Z)
DRIVE: Dual Gradient-Based Rapid Iterative Pruning [2.209921757303168]
Modern deep neural networks (DNNs) consist of millions of parameters, necessitating high-performance computing during training and inference. Traditional pruning methods that are applied post-training focus on streamlining inference, but there are recent efforts to leverage sparsity early on by pruning before training. We present Dual Gradient-Based Rapid Iterative Pruning (DRIVE), which leverages dense training for initial epochs to counteract the randomness inherent at the inception.
arXiv Detail & Related papers (2024-04-01T20:44:28Z)
FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning [17.60353530072587]
Network pruning offers a solution to reduce model size and computational cost while maintaining performance. Most current pruning methods focus primarily on improving sparsity by reducing the number of nonzero parameters. We propose FALCON, a novel-optimization-based framework for network pruning that jointly takes into account model accuracy (fidelity), FLOPs, and sparsity constraints.
arXiv Detail & Related papers (2024-03-11T18:40:47Z)
Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes [68.86687117368247]
We introduce Bonsai, a gradient-free structured pruning method that eliminates the need for backpropagation. Bonsai achieves better compression with fewer resources, but also produces models that are twice as fast as those generated by semi-structured pruning. Our results show that removing backprop as a requirement can also lead to state-of-the-art efficiency and performance.
arXiv Detail & Related papers (2024-02-08T04:48:26Z)
Dynamic Structure Pruning for Compressing CNNs [13.73717878732162]
We introduce a novel structure pruning method, termed as dynamic structure pruning, to identify optimal pruning granularities for intra-channel pruning. The experimental results show that dynamic structure pruning achieves state-of-the-art pruning performance and better realistic acceleration on a GPU compared with channel pruning.
arXiv Detail & Related papers (2023-03-17T02:38:53Z)
Advancing Model Pruning via Bi-level Optimization [89.88761425199598]
iterative magnitude pruning (IMP) is the predominant pruning method to successfully find 'winning tickets' One-shot pruning methods have been developed, but these schemes are usually unable to find winning tickets as good as IMP. We show that the proposed bi-level optimization-oriented pruning method (termed BiP) is a special class of BLO problems with a bi-linear problem structure.
arXiv Detail & Related papers (2022-10-08T19:19:29Z)
Attentive Fine-Grained Structured Sparsity for Image Restoration [63.35887911506264]
N:M structured pruning has appeared as one of the effective and practical pruning approaches for making the model efficient with the accuracy constraint. We propose a novel pruning method that determines the pruning ratio for N:M structured sparsity at each layer.
arXiv Detail & Related papers (2022-04-26T12:44:55Z)
MLPruning: A Multilevel Structured Pruning Framework for Transformer-based Models [78.45898846056303]
Pruning is an effective method to reduce the memory footprint and computational cost associated with large natural language processing models. We develop a novel MultiLevel structured Pruning framework, which uses three different levels of structured pruning: head pruning, row pruning, and block-wise sparse pruning.
arXiv Detail & Related papers (2021-05-30T22:00:44Z)
Hessian-Aware Pruning and Optimal Neural Implant [74.3282611517773]
Pruning is an effective method to reduce the memory footprint and FLOPs associated with neural network models. We introduce a new Hessian Aware Pruning method coupled with a Neural Implant approach that uses second-order sensitivity as a metric for structured pruning.
arXiv Detail & Related papers (2021-01-22T04:08:03Z)
Joint Multi-Dimension Pruning via Numerical Gradient Update [120.59697866489668]
We present joint multi-dimension pruning (abbreviated as JointPruning), an effective method of pruning a network on three crucial aspects: spatial, depth and channel simultaneously. We show that our method is optimized collaboratively across the three dimensions in a single end-to-end training and it is more efficient than the previous exhaustive methods.
arXiv Detail & Related papers (2020-05-18T17:57:09Z)
Lookahead: A Far-Sighted Alternative of Magnitude-based Pruning [83.99191569112682]
Magnitude-based pruning is one of the simplest methods for pruning neural networks. We develop a simple pruning method, coined lookahead pruning, by extending the single layer optimization to a multi-layer optimization. Our experimental results demonstrate that the proposed method consistently outperforms magnitude-based pruning on various networks.
arXiv Detail & Related papers (2020-02-12T05:38:42Z)

This list is automatically generated from the titles and abstracts of the papers in this site.