Combining Relevance and Magnitude for Resource-Aware DNN Pruning
- URL: http://arxiv.org/abs/2405.13088v1
- Date: Tue, 21 May 2024 11:42:15 GMT
- Title: Combining Relevance and Magnitude for Resource-Aware DNN Pruning
- Authors: Carla Fabiana Chiasserini, Francesco Malandrino, Nuria Molner, Zhiqiang Zhao
- Abstract summary: Pruning neural networks, removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline.
In this paper, we propose a novel pruning approach, called FlexRel, predicated upon combining training-time and inference-time information.
Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
- Score: 16.976723041143956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Pruning neural networks, i.e., removing some of their parameters whilst retaining their accuracy, is one of the main ways to reduce the latency of a machine learning pipeline, especially in resource- and/or bandwidth-constrained scenarios. In this context, the pruning technique, i.e., how to choose the parameters to remove, is critical to the system performance. In this paper, we propose a novel pruning approach, called FlexRel and predicated upon combining training-time and inference-time information, namely, parameter magnitude and relevance, in order to improve the resulting accuracy whilst saving both computational resources and bandwidth. Our performance evaluation shows that FlexRel is able to achieve higher pruning factors, saving over 35% bandwidth for typical accuracy targets.
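The abstract gives no implementation details, but the core idea, blending a training-time magnitude score with an inference-time relevance score into a single pruning criterion, can be sketched as follows. The per-parameter `relevance` input, the mixing weight `alpha`, and the layer-wise thresholding are illustrative assumptions, not the authors' exact method.

```python
import torch
import torch.nn as nn

def combined_pruning_scores(weight: torch.Tensor,
                            relevance: torch.Tensor,
                            alpha: float = 0.5) -> torch.Tensor:
    """Blend training-time magnitude with inference-time relevance.

    `relevance` is assumed to be a per-parameter score with the same
    shape as `weight` (e.g., accumulated via a relevance-propagation
    pass); the summary does not specify how FlexRel computes it, nor
    the mixing rule, so `alpha` is an illustrative knob.
    """
    m = weight.abs()
    r = relevance.abs()
    # Normalize both criteria so they are comparable before mixing.
    m = m / (m.max() + 1e-12)
    r = r / (r.max() + 1e-12)
    return alpha * m + (1.0 - alpha) * r

def prune_layer(layer: nn.Linear, relevance: torch.Tensor,
                keep_ratio: float = 0.5) -> torch.Tensor:
    """Zero out the lowest-scoring weights of one layer, in place."""
    scores = combined_pruning_scores(layer.weight.data, relevance)
    k = max(1, int(scores.numel() * keep_ratio))
    threshold = torch.topk(scores.flatten(), k, largest=True).values.min()
    mask = (scores >= threshold).to(layer.weight.dtype)
    layer.weight.data.mul_(mask)
    return mask
```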
Related papers
- Learning k-Level Structured Sparse Neural Networks Using Group Envelope Regularization [4.0554893636822]
We introduce a novel approach to deploy large-scale Deep Neural Networks on constrained resources.
The method speeds up inference time and aims to reduce memory demand and power consumption.
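The summary leaves the regularizer unspecified; as a hedged point of reference, a classic group-lasso penalty over output-neuron groups (a simpler relative of the paper's group envelope regularizer, not the paper's actual construction) looks like this:

```python
import torch
import torch.nn as nn

def group_sparsity_penalty(layer: nn.Linear) -> torch.Tensor:
    """Group-lasso-style term: L2 norm of each output neuron's weights,
    summed. Driving whole groups to zero gives structured sparsity that
    translates into real memory and latency savings. This classic
    penalty is only a stand-in for the paper's group envelope idea.
    """
    return layer.weight.norm(p=2, dim=1).sum()

# Hypothetical use during training:
#   loss = task_loss + 1e-4 * sum(group_sparsity_penalty(m)
#                                 for m in model.modules()
#                                 if isinstance(m, nn.Linear))
```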
arXiv Detail & Related papers (2022-12-25T15:40:05Z) - Efficient Graph Neural Network Inference at Large Scale [54.89457550773165]
Graph neural networks (GNNs) have demonstrated excellent performance in a wide range of applications.
Existing scalable GNNs leverage linear propagation to preprocess the features and accelerate the training and inference procedure.
We propose a novel adaptive propagation order approach that generates the personalized propagation order for each node based on its topological information.
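The exact rule for deriving a node's propagation order is not given in the summary; the sketch below pairs linear feature propagation with a stand-in, degree-based per-node step count purely to illustrate the mechanism.

```python
import numpy as np
import scipy.sparse as sp

def personalized_propagation(adj: sp.csr_matrix,
                             features: np.ndarray,
                             max_steps: int = 4) -> np.ndarray:
    """Linear propagation with a per-node stopping step.

    The step-count rule here (fewer hops for high-degree nodes) is a
    hypothetical heuristic; the paper derives the order from topological
    information, but its exact rule is not in this summary.
    """
    deg = np.asarray(adj.sum(axis=1)).ravel()
    # Row-normalized propagation matrix D^{-1} A.
    prop = sp.diags(1.0 / np.maximum(deg, 1.0)) @ adj
    steps = np.clip(max_steps - np.log2(1.0 + deg).astype(int), 1, max_steps)

    out = features.copy()
    h = features
    for t in range(1, max_steps + 1):
        h = prop @ h
        # Nodes whose personalized order is exactly t take this hop's result.
        out[steps == t] = h[steps == t]
    return out
```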
arXiv Detail & Related papers (2022-11-01T14:38:18Z) - Learning to Learn with Generative Models of Neural Network Checkpoints [71.06722933442956]
We construct a dataset of neural network checkpoints and train a generative model on the parameters.
We find that our approach successfully generates parameters for a wide range of loss prompts.
We apply our method to different neural network architectures and tasks in supervised and reinforcement learning.
arXiv Detail & Related papers (2022-09-26T17:59:58Z) - Lost in Pruning: The Effects of Pruning Neural Networks beyond Test Accuracy [42.15969584135412]
Neural network pruning is a popular technique used to reduce the inference costs of modern networks.
We evaluate whether the use of test accuracy alone in the terminating condition is sufficient to ensure that the resulting model performs well.
We find that pruned networks effectively approximate the unpruned model; however, the prune ratio at which pruned networks achieve commensurate performance varies significantly across tasks.
arXiv Detail & Related papers (2021-03-04T13:22:16Z) - Recurrent Neural Networks for Stochastic Control Problems with Delay [0.76146285961466]
We propose and systematically study deep neural network-based algorithms to solve control problems with delay features.
Specifically, we employ neural networks for sequence modeling to parameterize the policy and optimize the objective function.
The proposed algorithms are tested on three benchmark examples: a linear-quadratic problem, optimal consumption with fixed finite delay, and portfolio optimization with complete memory.
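As a rough illustration of parameterizing the policy with a sequence model (the sizes and architecture below are assumptions, not the paper's setup):

```python
import torch
import torch.nn as nn

class RecurrentPolicy(nn.Module):
    """LSTM policy for a control problem with delay.

    The controller sees a window of past states (the delay makes the
    problem non-Markovian in the current state alone) and outputs a
    control vector.
    """
    def __init__(self, state_dim: int, control_dim: int, hidden: int = 64):
        super().__init__()
        self.rnn = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, control_dim)

    def forward(self, state_history: torch.Tensor) -> torch.Tensor:
        # state_history: (batch, window, state_dim); use the final hidden state.
        _, (h, _) = self.rnn(state_history)
        return self.head(h[-1])
```

Training would roll this policy through the delayed dynamics and backpropagate the simulated objective through the trajectory.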
arXiv Detail & Related papers (2021-01-05T07:18:47Z) - Any-Width Networks [43.98007529334065]
We propose an adjustable-width CNN architecture that allows for fine-grained control over speed and accuracy during inference.
Our key innovation is the use of lower-triangular weight matrices which explicitly address width-varying batch statistics.
We empirically demonstrate that our proposed AWNs compare favorably to existing methods while providing maximally granular control during inference.
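A minimal sketch of the triangular-weight mechanism, assuming a plain fully connected layer (the actual AWNs are CNNs and also handle width-varying batch statistics, which this sketch omits):

```python
from typing import Optional
import torch
import torch.nn as nn

class TriangularLinear(nn.Module):
    """Linear layer with a lower-triangular weight mask.

    Because output unit i only sees input units 0..i, the top-left
    w-by-w block is self-contained, so the layer can run at any width w
    at inference simply by slicing.
    """
    def __init__(self, dim: int):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        self.bias = nn.Parameter(torch.zeros(dim))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)
        self.register_buffer("mask", torch.tril(torch.ones(dim, dim)))

    def forward(self, x: torch.Tensor,
                width: Optional[int] = None) -> torch.Tensor:
        w = self.weight * self.mask
        if width is not None:
            # Reduced-width inference: use the self-contained top-left block.
            return x[:, :width] @ w[:width, :width].T + self.bias[:width]
        return x @ w.T + self.bias
```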
arXiv Detail & Related papers (2020-12-06T00:22:01Z) - Adaptive Subcarrier, Parameter, and Power Allocation for Partitioned Edge Learning Over Broadband Channels [69.18343801164741]
Partitioned edge learning (PARTEL) implements parameter-server training, a well-known distributed learning method, in wireless networks.
We consider the case of deep neural network (DNN) models which can be trained using PARTEL by introducing some auxiliary variables.
arXiv Detail & Related papers (2020-10-08T15:27:50Z) - Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask for it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z) - Dependency Aware Filter Pruning [74.69495455411987]
Pruning a proportion of unimportant filters is an efficient way to mitigate the inference cost.
Previous work prunes filters according to their weight norms or the corresponding batch-norm scaling factors.
We propose a novel mechanism to dynamically control the sparsity-inducing regularization so as to achieve the desired sparsity.
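A hedged sketch of the two ingredients mentioned above: the batch-norm-scale criterion from prior work, and a stand-in proportional controller for the dynamically adjusted sparsity penalty (the paper's actual control rule is not in this summary):

```python
import torch
import torch.nn as nn

def bn_filter_scores(bn: nn.BatchNorm2d) -> torch.Tensor:
    """Per-filter importance from batch-norm scaling factors |gamma|,
    the baseline criterion the paper builds on."""
    return bn.weight.detach().abs()

def adjust_sparsity_penalty(lmbda: float, current_sparsity: float,
                            target_sparsity: float,
                            gain: float = 0.1) -> float:
    """Hypothetical proportional update for the sparsity-inducing penalty.

    The paper dynamically controls the regularization strength to reach
    a desired sparsity; its exact rule is not given here, so this simple
    proportional step is an illustrative stand-in.
    """
    return max(0.0, lmbda + gain * (target_sparsity - current_sparsity))

# In a training step, the penalty would be applied to all BN scales:
#   l1 = sum(m.weight.abs().sum() for m in model.modules()
#            if isinstance(m, nn.BatchNorm2d))
#   loss = task_loss + lmbda * l1
```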
arXiv Detail & Related papers (2020-05-06T07:41:22Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.