Resource Efficient Neural Networks Using Hessian Based Pruning
- URL: http://arxiv.org/abs/2306.07030v1
- Date: Mon, 12 Jun 2023 11:09:16 GMT
- Title: Resource Efficient Neural Networks Using Hessian Based Pruning
- Authors: Jack Chong, Manas Gupta, Lihui Chen
- Abstract summary: We modify the existing approach by estimating the Hessian trace using FP16 precision instead of FP32.
Our modified approach can achieve speed-ups ranging from 17% to as much as 44% during our experiments on different combinations of model architectures and GPU devices.
We also present the results of pruning using both FP16 and FP32 Hessian trace calculation and show that there are no noticeable accuracy differences between the two.
- Score: 7.042897867094235
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Neural network pruning is a practical way for reducing the size of trained
models and the number of floating-point operations. One way of pruning is to
use the relative Hessian trace to calculate the sensitivity of each channel, as
compared to the more common magnitude pruning approach. However, the stochastic
approach used to estimate the Hessian trace must iterate many times before it
converges. This can be time-consuming for larger models
with many millions of parameters. To address this problem, we modify the
existing approach by estimating the Hessian trace using FP16 precision instead
of FP32. We test the modified approach (EHAP) on
ResNet-32/ResNet-56/WideResNet-28-8 trained on CIFAR10/CIFAR100 image
classification tasks and achieve faster computation of the Hessian trace.
Specifically, our modified approach can achieve speed-ups ranging from 17% to
as much as 44% during our experiments on different combinations of model
architectures and GPU devices. Our modified approach also takes up around 40%
less GPU memory when pruning ResNet-32 and ResNet-56 models, which allows for a
larger Hessian batch size to be used for estimating the Hessian trace.
Meanwhile, we also present the results of pruning using both FP16 and FP32
Hessian trace calculation and show that there are no noticeable accuracy
differences between the two. Overall, it is a simple and effective way to
compute the relative Hessian trace faster without sacrificing pruned model
performance. We also present a full pipeline using EHAP and quantization-aware
training (QAT), using INT8 QAT to compress the network further after pruning.
In particular, we use symmetric quantization for the weights and asymmetric
quantization for the activations.
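
The speed-up comes from running the stochastic (Hutchinson-style) trace estimator under half precision. Below is a minimal PyTorch sketch of that idea, assuming a single calibration batch and FP16 autocast on a CUDA device; it is illustrative only and not the authors' EHAP implementation, and the function name and hyperparameters are assumptions.

```python
# Minimal sketch of Hutchinson-style Hessian trace estimation under FP16
# autocast, in the spirit of EHAP. Not the authors' code: the function name,
# single-batch setup and probe count are assumptions.
import torch


def estimate_hessian_trace_fp16(model, loss_fn, inputs, targets, n_probes=100):
    """tr(H) ~ (1/m) * sum_i v_i^T H v_i with Rademacher probes v_i."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Forward and first backward pass under autocast, so convolutions and
    # matmuls run in FP16; the gradient graph is kept for Hessian-vector products.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = loss_fn(model(inputs), targets)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    trace_samples = []
    for _ in range(n_probes):
        # Rademacher probe vectors with entries in {-1, +1}.
        vs = [torch.randint_like(p, 2) * 2 - 1 for p in params]
        # Hessian-vector products via a second backward pass through the gradients.
        hvs = torch.autograd.grad(grads, params, grad_outputs=vs, retain_graph=True)
        trace_samples.append(sum((hv * v).sum() for hv, v in zip(hvs, vs)).item())
    return sum(trace_samples) / len(trace_samples)
```

In the paper the relative trace is used per channel as the pruning sensitivity; the sketch above only illustrates the FP16 trace estimation step itself.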
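For the QAT stage, the abstract specifies symmetric INT8 quantization for the weights and asymmetric INT8 quantization for the activations. The snippet below sketches the standard fake-quantization formulas behind those two choices; it is a generic per-tensor illustration, not the paper's pipeline.

```python
# Generic fake-quantization formulas: symmetric INT8 for weights,
# asymmetric INT8 for activations. Per-tensor scaling is an assumption.
import torch


def fake_quant_symmetric_int8(w: torch.Tensor) -> torch.Tensor:
    """Symmetric INT8: zero point fixed at 0, integer range [-127, 127]."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127)
    return q * scale  # dequantized ("fake quantized") weights


def fake_quant_asymmetric_int8(x: torch.Tensor) -> torch.Tensor:
    """Asymmetric INT8: scale and zero point cover [min, max], integer range [0, 255]."""
    x_min, x_max = x.min(), x.max()
    scale = (x_max - x_min).clamp(min=1e-8) / 255.0
    zero_point = torch.round(-x_min / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, 0, 255)
    return (q - zero_point) * scale  # dequantized activations
```

During QAT such fake-quant ops are typically inserted into the forward pass, with a straight-through estimator passing gradients through the rounding, so the network adapts to INT8 inference.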
Related papers
- Joint Pruning and Channel-wise Mixed-Precision Quantization for Efficient Deep Neural Networks [10.229120811024162]
Deep neural networks (DNNs) pose significant challenges for deployment on edge devices.
Common approaches to address this issue are pruning and mixed-precision quantization.
We propose a novel methodology to apply them jointly via a lightweight gradient-based search.
arXiv Detail & Related papers (2024-07-01T08:07:02Z) - Instant Complexity Reduction in CNNs using Locality-Sensitive Hashing [50.79602839359522]
We propose HASTE (Hashing for Tractable Efficiency), a parameter-free and data-free module that acts as a plug-and-play replacement for any regular convolution module.
By using locality-sensitive hashing (LSH), we are able to drastically compress latent feature maps without sacrificing much accuracy (a generic LSH sketch is shown after this list).
In particular, we are able to instantly drop 46.72% of FLOPs while only losing 1.25% accuracy by just swapping the convolution modules in a ResNet34 on CIFAR-10 for our HASTE module.
arXiv Detail & Related papers (2023-09-29T13:09:40Z) - Deep Multi-Threshold Spiking-UNet for Image Processing [51.88730892920031]
This paper introduces the novel concept of Spiking-UNet for image processing, which combines the power of Spiking Neural Networks (SNNs) with the U-Net architecture.
To achieve an efficient Spiking-UNet, we face two primary challenges: ensuring high-fidelity information propagation through the network via spikes and formulating an effective training strategy.
Experimental results show that, on image segmentation and denoising, our Spiking-UNet achieves comparable performance to its non-spiking counterpart.
arXiv Detail & Related papers (2023-07-20T16:00:19Z) - Quantized Neural Networks for Low-Precision Accumulation with Guaranteed
Overflow Avoidance [68.8204255655161]
We introduce a quantization-aware training algorithm that guarantees avoiding numerical overflow when reducing the precision of accumulators during inference.
We evaluate our algorithm across multiple quantized models that we train for different tasks, showing that our approach can reduce the precision of accumulators while maintaining model accuracy with respect to a floating-point baseline.
arXiv Detail & Related papers (2023-01-31T02:46:57Z) - RedBit: An End-to-End Flexible Framework for Evaluating the Accuracy of
Quantized CNNs [9.807687918954763]
Convolutional Neural Networks (CNNs) have become the standard class of deep neural network for image processing, classification and segmentation tasks.
RedBit is an open-source framework that provides a transparent, easy-to-use interface to evaluate the effectiveness of different algorithms on network accuracy.
arXiv Detail & Related papers (2023-01-15T21:27:35Z) - Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z) - Non-Parametric Adaptive Network Pruning [125.4414216272874]
We introduce non-parametric modeling to simplify the algorithm design.
Inspired by the face recognition community, we use a message passing algorithm to obtain an adaptive number of exemplars.
EPruner breaks the dependency on the training data in determining the "important" filters.
arXiv Detail & Related papers (2021-01-20T06:18:38Z) - Holistic Filter Pruning for Efficient Deep Neural Networks [25.328005340524825]
"Holistic Filter Pruning" (HFP) is a novel approach for common DNN training that is easy to implement and enables to specify accurate pruning rates.
In various experiments, we give insights into the training and achieve state-of-the-art performance on CIFAR-10 and ImageNet.
arXiv Detail & Related papers (2020-09-17T09:23:36Z) - Efficient Integer-Arithmetic-Only Convolutional Neural Networks [87.01739569518513]
We replace the conventional ReLU with a bounded ReLU and find that the performance decline is due to activation quantization.
Our integer networks achieve performance equivalent to the corresponding FPN networks, but have only 1/4 the memory cost and run 2x faster on modern GPUs.
arXiv Detail & Related papers (2020-06-21T08:23:03Z) - Automatic Pruning for Quantized Neural Networks [35.2752928147013]
We propose an effective pruning strategy for selecting redundant low-precision filters.
We conduct extensive experiments on CIFAR-10 and ImageNet with various architectures and precisions.
For ResNet-18 on ImageNet, we prune 26.12% of the model size with Binarized Neural Network quantization.
arXiv Detail & Related papers (2020-02-03T01:10:13Z)