DropNet: Reducing Neural Network Complexity via Iterative Pruning
- URL: http://arxiv.org/abs/2207.06646v1
- Date: Thu, 14 Jul 2022 03:42:11 GMT
- Title: DropNet: Reducing Neural Network Complexity via Iterative Pruning
- Authors: John Tan Chong Min, Mehul Motani
- Abstract summary: Deep neural networks require a significant amount of computing time and power to train and deploy.
We propose DropNet, an iterative pruning method which prunes nodes/filters to reduce network complexity.
- Score: 29.519376857728325
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Modern deep neural networks require a significant amount of computing time
and power to train and deploy, which limits their usage on edge devices.
Inspired by the iterative weight pruning in the Lottery Ticket Hypothesis, we
propose DropNet, an iterative pruning method which prunes nodes/filters to
reduce network complexity. DropNet iteratively removes nodes/filters with the
lowest average post-activation value across all training samples. Empirically,
we show that DropNet is robust across diverse scenarios, including MLPs and
CNNs using the MNIST, CIFAR-10 and Tiny ImageNet datasets. We show that up to
90% of the nodes/filters can be removed without any significant loss of
accuracy. The final pruned network performs well even with reinitialization of
the weights and biases. DropNet also has similar accuracy to an oracle which
greedily removes nodes/filters one at a time to minimise training loss,
highlighting its effectiveness.
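As a concrete illustration of the pruning criterion in the abstract, here is a minimal NumPy sketch for a single masked hidden layer: it scores each node by its average post-activation value over the training samples and masks out the lowest-scoring ones. The function names, the drop fraction, and the masking interface are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def average_post_activation(weights, biases, mask, x_train):
    """Mean post-activation value of each hidden node over all training samples."""
    pre_act = x_train @ weights + biases   # (num_samples, num_nodes)
    post_act = relu(pre_act) * mask        # masked nodes contribute nothing
    return post_act.mean(axis=0)           # (num_nodes,)

def dropnet_prune_step(weights, biases, mask, x_train, fraction=0.2):
    """Mask the given fraction of still-active nodes with the lowest average activation."""
    scores = average_post_activation(weights, biases, mask, x_train)
    active = np.flatnonzero(mask)
    n_drop = int(np.ceil(fraction * active.size))
    drop = active[np.argsort(scores[active])[:n_drop]]   # lowest-scoring active nodes
    new_mask = mask.copy()
    new_mask[drop] = 0.0
    return new_mask

# Toy usage: one hidden layer with 8 nodes, pruned over three iterations.
rng = np.random.default_rng(0)
x_train = rng.normal(size=(256, 4))
W, b = rng.normal(size=(4, 8)), np.zeros(8)
mask = np.ones(8)
for _ in range(3):   # in practice the masked network is (re)trained between steps
    mask = dropnet_prune_step(W, b, mask, x_train, fraction=0.2)
print("active nodes:", int(mask.sum()))
```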
Related papers
- Deep Learning without Shortcuts: Shaping the Kernel with Tailored
Rectifiers [83.74380713308605]
We develop a new type of transformation that is fully compatible with a variant of ReLUs -- Leaky ReLUs.
We show in experiments that our method, which introduces negligible extra computational cost, achieves validation accuracies with deep vanilla networks that are competitive with ResNets.
arXiv Detail & Related papers (2022-03-15T17:49:08Z) - ThresholdNet: Pruning Tool for Densely Connected Convolutional Networks [2.267411144256508]
We introduce a new type of pruning tool, the threshold, which draws on the principle of the threshold voltage in memory devices.
This work employs this method to connect blocks of different depths in different ways to reduce memory usage.
Experiments show that HarDNet is twice as fast as DenseNet, and on this basis, ThresholdNet is 10% faster and has a 10% lower error rate than HarDNet.
arXiv Detail & Related papers (2021-08-28T08:48:31Z) - Adder Neural Networks [75.54239599016535]
We present adder networks (AdderNets) to trade the massive multiplications in deep neural networks for much cheaper additions.
In AdderNets, we take the $\ell_p$-norm distance between the filters and the input feature as the output response.
We show that the proposed AdderNets can achieve 75.7% Top-1 accuracy and 92.3% Top-5 accuracy using ResNet-50 on the ImageNet dataset.
arXiv Detail & Related papers (2021-05-29T04:02:51Z) - Cascade Weight Shedding in Deep Neural Networks: Benefits and Pitfalls
for Network Pruning [73.79377854107514]
We show that cascade weight shedding, when present, can significantly improve the performance of an otherwise sub-optimal scheme such as random pruning.
We demonstrate cascade weight shedding's potential for improving the accuracy of gradual magnitude pruning (GMP) and reducing its computational complexity.
We shed light on the weight and learning-rate rewinding methods of re-training, showing their possible connections to cascade weight shedding and the reason for their advantage over fine-tuning.
arXiv Detail & Related papers (2021-03-19T04:41:40Z) - Deep Model Compression based on the Training History [13.916984628784768]
We propose a novel History Based Filter Pruning (HBFP) method that utilizes network training history for filter pruning.
The proposed pruning method outperforms the state of the art in terms of reduction in FLOPs (floating-point operations), achieving FLOPs reductions of 97.98%, 83.42%, 78.43%, and 74.95% for the LeNet-5, VGG-16, ResNet-56, and ResNet-110 models, respectively.
arXiv Detail & Related papers (2021-01-30T06:04:21Z) - Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets [65.28292822614418]
A giant formula for simultaneously enlarging the resolution, depth, and width provides us with a Rubik's cube for neural networks.
This paper aims to explore the twisting rules for obtaining deep neural networks with minimum model sizes and computational costs.
arXiv Detail & Related papers (2020-10-28T08:49:45Z) - EfficientNet-eLite: Extremely Lightweight and Efficient CNN Models for
Edge Devices by Network Candidate Search [13.467017642143583]
We propose a novel Network Candidate Search (NCS) to study the trade-off between resource usage and performance.
In our experiment, we collect candidate CNN models scaled down from EfficientNet-B0 in varied ways through width, depth, input resolution, and compound scaling down.
To further embrace CNN edge applications with Application-Specific Integrated Circuits (ASICs), we adjust the architectures of EfficientNet-eLite to build a more hardware-friendly version, EfficientNet-HF.
arXiv Detail & Related papers (2020-09-16T01:11:10Z) - Add a SideNet to your MainNet [0.0]
We develop a method for adaptive network complexity by attaching a small classification layer, which we call SideNet, to a large pretrained network, which we call MainNet.
Given an input, the SideNet returns a classification if its confidence level, obtained via softmax, surpasses a user-determined threshold, and passes the input along to the large MainNet for further processing only if its confidence is too low.
Experimental results show that simple single-hidden-layer perceptron SideNets added onto pretrained ResNet and BERT MainNets allow for substantial decreases in compute with minimal drops in performance on image and text classification tasks.
arXiv Detail & Related papers (2020-07-14T19:25:32Z) - Pruning CNN's with linear filter ensembles [0.0]
We use pruning to reduce the network size and -- implicitly -- the number of floating-point operations (FLOPs).
We develop a novel filter importance norm that is based on the change in the empirical loss caused by the presence or removal of a component from the network architecture.
We evaluate our method on a fully connected network, as well as on the ResNet architecture trained on the CIFAR-10 dataset.
arXiv Detail & Related papers (2020-01-22T16:52:06Z) - Filter Grafting for Deep Neural Networks [71.39169475500324]
Filter grafting aims to improve the representation capability of Deep Neural Networks (DNNs).
We develop an entropy-based criterion to measure the information of filters and an adaptive weighting strategy for balancing the grafted information among networks.
For example, the grafted MobileNetV2 outperforms the non-grafted MobileNetV2 by about 7 percent on the CIFAR-100 dataset.
arXiv Detail & Related papers (2020-01-15T03:18:57Z) - AdderNet: Do We Really Need Multiplications in Deep Learning? [159.174891462064]
We present adder networks (AdderNets) to trade massive multiplications in deep neural networks for much cheaper additions to reduce computation costs.
We develop a special back-propagation approach for AdderNets by investigating the full-precision gradient.
As a result, the proposed AdderNets can achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset (a toy sketch of the adder-style response appears after this list).
arXiv Detail & Related papers (2019-12-31T06:56:47Z)
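To make the AdderNet response described above concrete, the sketch below replaces the usual convolutional cross-correlation with a negative $\ell_1$ distance (the $p=1$ case of the $\ell_p$-norm response), so the layer needs only additions and subtractions. The shapes, names, and stride-1/no-padding choices are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def adder_layer(x, filters):
    """Adder-style layer: the response is the negative L1 distance between each
    filter and each input patch, so only additions/subtractions are involved.
    Assumed shapes: x is (H, W, C), filters is (K, k, k, C); stride 1, no padding."""
    K, k, _, C = filters.shape
    H, W, _ = x.shape
    out = np.zeros((H - k + 1, W - k + 1, K))
    for i in range(H - k + 1):
        for j in range(W - k + 1):
            patch = x[i:i + k, j:j + k, :]
            for f in range(K):
                out[i, j, f] = -np.abs(patch - filters[f]).sum()
    return out

# Toy usage: an 8x8 single-channel input with four 3x3 filters.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 1))
filters = rng.normal(size=(4, 3, 3, 1))
print(adder_layer(x, filters).shape)   # (6, 6, 4)
```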