Compact Multi-level Sparse Neural Networks with Input Independent
Dynamic Rerouting
- URL: http://arxiv.org/abs/2112.10930v1
- Date: Tue, 21 Dec 2021 01:35:51 GMT
- Title: Compact Multi-level Sparse Neural Networks with Input Independent
Dynamic Rerouting
- Authors: Minghai Qin, Tianyun Zhang, Fei Sun, Yen-Kuang Chen, Makan Fardad,
Yanzhi Wang, Yuan Xie
- Abstract summary: Sparse deep neural networks can substantially reduce the complexity and memory consumption of the models.
Facing these real-life challenges, we propose to train a sparse model that supports multiple sparsity levels.
In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model.
- Score: 33.35713740886292
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks (DNNs) have been shown to provide superb performance in many real-life applications, but their large computation cost and storage requirements have prevented them from being deployed to many edge and internet-of-things (IoT) devices. Sparse deep neural networks, in which the majority of the weight parameters are zero, can substantially reduce the computation complexity and memory consumption of the models. In real-use scenarios, devices may suffer from large fluctuations of the available computation and memory resources across different environments, and the quality of service (QoS) is difficult to maintain due to long-tail inferences with large latency. Facing these real-life challenges, we propose to train a sparse model that supports multiple sparsity levels. That is, a hierarchical structure of weights is enforced such that the locations and the values of the non-zero parameters of the more-sparse sub-model are a subset of those of the less-sparse sub-model. In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model. We have verified our methodologies on a variety of DNN models and tasks, including ResNet-50, PointNet++, GNMT, and graph attention networks. We obtain sparse sub-models with an average of 13.38% of the weights and 14.97% of the FLOPs, while the accuracies are as good as those of their dense counterparts. More-sparse sub-models with 5.38% of the weights and 4.47% of the FLOPs, which are subsets of the less-sparse ones, can be obtained with only a 3.25% relative accuracy loss.
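To make the nesting constraint in the abstract concrete, here is a minimal NumPy sketch of multi-level sparsity: a less-sparse mask is derived by magnitude pruning, and a more-sparse mask is obtained by pruning further inside that mask's support, so its non-zeros are a subset of the less-sparse sub-model's. The function names, keep ratios, and the magnitude criterion are illustrative assumptions, not the authors' actual training method.

```python
# Minimal sketch of nested sparsity masks with input-independent rerouting.
# Assumed magnitude pruning and keep ratios; not the paper's training procedure.
import numpy as np

def magnitude_mask(weights, keep_ratio, parent_mask=None):
    """Keep the largest-magnitude entries; restrict to parent_mask's support if given."""
    candidates = np.abs(weights)
    if parent_mask is not None:
        candidates = candidates * parent_mask      # only keep weights inside the parent support
    k = int(keep_ratio * weights.size)
    threshold = np.partition(candidates.ravel(), -k)[-k]
    return (candidates >= threshold).astype(weights.dtype)

rng = np.random.default_rng(0)
dense_w = rng.standard_normal((256, 256))

less_sparse_mask = magnitude_mask(dense_w, keep_ratio=0.14)             # ~14% of weights
more_sparse_mask = magnitude_mask(dense_w, keep_ratio=0.05,
                                  parent_mask=less_sparse_mask)         # ~5% of weights, nested

# Nesting property: every non-zero of the more-sparse sub-model is also
# a non-zero of the less-sparse sub-model.
assert np.all(more_sparse_mask <= less_sparse_mask)

def forward(x, level):
    """Input-independent rerouting: the sparsity level is chosen per deployment
    condition (e.g. available compute), not per input."""
    mask = less_sparse_mask if level == "less_sparse" else more_sparse_mask
    return x @ (dense_w * mask)

x = rng.standard_normal((8, 256))
y_fast = forward(x, "more_sparse")   # cheaper sub-model
y_full = forward(x, "less_sparse")   # more accurate sub-model
```

Because the supports are nested, storing the least sparse sub-model is enough; switching sparsity levels at inference is just a choice of mask, independent of the input.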
Related papers
- Just How Flexible are Neural Networks in Practice? [89.80474583606242]
It is widely believed that a neural network can fit a training set containing at least as many samples as it has parameters.
In practice, however, we only find the solutions reachable by our training procedure, including the gradient-based optimizer and regularizers, which limits flexibility.
arXiv Detail & Related papers (2024-06-17T12:24:45Z)
- Neural Networks at a Fraction with Pruned Quaternions [0.0]
Pruning is one technique to remove unnecessary weights and reduce resource requirements for training and inference.
For ML tasks where the input data is multi-dimensional, using higher-dimensional data embeddings such as complex numbers or quaternions has been shown to reduce the parameter count while maintaining accuracy.
We find that for some architectures, at very high sparsity levels, quaternion models provide higher accuracies than their real counterparts.
arXiv Detail & Related papers (2023-08-13T14:25:54Z)
- LilNetX: Lightweight Networks with EXtreme Model Compression and Structured Sparsification [36.651329027209634]
LilNetX is an end-to-end trainable technique for neural networks.
It enables learning models with a specified accuracy-rate-computation trade-off.
arXiv Detail & Related papers (2022-04-06T17:59:10Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic slice-able network (DS-Net++), which input-dependently adjust the filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
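As a rough picture of the dynamic weight slicing summarized in the DS-Net++ entry above, the PyTorch sketch below lets a tiny gate choose how many output filters of a convolution to run and slices the weight tensor accordingly. The layer, gate design, and width choices are assumptions made for illustration, not the DS-Net/DS-Net++ implementation.

```python
# Sketch of dynamic weight slicing: a gate picks a width, and the conv layer
# runs on a contiguous slice of its weight tensor. Illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SliceableConv(nn.Module):
    def __init__(self, in_ch=16, out_ch=64, width_choices=(16, 32, 64)):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, 3, 3) * 0.05)
        self.width_choices = width_choices
        # Gate: global-average-pooled features -> one score per width choice.
        self.gate = nn.Linear(in_ch, len(width_choices))

    def forward(self, x):
        scores = self.gate(x.mean(dim=(2, 3)))          # (batch, num_choices)
        # One width for the whole batch here for simplicity; DS-Net-style
        # methods make this decision per input.
        idx = scores.mean(dim=0).argmax().item()
        width = self.width_choices[idx]
        w = self.weight[:width]                         # slice the first `width` filters
        return F.conv2d(x, w, padding=1), width

x = torch.randn(4, 16, 32, 32)
layer = SliceableConv()
out, used_width = layer(x)
print(out.shape, used_width)   # chosen width depends on the randomly initialized gate
```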
- Compact representations of convolutional neural networks via weight pruning and quantization [63.417651529192014]
We propose a novel storage format for convolutional neural networks (CNNs) based on source coding and leveraging both weight pruning and quantization.
We achieve a reduction of space occupancy up to 0.6% on fully connected layers and 5.44% on the whole network, while performing at least as competitively as the baseline.
arXiv Detail & Related papers (2021-08-28T20:39:54Z)
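To illustrate how pruning and quantization can combine into a compact storage format, as in the entry above, the NumPy sketch below keeps only the largest-magnitude weights, quantizes them to a small codebook, and stores positions, codes, and the codebook. The keep ratio, codebook size, and layout are illustrative assumptions, not the source-coding format proposed in the paper.

```python
# Sketch: prune small weights, quantize the survivors to a small codebook,
# and store (positions, codes, codebook) instead of the dense float matrix.
import numpy as np

def compress(weights, keep_ratio=0.1, num_levels=16):
    flat = weights.ravel()
    k = int(keep_ratio * flat.size)
    keep_idx = np.argsort(np.abs(flat))[-k:]            # positions of surviving weights
    survivors = flat[keep_idx]
    # Uniform quantization of the surviving values to `num_levels` codewords.
    codebook = np.linspace(survivors.min(), survivors.max(), num_levels)
    codes = np.abs(survivors[:, None] - codebook[None, :]).argmin(axis=1)
    return keep_idx.astype(np.uint32), codes.astype(np.uint8), codebook

def decompress(shape, keep_idx, codes, codebook):
    flat = np.zeros(int(np.prod(shape)), dtype=np.float32)
    flat[keep_idx] = codebook[codes]
    return flat.reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 128)).astype(np.float32)
idx, codes, book = compress(w)
w_hat = decompress(w.shape, idx, codes, book)

dense_bytes = w.nbytes
compact_bytes = idx.nbytes + codes.nbytes + book.nbytes
print(f"compact storage: {100 * compact_bytes / dense_bytes:.1f}% of dense size")
```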
- ANNETTE: Accurate Neural Network Execution Time Estimation with Stacked Models [56.21470608621633]
We propose a time estimation framework to decouple the architectural search from the target hardware.
The proposed methodology extracts a set of models from micro-kernel and multi-layer benchmarks and generates a stacked model for mapping and network execution time estimation.
We compare estimation accuracy and fidelity of the generated mixed models, statistical models with the roofline model, and a refined roofline model for evaluation.
arXiv Detail & Related papers (2021-05-07T11:39:05Z)
- NL-CNN: A Resources-Constrained Deep Learning Model based on Nonlinear Convolution [0.0]
A novel convolutional neural network model, abbreviated NL-CNN, is proposed, where nonlinear convolution is emulated in a cascade of convolution + nonlinearity layers.
Performance evaluation for several widely known datasets is provided, showing several relevant features.
arXiv Detail & Related papers (2021-01-30T13:38:42Z)
- Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9$\times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4$\times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z)
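The per-instance skipping described in the entry above can be pictured with a small gated residual block: a lightweight gate scores each input and the convolution is computed only for the inputs it keeps. The gate design and the 0.5 threshold are illustrative assumptions, not the L-Net/C-Net architecture.

```python
# Sketch of per-instance layer skipping: a tiny gate decides, for each input in
# the batch, whether the residual branch is worth computing. Illustrative only.
import torch
import torch.nn as nn

class SkippableBlock(nn.Module):
    def __init__(self, channels=32):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.gate = nn.Linear(channels, 1)   # per-instance "is this block useful?" score

    def forward(self, x):
        keep = torch.sigmoid(self.gate(x.mean(dim=(2, 3)))).squeeze(1) > 0.5   # (batch,) bool
        out = x.clone()                      # skipped instances pass through unchanged
        if keep.any():
            kept_x = x[keep]
            out[keep] = kept_x + torch.relu(self.conv(kept_x))   # branch only where kept
        return out, keep

x = torch.randn(8, 32, 16, 16)
block = SkippableBlock()
with torch.no_grad():
    y, kept = block(x)
print(kept.sum().item(), "of", x.size(0), "instances ran the block")
```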
- When Residual Learning Meets Dense Aggregation: Rethinking the Aggregation of Deep Neural Networks [57.0502745301132]
We propose Micro-Dense Nets, a novel architecture with global residual learning and local micro-dense aggregations.
Our micro-dense block can be integrated with neural architecture search based models to boost their performance.
arXiv Detail & Related papers (2020-04-19T08:34:52Z)
- Widening and Squeezing: Towards Accurate and Efficient QNNs [125.172220129257]
Quantization neural networks (QNNs) are very attractive to the industry because of their extremely cheap calculation and storage overhead, but their performance is still worse than that of networks with full-precision parameters.
Most existing methods aim to enhance the performance of QNNs, especially binary neural networks, by exploiting more effective training techniques.
We address this problem by projecting features in original full-precision networks to high-dimensional quantization features.
arXiv Detail & Related papers (2020-02-03T04:11:13Z)
This list is automatically generated from the titles and abstracts of the papers on this site.