Fully Dynamic Inference with Deep Neural Networks
- URL: http://arxiv.org/abs/2007.15151v1
- Date: Wed, 29 Jul 2020 23:17:48 GMT
- Title: Fully Dynamic Inference with Deep Neural Networks
- Authors: Wenhan Xia, Hongxu Yin, Xiaoliang Dai, Niraj K. Jha
- Abstract summary: Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9$\times$ fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4$\times$ fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
- Score: 19.833242253397206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Modern deep neural networks are powerful and widely applicable models that
extract task-relevant information through multi-level abstraction. Their
cross-domain success, however, is often achieved at the expense of high
computational cost, memory bandwidth, and long inference latency, which
prevent their deployment in resource-constrained and time-sensitive scenarios,
such as edge-side inference and self-driving cars. While recently developed
methods for creating efficient deep neural networks are making their real-world
deployment more feasible by reducing model size, they do not fully exploit
input properties on a per-instance basis to maximize computational efficiency
and task accuracy. In particular, most existing methods typically use a
one-size-fits-all approach that identically processes all inputs. Motivated by
the fact that different images require different feature embeddings to be
accurately classified, we propose a fully dynamic paradigm that imparts deep
convolutional neural networks with hierarchical inference dynamics at the level
of layers and individual convolutional filters/channels. Two compact networks,
called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance
basis which layers or filters/channels are redundant and therefore should be
skipped. L-Net and C-Net also learn how to scale retained computation outputs
to maximize task accuracy. By integrating L-Net and C-Net into a joint design
framework, called LC-Net, we consistently outperform state-of-the-art dynamic
frameworks with respect to both efficiency and classification accuracy. On the
CIFAR-10 dataset, LC-Net results in up to 11.9$\times$ fewer floating-point
operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic
inference methods. On the ImageNet dataset, LC-Net achieves up to 1.4$\times$
fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
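To make the layer-level mechanism concrete, below is a minimal PyTorch-style sketch of per-instance layer skipping with learned output scaling in the spirit of L-Net. All names (`LayerGate`, `GatedResidualBlock`) and details are illustrative assumptions rather than the authors' code; in particular, the hard skip decision shown is only meaningful at inference time, since training such a gate requires a differentiable relaxation (e.g., a straight-through or Gumbel-softmax estimator).
```python
# A minimal sketch of per-instance layer skipping with output scaling,
# in the spirit of L-Net. Names and details are assumptions, not the paper's code.
import torch
import torch.nn as nn

class LayerGate(nn.Module):
    """Tiny predictor that decides, per input, whether to run a block
    and how to scale its output if it is kept."""
    def __init__(self, channels: int, hidden: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),   # global context: (B, C, 1, 1)
            nn.Flatten(),              # (B, C)
            nn.Linear(channels, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, 2),      # -> [keep logit, raw scale]
        )

    def forward(self, x):
        out = self.mlp(x)
        keep = (torch.sigmoid(out[:, 0]) > 0.5).float()  # hard per-instance skip decision
        scale = torch.relu(out[:, 1])                    # learned scaling of retained output
        return keep, scale

class GatedResidualBlock(nn.Module):
    """Residual block whose body contributes only when the gate keeps it."""
    def __init__(self, block: nn.Module, channels: int):
        super().__init__()
        self.block = block
        self.gate = LayerGate(channels)

    def forward(self, x):
        keep, scale = self.gate(x)
        keep = keep.view(-1, 1, 1, 1)
        scale = scale.view(-1, 1, 1, 1)
        # Skipped instances fall back to the identity path; kept instances
        # add a scaled version of the block's output, as L-Net learns to do.
        return x + keep * scale * self.block(x)
```
C-Net applies the same idea at finer granularity, gating individual convolutional filters/channels rather than whole layers. Note that per-instance skipping only yields real FLOP savings when inference runs with batch size 1, or when inputs are regrouped by gate decision.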
Related papers
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework that bridges the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z)
- A Generalization of Continuous Relaxation in Structured Pruning [0.3277163122167434]
Trends indicate that deeper and larger neural networks with an increasing number of parameters achieve higher accuracy than smaller neural networks.
We generalize structured pruning with algorithms for network augmentation, pruning, sub-network collapse and removal.
The resulting CNN executes efficiently on GPU hardware without computationally expensive sparse matrix operations.
arXiv Detail & Related papers (2023-08-28T14:19:13Z)
- Lightweight and Progressively-Scalable Networks for Semantic Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models for boosting semantic segmentation.
In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interactions across multiple scales.
We devise Lightweight and Progressively-Scalable Networks (LPS-Net), which expands network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We show a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs of diverse difficulty levels.
We present the dynamic slimmable network (DS-Net) and the dynamic sliceable network (DS-Net++), which input-dependently adjust the number of filters in CNNs and multiple dimensions in both CNNs and transformers (see the sketch after this entry).
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
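For intuition, here is a rough sketch of input-dependent weight slicing for a single convolution. `SliceableConv` and the hard-coded width ratios are hypothetical simplifications: DS-Net predicts the width from the input and trains the nested sub-networks jointly, which this sketch does not attempt.
```python
# A rough sketch of dynamic weight slicing: easy inputs run a dense conv
# built from only the leading filters, so no sparse operations are needed.
# Names and the fixed width ratios are assumptions, not DS-Net's actual code.
import torch
import torch.nn.functional as F
from torch import nn

class SliceableConv(nn.Module):
    """Conv2d whose leading output filters form a nested family of sub-networks."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_ch, in_ch, k, k) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_ch))
        self.out_ch = out_ch

    def forward(self, x, width_ratio: float):
        # Keep only the first int(width_ratio * out_ch) filters (at least one)
        # for this input; the sliced conv is still a dense operation.
        n = max(1, int(self.out_ch * width_ratio))
        return F.conv2d(x, self.weight[:n], self.bias[:n], padding=1)

conv = SliceableConv(16, 64)
x = torch.randn(1, 16, 32, 32)
print(conv(x, width_ratio=0.25).shape)  # torch.Size([1, 16, 32, 32]): "easy" input
print(conv(x, width_ratio=1.0).shape)   # torch.Size([1, 64, 32, 32]): "hard" input
```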
- Manifold Regularized Dynamic Network Pruning [102.24146031250034]
This paper proposes a new paradigm that dynamically removes redundant filters by embedding the manifold information of all instances into the space of pruned networks.
The effectiveness of the proposed method is verified on several benchmarks, which shows better performance in terms of both accuracy and computational cost.
arXiv Detail & Related papers (2021-03-10T03:59:03Z)
- ES-Net: An Efficient Stereo Matching Network [4.8986598953553555]
Existing stereo matching networks typically use slow and computationally expensive 3D convolutions to improve performance.
We propose the Efficient Stereo Network (ESNet), which achieves high performance and efficient inference at the same time.
arXiv Detail & Related papers (2021-03-05T20:11:39Z)
- Solving Mixed Integer Programs Using Neural Networks [57.683491412480635]
This paper applies learning to the two key sub-tasks of a MIP solver: generating a high-quality joint variable assignment, and bounding the gap in objective value between that assignment and an optimal one.
Our approach constructs two corresponding neural network-based components, Neural Diving and Neural Branching, to use in a base MIP solver such as SCIP.
We evaluate our approach on six diverse real-world datasets, including two Google production datasets and MIPLIB, by training separate neural networks on each.
arXiv Detail & Related papers (2020-12-23T09:33:11Z)
- A Progressive Sub-Network Searching Framework for Dynamic Inference [33.93841415140311]
We propose a progressive sub-network searching framework that incorporates several effective techniques, including trainable noise ranking, channel-group and fine-tuning threshold setting, and sub-network re-selection.
Our proposed method achieves much better dynamic inference accuracy than the popular Universally Slimmable Network, by up to 4.4% (2.3% on average) on the ImageNet dataset with the same model size.
arXiv Detail & Related papers (2020-09-11T22:56:02Z)
- Towards Lossless Binary Convolutional Neural Networks Using Piecewise Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full-precision weights and activations (a toy sketch of the multi-binary idea follows this entry).
arXiv Detail & Related papers (2020-08-08T13:32:33Z)
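As background for why multiple binary bases lessen accuracy loss, the following toy sketch greedily approximates a full-precision tensor as a sum of scaled binary tensors. The paper's Piecewise Approximation scheme partitions the value range differently; treat this as an illustration of the general multi-binary idea, not the method itself.
```python
# Toy residual binarization: w ~ sum_i alpha_i * b_i with b_i in {-1, +1}.
# Illustrates the general multi-binary idea only; the paper's piecewise
# scheme is different.
import torch

def multi_binary_approx(w: torch.Tensor, num_bases: int = 3):
    residual, bases, scales = w.clone(), [], []
    for _ in range(num_bases):
        b = torch.sign(residual)
        b[b == 0] = 1.0                  # keep b strictly binary
        alpha = residual.abs().mean()    # least-squares optimal scale for this base
        bases.append(b)
        scales.append(alpha)
        residual = residual - alpha * b  # the next base fits what is left
    return bases, scales

w = torch.randn(64, 64, 3, 3)
bases, scales = multi_binary_approx(w)
w_hat = sum(a * b for a, b in zip(scales, bases))
# Relative reconstruction error shrinks as num_bases grows.
print((torch.norm(w - w_hat) / torch.norm(w)).item())
```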
- ReActNet: Towards Precise Binary Neural Network with Generalized Activation Functions [76.05981545084738]
We propose several ideas for enhancing a binary network to close its accuracy gap to real-valued networks without incurring any additional computational cost.
We first construct a baseline network by modifying and binarizing a compact real-valued network with parameter-free shortcuts.
We show that the proposed ReActNet outperforms the state of the art by a large margin (a sketch of its generalized activation follows this entry).
arXiv Detail & Related papers (2020-03-07T02:12:02Z)
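The generalized activations in ReActNet are PReLU-like units with learnable distribution shifts. A sketch of such an RPReLU-style activation follows; the per-channel parameterization and initial values here are assumptions of this rendering.
```python
# Sketch of an RPReLU-style activation with learnable input/output shifts:
#   f(x) = x - gamma + zeta             if x - gamma > 0
#        = beta * (x - gamma) + zeta    otherwise
# Per-channel parameters and initial values are assumptions of this sketch.
import torch
from torch import nn

class RPReLU(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1, channels, 1, 1))        # input shift
        self.zeta = nn.Parameter(torch.zeros(1, channels, 1, 1))         # output shift
        self.beta = nn.Parameter(torch.full((1, channels, 1, 1), 0.25))  # negative slope

    def forward(self, x):
        shifted = x - self.gamma
        return torch.where(shifted > 0, shifted, self.beta * shifted) + self.zeta

act = RPReLU(channels=8)
y = act(torch.randn(2, 8, 4, 4))  # shape preserved: (2, 8, 4, 4)
```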