DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers
- URL: http://arxiv.org/abs/2109.10060v1
- Date: Tue, 21 Sep 2021 09:57:21 GMT
- Title: DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and
Transformers
- Authors: Changlin Li, Guangrun Wang, Bing Wang, Xiaodan Liang, Zhihui Li and
Xiaojun Chang
- Abstract summary: We present a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs of diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting the filter numbers of CNNs and multiple dimensions in both CNNs and transformers, respectively.
- Score: 105.74546828182834
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Dynamic networks have shown their promising capability in reducing
theoretical computation complexity by adapting their architectures to the input
during inference. However, their practical runtime usually lags behind the
theoretical acceleration due to inefficient sparsity. Here, we explore a
hardware-efficient dynamic inference regime, named dynamic weight slicing,
which adaptively slices a part of the network parameters for inputs with diverse
difficulty levels, while keeping parameters stored statically and contiguously
in hardware to prevent the extra burden of sparse computation. Based on this
scheme, we present dynamic slimmable network (DS-Net) and dynamic slice-able
network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and
multiple dimensions in both CNNs and transformers, respectively. To ensure
sub-network generality and routing fairness, we propose a disentangled
two-stage optimization scheme with training techniques such as in-place
bootstrapping (IB), multi-view consistency (MvCo) and sandwich gate
sparsification (SGS) to train supernet and gate separately. Extensive
experiments on 4 datasets and 3 different network architectures demonstrate our
method consistently outperforms state-of-the-art static and dynamic model
compression methods by a large margin (up to 6.6%). Typically, DS-Net++
achieves 2-4x computation reduction and 1.62x real-world acceleration over
MobileNet, ResNet-50 and Vision Transformer, with minimal accuracy drops
(0.1-0.3%) on ImageNet. Code release: https://github.com/changlin31/DS-Net
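To make the dynamic weight slicing idea above concrete, below is a minimal, hypothetical PyTorch-style sketch, not the authors' released implementation (see the linked repository for that). It assumes a small gate that picks a filter count per input and a convolution that uses only the first k filters, i.e. a contiguous slice of a statically stored weight tensor, so the reduced computation runs as an ordinary dense convolution rather than a sparse one. Class names, the candidate widths, and the batch-level routing simplification are illustrative assumptions.

```python
# Minimal sketch of input-dependent weight slicing (illustrative only).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SliceableConv2d(nn.Module):
    """Conv layer whose output channels can be sliced to the first k filters."""

    def __init__(self, in_ch, max_out_ch, kernel_size=3, padding=1):
        super().__init__()
        # Weights for the widest configuration are stored statically and contiguously.
        self.weight = nn.Parameter(
            torch.randn(max_out_ch, in_ch, kernel_size, kernel_size) * 0.01)
        self.bias = nn.Parameter(torch.zeros(max_out_ch))
        self.padding = padding

    def forward(self, x, out_ch):
        # Slice the first `out_ch` filters: the slice stays contiguous in memory,
        # so the smaller conv runs as an ordinary dense operation (no sparse kernels).
        w = self.weight[:out_ch]
        b = self.bias[:out_ch]
        return F.conv2d(x, w, b, padding=self.padding)


class WidthGate(nn.Module):
    """Tiny gate that picks one of a few candidate widths per input."""

    def __init__(self, in_ch, widths=(16, 32, 64)):
        super().__init__()
        self.widths = widths
        self.fc = nn.Linear(in_ch, len(widths))

    def forward(self, x):
        # Global average pooling -> logits over candidate widths.
        logits = self.fc(x.mean(dim=(2, 3)))
        idx = logits.argmax(dim=1)
        # Simplification: use the first sample's choice for the whole batch;
        # true per-sample routing would group samples by their selected width.
        return self.widths[idx[0].item()]


gate = WidthGate(in_ch=3)
conv = SliceableConv2d(in_ch=3, max_out_ch=64)
x = torch.randn(4, 3, 32, 32)
y = conv(x, out_ch=gate(x))  # easier inputs can run with fewer filters
print(y.shape)
```

Because the sliced weights are contiguous, smaller sub-networks add no gather/scatter overhead, which is why this regime translates theoretical FLOP savings into real-world speedups more readily than unstructured sparsity.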
Related papers
- Latency-aware Unified Dynamic Networks for Efficient Image Recognition [72.8951331472913]
LAUDNet is a framework to bridge the theoretical and practical efficiency gap in dynamic networks.
It integrates three primary dynamic paradigms: spatially adaptive computation, dynamic layer skipping, and dynamic channel skipping.
It can notably reduce the latency of models like ResNet by over 50% on platforms such as V100, 3090, and TX2 GPUs.
arXiv Detail & Related papers (2023-08-30T10:57:41Z) - Lightweight and Progressively-Scalable Networks for Semantic
Segmentation [100.63114424262234]
Multi-scale learning frameworks have been regarded as a capable class of models to boost semantic segmentation.
In this paper, we thoroughly analyze the design of convolutional blocks and the ways of interactions across multiple scales.
We devise Lightweight and Progressively-Scalable Networks (LPS-Net), which expand the network complexity in a greedy manner.
arXiv Detail & Related papers (2022-07-27T16:00:28Z) - An Adaptive Device-Edge Co-Inference Framework Based on Soft
Actor-Critic [72.35307086274912]
High-dimensional parameter models and large-scale mathematical calculations restrict execution efficiency, especially on Internet of Things (IoT) devices.
We propose a new Deep Reinforcement Learning (DRL) method, Soft Actor-Critic for discrete (SAC-d), which generates the exit point and compressing bits via soft policy iterations.
Based on the latency- and accuracy-aware reward design, the scheme adapts well to complex environments such as dynamic wireless channels and arbitrary processing, and is capable of supporting 5G URLLC.
arXiv Detail & Related papers (2022-01-09T09:31:50Z) - Compact Multi-level Sparse Neural Networks with Input Independent
Dynamic Rerouting [33.35713740886292]
Sparse deep neural networks can substantially reduce the complexity and memory consumption of the models.
To face real-life deployment challenges, we propose to train a sparse model that supports multiple sparsity levels.
In this way, one can dynamically select the appropriate sparsity level during inference, while the storage cost is capped by the least sparse sub-model.
arXiv Detail & Related papers (2021-12-21T01:35:51Z) - Dynamic Convolution for 3D Point Cloud Instance Segmentation [146.7971476424351]
We propose an approach to instance segmentation from 3D point clouds based on dynamic convolution.
We gather homogeneous points that have identical semantic categories and close votes for the geometric centroids.
The proposed approach is proposal-free, and instead exploits a convolution process that adapts to the spatial and semantic characteristics of each instance.
arXiv Detail & Related papers (2021-07-18T09:05:16Z) - Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z) - Towards Lossless Binary Convolutional Neural Networks Using Piecewise
Approximation [4.023728681102073]
Binary CNNs can significantly reduce the number of arithmetic operations and the size of memory storage.
However, the accuracy degradation of single and multiple binary CNNs is unacceptable for modern architectures.
We propose a Piecewise Approximation scheme for multiple binary CNNs which lessens accuracy loss by approximating full precision weights and activations.
arXiv Detail & Related papers (2020-08-08T13:32:33Z) - Fully Dynamic Inference with Deep Neural Networks [19.833242253397206]
Two compact networks, called Layer-Net (L-Net) and Channel-Net (C-Net), predict on a per-instance basis which layers or filters/channels are redundant and therefore should be skipped.
On the CIFAR-10 dataset, LC-Net results in up to 11.9x fewer floating-point operations (FLOPs) and up to 3.3% higher accuracy compared to other dynamic inference methods.
On the ImageNet dataset, LC-Net achieves up to 1.4x fewer FLOPs and up to 4.6% higher Top-1 accuracy than the other methods.
arXiv Detail & Related papers (2020-07-29T23:17:48Z)