PAD-Net: An Efficient Framework for Dynamic Networks
- URL: http://arxiv.org/abs/2211.05528v4
- Date: Wed, 31 May 2023 09:27:56 GMT
- Title: PAD-Net: An Efficient Framework for Dynamic Networks
- Authors: Shwai He, Liang Ding, Daize Dong, Boan Liu, Fuqiang Yu, Dacheng Tao
- Abstract summary: Common practice in implementing dynamic networks is to convert the given static layers into fully dynamic ones.
We propose a partially dynamic network, namely PAD-Net, to transform the redundant dynamic parameters into static ones.
Our method is comprehensively supported by large-scale experiments with two typical advanced dynamic architectures.
- Score: 72.85480289152719
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Dynamic networks, e.g., Dynamic Convolution (DY-Conv) and the Mixture of
Experts (MoE), have been extensively explored as they can considerably improve
the model's representation power with acceptable computational cost. The common
practice in implementing dynamic networks is to convert the given static layers
into fully dynamic ones where all parameters are dynamic (at least within a
single layer) and vary with the input. However, such a fully dynamic setting
may cause redundant parameters and high deployment costs, limiting the
applicability of dynamic networks to a broader range of tasks and models. The
main contributions of our work are to challenge this common practice in dynamic
networks and to propose a partially dynamic network, namely PAD-Net, which
transforms redundant dynamic parameters into static ones. We further design
Iterative Mode Partition to partition dynamic and static parameters
efficiently. Our method is comprehensively supported by large-scale experiments
with two typical advanced dynamic architectures, i.e., DY-Conv and MoE, on both
image classification and GLUE benchmarks. Encouragingly, we surpass the fully
dynamic networks by $+0.7\%$ top-1 acc with only $30\%$ dynamic parameters for
ResNet-50 and $+1.9\%$ average score in language understanding with only $50\%$
dynamic parameters for BERT. Code will be released at:
https://github.com/Shwai-He/PAD-Net.
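Below is a minimal, hypothetical sketch in Python/PyTorch of the partially dynamic idea: a layer in which only a masked subset of the weights is input-dependent (mixed from a small expert bank, in the DY-Conv/MoE spirit), while the remaining entries stay static. The class name, the softmax router, and the random mask are illustrative assumptions; the paper instead learns the dynamic/static partition with Iterative Mode Partition.

```python
# Hypothetical sketch of a partially dynamic layer; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartiallyDynamicLinear(nn.Module):
    def __init__(self, in_features, out_features, num_experts=4, dynamic_ratio=0.3):
        super().__init__()
        # Static weight shared by all inputs.
        self.static_weight = nn.Parameter(torch.empty(out_features, in_features))
        nn.init.kaiming_uniform_(self.static_weight, a=5 ** 0.5)
        # Bank of candidate weights mixed per input (DY-Conv / MoE style).
        self.expert_weights = nn.Parameter(
            torch.zeros(num_experts, out_features, in_features))
        # Router producing per-input mixture coefficients over the experts.
        self.router = nn.Linear(in_features, num_experts)
        # Binary mask marking which weight entries are dynamic. Chosen randomly
        # here; PAD-Net learns this partition via Iterative Mode Partition.
        mask = (torch.rand(out_features, in_features) < dynamic_ratio).float()
        self.register_buffer("dynamic_mask", mask)

    def forward(self, x):  # x: (batch, in_features)
        coeff = F.softmax(self.router(x), dim=-1)          # (batch, num_experts)
        # Per-input dynamic weight as a mixture of the expert weights.
        dyn_w = torch.einsum("be,eoi->boi", coeff, self.expert_weights)
        # Keep dynamic values only where the mask is 1; stay static elsewhere.
        w = self.dynamic_mask * dyn_w + (1.0 - self.dynamic_mask) * self.static_weight
        return torch.einsum("boi,bi->bo", w, x)
```

With dynamic_ratio=0.3, roughly 30% of the weight entries vary with the input, mirroring the dynamic-parameter budget reported for ResNet-50, while the other 70% behave like an ordinary static layer at deployment time.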
Related papers
- Dynamic Mobile-Former: Strengthening Dynamic Convolution with Attention
and Residual Connection in Kernel Space [4.111899441919165]
Dynamic Mobile-Former maximizes the capabilities of dynamic convolution by harmonizing it with efficient operators.
The Transformer in Dynamic Mobile-Former only requires a few randomly initialized tokens to calculate global features.
A bridge between Dynamic MobileNet and the Transformer allows for bidirectional integration of local and global features.
arXiv Detail & Related papers (2023-04-13T05:22:24Z)
- DynInt: Dynamic Interaction Modeling for Large-scale Click-Through Rate Prediction [0.0]
Learning feature interactions is the key to success for the large-scale CTR prediction in Ads ranking and recommender systems.
Deep neural network-based models are widely adopted for modeling such problems.
We propose a new model: DynInt, which learns higher-order interactions to be dynamic and data-dependent.
arXiv Detail & Related papers (2023-01-03T13:01:30Z)
- SD-Conv: Towards the Parameter-Efficiency of Dynamic Convolution [16.56592303409295]
Dynamic convolution achieves better performance for efficient CNNs at the cost of negligible FLOPs increase.
We propose a new framework, Sparse Dynamic Convolution (SD-Conv), to naturally integrate these two paths.
arXiv Detail & Related papers (2022-04-05T14:03:54Z)
- DyTox: Transformers for Continual Learning with DYnamic TOken eXpansion [89.92242000948026]
We propose a transformer architecture based on a dedicated encoder/decoder framework.
Through a dynamic expansion of special tokens, we specialize each forward of our decoder network on a task distribution.
Our strategy scales to a large number of tasks while having negligible memory and time overheads.
arXiv Detail & Related papers (2021-11-22T16:29:06Z)
- DS-Net++: Dynamic Weight Slicing for Efficient Inference in CNNs and Transformers [105.74546828182834]
We present a hardware-efficient dynamic inference regime, named dynamic weight slicing, which adaptively slices a part of the network parameters for inputs with diverse difficulty levels.
We present dynamic slimmable network (DS-Net) and dynamic slice-able network (DS-Net++) by input-dependently adjusting filter numbers of CNNs and multiple dimensions in both CNNs and transformers.
arXiv Detail & Related papers (2021-09-21T09:57:21Z)
- Dynamic Slimmable Network [105.74546828182834]
We develop a dynamic network slimming regime named Dynamic Slimmable Network (DS-Net).
Our DS-Net is empowered with the ability of dynamic inference by the proposed double-headed dynamic gate.
It consistently outperforms its static counterparts as well as state-of-the-art static and dynamic model compression methods.
arXiv Detail & Related papers (2021-03-24T15:25:20Z)
- Learning to Generate Content-Aware Dynamic Detectors [62.74209921174237]
We introduce a new perspective on designing efficient detectors: automatically generating a sample-adaptive model architecture.
We introduce a coarse-to-fine strategy tailored for object detection to guide the learning of dynamic routing.
Experiments on the MS-COCO dataset demonstrate that CADDet achieves 1.8 higher mAP with 10% fewer FLOPs compared with vanilla routing.
arXiv Detail & Related papers (2020-12-08T08:05:20Z)
- Learning Dynamic Routing for Semantic Segmentation [86.56049245100084]
This paper studies a conceptually new method to alleviate the scale variance in semantic representation, named dynamic routing.
The proposed framework generates data-dependent routes, adapting to the scale distribution of each image.
To this end, a differentiable gating function, called soft conditional gate, is proposed to select scale transform paths on the fly; a minimal illustrative sketch follows below.
arXiv Detail & Related papers (2020-03-23T17:22:14Z)
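As a companion to the dynamic-routing entry above, here is a minimal, hypothetical sketch of a soft conditional gate: a differentiable gate produces non-negative weights over a few candidate scale-transform paths and mixes their outputs per input. The module names and the three-path layout are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of a soft conditional gate over scale-transform paths.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftConditionalGate(nn.Module):
    def __init__(self, channels, num_paths=3):
        super().__init__()
        # Lightweight gate: global average pooling followed by a projection.
        self.proj = nn.Linear(channels, num_paths)

    def forward(self, x):                        # x: (batch, channels, H, W)
        pooled = x.mean(dim=(2, 3))              # (batch, channels)
        # ReLU-style soft gate: a path with zero weight contributes nothing.
        return F.relu(self.proj(pooled))         # (batch, num_paths)

class DynamicScaleBlock(nn.Module):
    """Mixes keep-resolution, downsample, and upsample paths per input."""
    def __init__(self, channels):
        super().__init__()
        self.gate = SoftConditionalGate(channels, num_paths=3)
        self.keep = nn.Conv2d(channels, channels, 3, padding=1)
        self.down = nn.Conv2d(channels, channels, 3, stride=2, padding=1)
        self.up = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x):
        g = self.gate(x)                          # (batch, 3)
        size = x.shape[2:]
        keep = self.keep(x)
        # Downsample path, resized back so all paths share one output size.
        down = F.interpolate(self.down(x), size=size, mode="bilinear",
                             align_corners=False)
        # Upsample path: enlarge, convolve, then resize back.
        up_in = F.interpolate(x, scale_factor=2, mode="bilinear",
                              align_corners=False)
        up = F.interpolate(self.up(up_in), size=size, mode="bilinear",
                           align_corners=False)
        paths = torch.stack([keep, down, up], dim=1)   # (batch, 3, C, H, W)
        w = g.view(g.shape[0], -1, 1, 1, 1)
        return (w * paths).sum(dim=1)
```

The ReLU-style gate means a path whose weight is exactly zero adds nothing and could be skipped at inference, which matches the intuition of generating data-dependent routes rather than running every scale transform for every image.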