Locally Free Weight Sharing for Network Width Search
- URL: http://arxiv.org/abs/2102.05258v1
- Date: Wed, 10 Feb 2021 04:36:09 GMT
- Title: Locally Free Weight Sharing for Network Width Search
- Authors: Xiu Su, Shan You, Tao Huang, Fei Wang, Chen Qian, Changshui Zhang,
Chang Xu
- Abstract summary: Searching for network width is an effective way to slim deep neural networks with hardware budgets.
We propose a locally free weight sharing strategy (CafeNet) to better evaluate each width.
Our method can further boost the benchmark NAS network EfficientNet-B0 by 0.41% by searching its width more finely.
- Score: 55.155969155967284
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Searching for network width is an effective way to slim deep neural networks
under hardware budgets. To this end, a one-shot supernet is usually leveraged
as a performance evaluator to rank different widths. Nevertheless, current
methods mainly follow a manually fixed weight sharing pattern, which limits
their ability to distinguish the performance gap between different widths. In
this paper, to better evaluate each width, we propose a locally free weight
sharing strategy (CafeNet). In CafeNet, weights are shared more freely, and
each width is jointly indicated by its base channels and free channels, where
the free channels may locate freely within a local zone to better represent
that width. Besides, we further reduce the search space by leveraging our
introduced FLOPs-sensitive bins. As a result, our CafeNet can be trained
stochastically and optimized via a min-min strategy. Extensive experiments on
the ImageNet, CIFAR-10, CelebA and MS COCO datasets have verified our
superiority over other state-of-the-art baselines. For example, our method can
further boost the benchmark NAS network EfficientNet-B0 by 0.41% by searching
its width more finely.
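To make the width-sampling idea concrete, below is a minimal PyTorch sketch of how a single supernet layer could indicate a candidate width by its base channels plus a few free channels drawn from a local zone. It only illustrates the weight-sharing idea described in the abstract and is not the authors' released code; the class and method names, the zone size and the random sampling rule are assumptions.

import torch
import torch.nn as nn


class LocallyFreeConv(nn.Module):
    """One supernet convolution whose candidate widths share weights 'locally freely'.

    Illustrative sketch only: names, zone size and sampling rule are assumptions.
    """

    def __init__(self, in_channels, max_out_channels, free_zone=4):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, max_out_channels, kernel_size=3, padding=1)
        self.max_out = max_out_channels
        self.free_zone = free_zone  # how far past the base channels a free channel may sit

    def sample_channels(self, width, num_free=2):
        # Base channels: the first (width - num_free) channels, reused by every width.
        num_free = min(num_free, width)
        base = torch.arange(width - num_free)
        # Free channels: drawn at random from a small local zone just above the base
        # channels, so neighbouring widths are not pinned to one fixed channel set.
        lo, hi = width - num_free, min(width - num_free + self.free_zone, self.max_out)
        zone = torch.arange(lo, hi)
        free = zone[torch.randperm(len(zone))[:num_free]]
        return torch.cat([base, free])

    def forward(self, x, width):
        idx = self.sample_channels(width)
        weight = self.conv.weight[idx]                       # slice the shared kernel
        bias = self.conv.bias[idx] if self.conv.bias is not None else None
        return nn.functional.conv2d(x, weight, bias, padding=1)


layer = LocallyFreeConv(in_channels=16, max_out_channels=32)
out = layer(torch.randn(1, 16, 8, 8), width=24)              # evaluate candidate width 24
print(out.shape)                                             # torch.Size([1, 24, 8, 8])

The FLOPs-sensitive bins mentioned in the abstract would further shrink the set of candidate widths before such sampling; that part is omitted from this sketch.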
Related papers
- Slimmable Pruned Neural Networks [1.8275108630751844]
The accuracy of each sub-network of a slimmable network (S-Net) is inferior to that of an individually trained network of the same size.
We propose Slimmable Pruned Neural Networks (SP-Net) which has sub-network structures learned by pruning.
SP-Net can be combined with any kind of channel pruning methods and does not require any complicated processing or time-consuming architecture search like NAS models.
arXiv Detail & Related papers (2022-12-07T02:54:15Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
Splitting the supernet according to gradient matching significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
- Searching for Network Width with Bilaterally Coupled Network [75.43658047510334]
We introduce a new supernet called Bilaterally Coupled Network (BCNet) to address the unequal training of channels in a vanilla one-shot supernet.
In BCNet, each channel is fairly trained and responsible for the same amount of network widths, thus each network width can be evaluated more accurately.
We propose the first open-source width benchmark on macro structures named Channel-Bench-Macro for the better comparison of width search algorithms.
arXiv Detail & Related papers (2022-03-25T15:32:46Z)
- BCNet: Searching for Network Width with Bilaterally Coupled Network [56.14248440683152]
We introduce a new supernet called Bilaterally Coupled Network (BCNet) to address the unequal training of channels in a vanilla one-shot supernet.
In BCNet, each channel is fairly trained and responsible for the same amount of network widths, thus each network width can be evaluated more accurately.
Our method achieves state-of-the-art or competing performance over other baseline methods.
arXiv Detail & Related papers (2021-05-21T18:54:03Z)
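As a rough illustration of the bilateral idea summarized in the two BCNet entries above, the snippet below evaluates a candidate width with both a left-aligned and a right-aligned slice of the shared kernel, so every channel serves the same number of widths. This is a hedged sketch of the stated idea, not BCNet's actual implementation; the layer name and the averaging of the two paths are assumptions.

import torch
import torch.nn as nn


class BilateralConv(nn.Module):
    """Sketch: evaluate a width with both the leftmost and rightmost channel slices."""

    def __init__(self, in_channels, max_out_channels):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, max_out_channels, kernel_size=3, padding=1)

    def forward(self, x, width):
        w, b = self.conv.weight, self.conv.bias
        left = nn.functional.conv2d(x, w[:width], b[:width], padding=1)     # first `width` channels
        right = nn.functional.conv2d(x, w[-width:], b[-width:], padding=1)  # last `width` channels
        # Averaging the two paths is an assumption; the point is that each channel now
        # participates in the same number of candidate widths, so training is fair.
        return 0.5 * (left + right)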
- Lite-HRNet: A Lightweight High-Resolution Network [97.17242913089464]
We present an efficient high-resolution network, Lite-HRNet, for human pose estimation.
We find that heavily-used pointwise (1x1) convolutions in shuffle blocks become the computational bottleneck.
We introduce a lightweight unit, conditional channel weighting, to replace costly pointwise (1x1) convolutions in shuffle blocks.
arXiv Detail & Related papers (2021-04-13T17:59:31Z)
- WeightNet: Revisiting the Design Space of Weight Networks [96.48596945711562]
We present a conceptually simple, flexible and effective framework for weight generating networks.
Our approach is general: it unifies two distinct and highly effective modules, SENet and CondConv, into the same framework on weight space.
arXiv Detail & Related papers (2020-07-23T06:49:01Z)
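To give a feel for what a weight-generating network is, the sketch below produces a per-sample convolution kernel from globally pooled features and applies it with a grouped convolution, in the spirit of SENet/CondConv-style weight generation. It is an assumption-laden toy (the naive two-layer generator and all names are made up for the example), not WeightNet's actual design, which keeps the generation step cheap.

import torch
import torch.nn as nn


class GeneratedConv2d(nn.Module):
    """Sketch: a 3x3 conv whose kernel is generated per sample from pooled features."""

    def __init__(self, in_ch, out_ch, k=3, reduction=4):
        super().__init__()
        self.in_ch, self.out_ch, self.k = in_ch, out_ch, k
        # Deliberately naive generator: pooled features -> hidden -> full kernel.
        self.reduce = nn.Linear(in_ch, in_ch // reduction)
        self.expand = nn.Linear(in_ch // reduction, out_ch * in_ch * k * k)

    def forward(self, x):
        b, _, h, w = x.shape
        pooled = x.mean(dim=(2, 3))                               # [B, in_ch], SE-style input
        kernel = self.expand(torch.relu(self.reduce(pooled)))     # one kernel per sample
        kernel = kernel.view(b * self.out_ch, self.in_ch, self.k, self.k)
        # Grouped-conv trick: fold the batch into groups so each sample uses its own kernel.
        out = nn.functional.conv2d(x.reshape(1, b * self.in_ch, h, w),
                                   kernel, padding=self.k // 2, groups=b)
        return out.view(b, self.out_ch, h, w)


layer = GeneratedConv2d(in_ch=8, out_ch=16)
print(layer(torch.randn(2, 8, 14, 14)).shape)                     # torch.Size([2, 16, 14, 14])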
- Joslim: Joint Widths and Weights Optimization for Slimmable Neural Networks [37.09353669633368]
We propose a general framework to enable joint optimization for both width configurations and weights of slimmable networks.
Our framework subsumes conventional and NAS-based slimmable methods as special cases and provides flexibility to improve over existing methods.
For MobileNetV2, improvements of up to 1.7% and 8% in top-1 accuracy on the ImageNet dataset can be attained under FLOPs and memory-footprint budgets, respectively.
arXiv Detail & Related papers (2020-07-23T02:05:03Z)
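For readers unfamiliar with slimmable networks, the toy loop below shows the weight-sharing substrate that joint widths-and-weights methods such as the one summarized above build on: one set of shared weights is trained under randomly sampled widths. It is not Joslim's algorithm (which also optimizes the width configurations themselves); the uniform width sampling, the padding trick and all names are assumptions for illustration.

import random
import torch
import torch.nn as nn


class SlimmableLinear(nn.Module):
    """A linear layer whose active output width can be switched at run time."""

    def __init__(self, in_features, max_out):
        super().__init__()
        self.fc = nn.Linear(in_features, max_out)
        self.active_out = max_out

    def set_width(self, width):
        self.active_out = width

    def forward(self, x):
        w = self.fc.weight[: self.active_out]
        b = self.fc.bias[: self.active_out]
        return nn.functional.linear(x, w, b)


backbone = SlimmableLinear(in_features=32, max_out=64)
head = nn.Linear(64, 10)                                     # one head over the widest feature size
opt = torch.optim.SGD(list(backbone.parameters()) + list(head.parameters()), lr=0.1)

for step in range(100):
    x, target = torch.randn(8, 32), torch.randint(0, 10, (8,))
    width = random.choice([16, 32, 48, 64])                  # sample a width configuration
    backbone.set_width(width)
    feats = backbone(x)
    feats = nn.functional.pad(feats, (0, 64 - width))        # pad so the single head fits all widths
    loss = nn.functional.cross_entropy(head(feats), target)
    opt.zero_grad()
    loss.backward()
    opt.step()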