How Does Supernet Help in Neural Architecture Search?
- URL: http://arxiv.org/abs/2010.08219v2
- Date: Wed, 5 May 2021 07:26:48 GMT
- Title: How Does Supernet Help in Neural Architecture Search?
- Authors: Yuge Zhang, Quanlu Zhang, Yaming Yang
- Abstract summary: We conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS.
We find that weight sharing works well on some search spaces but fails on others.
Our work is expected to inspire future NAS researchers to better leverage the power of weight sharing.
- Score: 3.8348281160758027
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Weight sharing, as an approach to speed up architecture performance
estimation, has received wide attention. Instead of training each architecture
separately, weight sharing builds a supernet that assembles all the
architectures as its submodels. However, there has been debate over whether the
NAS process actually benefits from weight sharing, due to the gap between
supernet optimization and the objective of NAS. To further understand the
effect of weight sharing on NAS, we conduct a comprehensive analysis on five
search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10,
DARTS-PTB, and ProxylessNAS. We find that weight sharing works well on some
search spaces but fails on others. Going a step further, we identify the biases
that account for this phenomenon, as well as the capacity of weight sharing. Our
work is expected to inspire future NAS researchers to better leverage the power
of weight sharing.
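For readers unfamiliar with the mechanism under study, here is a minimal, hypothetical PyTorch sketch of a weight-sharing supernet: every candidate operation in each layer keeps a single copy of its parameters inside the supernet, and evaluating an architecture simply selects one operation per layer, so all submodels reuse the same weights instead of being trained separately. The layer count, operation names, and single-path sampling below are illustrative assumptions, not the paper's exact setup.

```python
import random
import torch
import torch.nn as nn

class MixedLayer(nn.Module):
    """One supernet layer: all candidate ops keep their weights in the supernet."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleDict({
            "conv3x3": nn.Conv2d(channels, channels, 3, padding=1),
            "conv1x1": nn.Conv2d(channels, channels, 1),
            "identity": nn.Identity(),
        })

    def forward(self, x, choice):
        return self.ops[choice](x)

class Supernet(nn.Module):
    def __init__(self, channels=16, depth=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.layers = nn.ModuleList([MixedLayer(channels) for _ in range(depth)])
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        # `arch` is a list of operation names, one per layer: a submodel.
        x = self.stem(x)
        for layer, choice in zip(self.layers, arch):
            x = layer(x, choice)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

# Supernet training: sample a random submodel per step (single-path style), so
# every architecture can later be ranked with the shared weights.
net = Supernet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)
images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
arch = [random.choice(["conv3x3", "conv1x1", "identity"]) for _ in net.layers]
loss = nn.functional.cross_entropy(net(images, arch), labels)
opt.zero_grad(); loss.backward(); opt.step()
```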
Related papers
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
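As a rough illustration of block-wise supervision (not the authors' exact formulation), the sketch below supervises each supernet block with the features of the matching block of a frozen teacher network; the block definitions and the MSE objective are assumptions made for the example.

```python
import torch
import torch.nn as nn

def block_conv(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, padding=1), nn.ReLU())

# A fixed, pre-trained teacher split into blocks (treated as frozen here).
teacher_blocks = nn.ModuleList([block_conv(3, 16), block_conv(16, 32)]).eval()
# Student (supernet) blocks with matching input/output shapes.
student_blocks = nn.ModuleList([block_conv(3, 16), block_conv(16, 32)])

def blockwise_distillation_loss(x):
    """Each student block imitates the matching teacher block's output features."""
    loss, t_in = 0.0, x
    for t_block, s_block in zip(teacher_blocks, student_blocks):
        with torch.no_grad():
            t_out = t_block(t_in)      # teacher features for this block
        s_out = s_block(t_in)          # student block sees the teacher's input
        loss = loss + nn.functional.mse_loss(s_out, t_out)
        t_in = t_out                   # next block is fed teacher features
    return loss

loss = blockwise_distillation_loss(torch.randn(4, 3, 32, 32))
loss.backward()
```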
arXiv Detail & Related papers (2024-03-02T22:16:47Z) - SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss
Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up".
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
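To make the "warm-up" notion concrete, here is a hedged sketch of the sub-one-shot setting: the supernet is briefly trained on only a small fraction of the training data before architectures are scored with a proxy. The 10% fraction and the toy dataset are illustrative assumptions; SiGeo's actual proxy is not reproduced here.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Toy dataset standing in for the full training set.
full_set = TensorDataset(torch.randn(1000, 16), torch.randint(0, 10, (1000,)))

# Sub-one-shot warm-up: use only a small subset (an assumed 10%) of the data.
warmup_set = Subset(full_set, range(len(full_set) // 10))
loader = DataLoader(warmup_set, batch_size=32, shuffle=True)

supernet = torch.nn.Sequential(torch.nn.Linear(16, 64), torch.nn.ReLU(),
                               torch.nn.Linear(64, 10))
opt = torch.optim.SGD(supernet.parameters(), lr=0.1)

for x, y in loader:  # brief warm-up phase on the subset only
    loss = torch.nn.functional.cross_entropy(supernet(x), y)
    opt.zero_grad(); loss.backward(); opt.step()

# After warm-up, candidate architectures would be scored with a cheap proxy
# (SiGeo's information-theoretic/geometric proxy is not reproduced here).
```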
arXiv Detail & Related papers (2023-11-22T05:25:24Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
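A plausible reading of "architecture-routed mixture-of-experts" is sketched below: instead of a single shared weight matrix, a layer keeps several expert weight matrices and combines them with routing coefficients computed from an encoding of the sampled architecture. The architecture encoding and the gating network here are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MoELinear(nn.Module):
    """Linear layer whose weights are a mixture of experts routed by the architecture."""
    def __init__(self, in_dim, out_dim, num_experts=4, arch_dim=8):
        super().__init__()
        self.experts = nn.Parameter(torch.randn(num_experts, out_dim, in_dim) * 0.02)
        self.router = nn.Linear(arch_dim, num_experts)  # arch encoding -> routing logits

    def forward(self, x, arch_code):
        gates = torch.softmax(self.router(arch_code), dim=-1)   # (num_experts,)
        weight = torch.einsum("e,eoi->oi", gates, self.experts)  # mix expert weights
        return nn.functional.linear(x, weight)

layer = MoELinear(16, 32)
arch_code = torch.randn(8)          # assumed encoding of the sampled subnetwork
out = layer(torch.randn(4, 16), arch_code)
print(out.shape)                    # torch.Size([4, 32])
```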
arXiv Detail & Related papers (2023-06-08T00:35:36Z) - RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking
Distillation from Zero-cost Proxies [20.076610051602618]
We propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency.
Our evaluation on the NAS-Bench-201 and ResNet-based search spaces demonstrates that RD-NAS achieves 10.7% and 9.65% improvements in ranking ability, respectively.
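A minimal sketch of ranking distillation from a zero-cost proxy follows; the pairwise margin ranking loss is a common choice and an assumption here, not necessarily RD-NAS's exact objective. Pairs of architectures are ordered by their cheap proxy scores, and the supernet-based scores are pushed to preserve that ordering.

```python
import torch

def ranking_distillation_loss(supernet_scores, proxy_scores, margin=0.1):
    """Encourage supernet-based scores to agree with the zero-cost-proxy ranking."""
    loss, n = 0.0, len(proxy_scores)
    for i in range(n):
        for j in range(i + 1, n):
            # target = +1 if the proxy ranks architecture i above architecture j
            target = torch.tensor([1.0 if proxy_scores[i] > proxy_scores[j] else -1.0])
            loss = loss + torch.nn.functional.margin_ranking_loss(
                supernet_scores[i:i + 1], supernet_scores[j:j + 1], target,
                margin=margin)
    return loss / (n * (n - 1) / 2)

supernet_scores = torch.randn(6, requires_grad=True)  # e.g. one-shot estimates
proxy_scores = torch.randn(6)                         # zero-cost proxy values
ranking_distillation_loss(supernet_scores, proxy_scores).backward()
```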
arXiv Detail & Related papers (2023-01-24T07:49:04Z) - Generalization Properties of NAS under Activation and Skip Connection
Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even in the case without training.
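For context, the kernel these bounds concern is the Neural Tangent Kernel of a network $f(x;\theta)$; the standard definition is restated below (the paper's specific lower and upper bounds are not reproduced here).

```latex
% Empirical Neural Tangent Kernel on inputs x, x':
\[
  \Theta(x, x') \;=\; \nabla_\theta f(x;\theta)^\top \, \nabla_\theta f(x';\theta).
\]
% The bounds concern \lambda_{\min}\bigl(\Theta(X, X)\bigr), the minimum eigenvalue
% of the NTK Gram matrix over the training set, which is a standard quantity for
% characterizing optimization and generalization and can be evaluated without training.
```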
arXiv Detail & Related papers (2022-09-15T12:11:41Z) - Improve Ranking Correlation of Super-net through Training Scheme from
One-shot NAS to Few-shot NAS [13.390484379343908]
We propose a step-by-step training super-net scheme from one-shot NAS to few-shot NAS.
In this training scheme, we first train the super-net in a one-shot way and then disentangle its weights.
Our method ranks 4th place in the CVPR2022 3rd Lightweight NAS Challenge Track1.
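The step-by-step scheme can be pictured as follows: a one-shot supernet is trained first, then its shared weights are copied into several sub-supernets, each fixing one choice block to a single candidate and covering a region of the search space, before the few-shot phase fine-tunes them separately. The split by the first choice block and the module layout are hedged illustrations, not the authors' exact disentangling procedure.

```python
import copy
import torch.nn as nn

CANDIDATES = ["conv3x3", "conv1x1", "identity"]

def make_supernet():
    # Stand-in supernet: one mixed choice block followed by a shared tail.
    return nn.ModuleDict({
        "choice": nn.ModuleDict({
            "conv3x3": nn.Conv2d(16, 16, 3, padding=1),
            "conv1x1": nn.Conv2d(16, 16, 1),
            "identity": nn.Identity(),
        }),
        "tail": nn.Conv2d(16, 16, 3, padding=1),
    })

one_shot = make_supernet()
# ... train `one_shot` in the usual one-shot way here ...

# Disentangle: one sub-supernet per candidate of the chosen block, inheriting
# the trained one-shot weights.
sub_supernets = {}
for op in CANDIDATES:
    sub = copy.deepcopy(one_shot)                            # inherit shared weights
    sub["choice"] = nn.ModuleDict({op: sub["choice"][op]})   # fix this block to `op`
    sub_supernets[op] = sub
# ... continue training each sub-supernet on its sub-space (few-shot phase) ...
```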
arXiv Detail & Related papers (2022-06-13T04:02:12Z) - BaLeNAS: Differentiable Architecture Search via the Bayesian Learning
Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost.
This paper formulates the neural architecture search as a distribution learning problem through relaxing the architecture weights into Gaussian distributions.
We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
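The distribution-learning view can be sketched as below: each architecture weight is modelled as a Gaussian and sampled with the reparameterization trick before the usual DARTS-style softmax mixing of candidate operations. The variational objective and the Bayesian learning rule itself are beyond this hedged example.

```python
import torch
import torch.nn as nn

class GaussianArchWeights(nn.Module):
    """Architecture weights relaxed into Gaussians: alpha ~ N(mu, sigma^2)."""
    def __init__(self, num_ops):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_ops))
        self.log_sigma = nn.Parameter(torch.full((num_ops,), -2.0))

    def sample(self):
        eps = torch.randn_like(self.mu)                # reparameterization trick
        alpha = self.mu + self.log_sigma.exp() * eps   # differentiable sample
        return torch.softmax(alpha, dim=-1)            # DARTS-style op mixture

arch = GaussianArchWeights(num_ops=3)
ops = [nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 1), nn.Identity()]
x = torch.randn(2, 8, 16, 16)
weights = arch.sample()
mixed = sum(w * op(x) for w, op in zip(weights, ops))  # weighted sum of candidate ops
```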
arXiv Detail & Related papers (2021-11-25T18:13:42Z) - An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z) - K-shot NAS: Learnable Weight-Sharing for NAS with K-shot Supernets [52.983810997539486]
We introduce $K$-shot supernets and take their weights for each operation as a dictionary.
A simplex-net is introduced to produce an architecture-customized code for each path.
Experiments on benchmark datasets validate that K-shot NAS significantly improves the evaluation accuracy of paths.
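The dictionary idea can be illustrated as below: each operation's weight is a convex combination of $K$ weight tensors, with the combination coefficients produced per path by a small simplex-net (a softmax output keeps the coefficients on the simplex). The tensor shapes, path encoding, and simplex-net design are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class KShotConv(nn.Module):
    """Conv op whose kernel is a simplex-weighted combination of K supernet copies."""
    def __init__(self, channels, k=4, code_dim=8):
        super().__init__()
        # Dictionary: K copies of this operation's weights.
        self.weight_bank = nn.Parameter(torch.randn(k, channels, channels, 3, 3) * 0.02)
        # Simplex-net: maps an (assumed) path encoding to K coefficients on the simplex.
        self.simplex_net = nn.Sequential(nn.Linear(code_dim, k), nn.Softmax(dim=-1))

    def forward(self, x, path_code):
        lam = self.simplex_net(path_code)                        # (K,), sums to 1
        weight = torch.einsum("k,koihw->oihw", lam, self.weight_bank)
        return nn.functional.conv2d(x, weight, padding=1)

op = KShotConv(channels=16)
out = op(torch.randn(2, 16, 32, 32), torch.randn(8))
print(out.shape)   # torch.Size([2, 16, 32, 32])
```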
arXiv Detail & Related papers (2021-06-11T14:57:36Z) - AlphaNet: Improved Training of Supernet with Alpha-Divergence [28.171262066145616]
We propose to improve the supernet training with a more generalized alpha-divergence.
We apply the proposed alpha-divergence based supernet training to both slimmable neural networks and weight-sharing NAS.
Specifically, our discovered model family, AlphaNet, outperforms prior-art models on a wide range of FLOPs regimes.
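For reference, one standard form of the alpha-divergence between normalized distributions $p$ and $q$ is $D_\alpha(p\|q) = \frac{1}{\alpha(\alpha-1)}\bigl(\sum_i p_i^{\alpha} q_i^{1-\alpha} - 1\bigr)$, recovering $\mathrm{KL}(p\|q)$ as $\alpha \to 1$ and $\mathrm{KL}(q\|p)$ as $\alpha \to 0$. Below is a hedged sketch of using it as a distillation loss between a large "teacher" subnet and a sampled subnet; the clipping and training recipe used by AlphaNet are not reproduced.

```python
import torch

def alpha_divergence(p, q, alpha=1.5, eps=1e-8):
    """D_alpha(p || q) for normalized distributions (Minka's parameterization)."""
    p = p.clamp_min(eps)
    q = q.clamp_min(eps)
    return ((p.pow(alpha) * q.pow(1.0 - alpha)).sum(dim=-1) - 1.0) / (alpha * (alpha - 1.0))

# Hedged distillation use: the largest subnet acts as teacher for a sampled subnet.
teacher_logits = torch.randn(4, 10)                        # from the largest subnet
student_logits = torch.randn(4, 10, requires_grad=True)    # from a sampled subnet
p = torch.softmax(teacher_logits, dim=-1)
q = torch.softmax(student_logits, dim=-1)
loss = alpha_divergence(p, q, alpha=1.5).mean()
loss.backward()
```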
arXiv Detail & Related papers (2021-02-16T04:23:55Z) - Few-shot Neural Architecture Search [35.28010196935195]
We propose few-shot NAS that uses multiple supernetworks, called sub-supernets, each covering different regions of the search space to alleviate the undesired co-adaption.
With only up to 7 sub-supernets, few-shot NAS establishes new SoTAs: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS.
arXiv Detail & Related papers (2020-06-11T22:36:01Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.