RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking
Distillation from Zero-cost Proxies
- URL: http://arxiv.org/abs/2301.09850v1
- Date: Tue, 24 Jan 2023 07:49:04 GMT
- Title: RD-NAS: Enhancing One-shot Supernet Ranking Ability via Ranking
Distillation from Zero-cost Proxies
- Authors: Peijie Dong, Xin Niu, Lujun Li, Zhiliang Tian, Xiaodong Wang, Zimian
Wei, Hengyue Pan, Dongsheng Li
- Abstract summary: We propose Ranking Distillation one-shot NAS (RD-NAS) to enhance ranking consistency.
Our evaluation on the NAS-Bench-201 and ResNet-based search spaces demonstrates that RD-NAS achieves 10.7% and 9.65% improvements in ranking ability, respectively.
- Score: 20.076610051602618
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural architecture search (NAS) has made tremendous progress in the
automatic design of effective neural network structures but suffers from a
heavy computational burden. One-shot NAS significantly alleviates this burden
through weight sharing and improves computational efficiency. Zero-shot NAS
further reduces the cost by predicting a network's performance from its
initial state, requiring no training at all. Both methods aim to distinguish
between "good" and "bad" architectures, i.e., to achieve ranking consistency
between predicted and true performance. In this paper, we propose Ranking
Distillation one-shot NAS (RD-NAS) to enhance ranking consistency; it uses
zero-cost proxies as a cheap teacher and adopts the margin ranking loss to
distill ranking knowledge. Specifically, we propose a margin subnet sampler
that distills ranking knowledge from zero-shot NAS to one-shot NAS by
introducing group distance as the margin. Our evaluation on the NAS-Bench-201
and ResNet-based search spaces demonstrates that RD-NAS achieves 10.7% and
9.65% improvements in ranking ability, respectively. Our code is available at
https://github.com/pprp/CVPR2022-NAS-competition-Track1-3th-solution
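As a rough illustration of the ranking-distillation idea, the sketch below pairs up sampled subnets and applies a margin ranking loss whose targets come from zero-cost proxy scores. It is a minimal sketch under assumed names (`supernet_scores`, `proxy_scores`), and it uses a fixed margin rather than the paper's group-distance margin; see the repository above for the actual implementation.

```python
import torch
import torch.nn.functional as F

def ranking_distillation_loss(supernet_scores, proxy_scores, margin=0.1):
    """Distill ranking knowledge: for each subnet pair (i, j), push the
    supernet to score i above j by at least `margin` whenever the
    zero-cost proxy ranks i above j."""
    n = supernet_scores.size(0)
    i, j = torch.triu_indices(n, n, offset=1)  # all unordered subnet pairs
    target = torch.sign(proxy_scores[i] - proxy_scores[j])  # teacher ordering
    return F.margin_ranking_loss(supernet_scores[i], supernet_scores[j],
                                 target, margin=margin)

# Toy usage with 8 sampled subnets (scores are placeholders).
supernet_scores = torch.randn(8, requires_grad=True)  # e.g. one-shot estimates
proxy_scores = torch.randn(8)                         # e.g. zero-cost proxy values
loss = ranking_distillation_loss(supernet_scores, proxy_scores)
loss.backward()
```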
Related papers
- SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss
Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up".
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
arXiv Detail & Related papers (2023-11-22T05:25:24Z)
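A minimal sketch of what such a SiGeo-style warm-up phase might look like, assuming a PyTorch `supernet` and `dataset`; SiGeo's actual proxy and schedule are more involved, so treat this as illustration only.

```python
import torch
from torch.utils.data import DataLoader, Subset

def warm_up(supernet, dataset, fraction=0.01, steps=100, lr=0.025):
    """Briefly train the supernet on a small subset of the data ("warm-up")
    before any proxy is used to score subnets."""
    subset = Subset(dataset, range(int(len(dataset) * fraction)))
    loader = DataLoader(subset, batch_size=64, shuffle=True)
    opt = torch.optim.SGD(supernet.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    data_iter = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(data_iter)
        except StopIteration:
            data_iter = iter(loader)
            x, y = next(data_iter)
        opt.zero_grad()
        criterion(supernet(x), y).backward()
        opt.step()
    return supernet  # the proxy is then evaluated on this lightly trained supernet
```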
- Zero-Shot Neural Architecture Search: Challenges, Solutions, and Opportunities [58.67514819895494]
The key idea behind zero-shot NAS approaches is to design proxies that can predict the accuracy of a given network without training its parameters.
This paper aims to comprehensively review and compare the state-of-the-art (SOTA) zero-shot NAS approaches.
arXiv Detail & Related papers (2023-07-05T03:07:00Z)
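For concreteness, one of the simplest proxies of this kind scores an untrained network by its gradient norm on a single minibatch (the `grad_norm` proxy from the zero-cost NAS literature); a minimal sketch:

```python
import torch

def grad_norm_proxy(net, x, y):
    """Score a network at initialization: sum of parameter gradient norms
    after one backward pass on a single minibatch. No training involved."""
    net.zero_grad()
    loss = torch.nn.functional.cross_entropy(net(x), y)
    loss.backward()
    return sum(p.grad.norm().item() for p in net.parameters() if p.grad is not None)

# Toy usage on a randomly initialized network.
net = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x, y = torch.randn(16, 3, 32, 32), torch.randint(0, 10, (16,))
print(grad_norm_proxy(net, x, y))
```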
- Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets [55.2118691522524]
Distillation-aware Neural Architecture Search (DaNAS) aims to search for an optimal student architecture to be trained with knowledge distillation.
We propose a distillation-aware meta accuracy prediction model, DaSS (Distillation-aware Student Search), which can predict a given architecture's final performance on a dataset.
arXiv Detail & Related papers (2023-05-26T14:00:35Z)
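As a rough sketch of the regression backbone behind such a predictor (DaSS itself is meta-learned across datasets and distillation-aware, which is omitted here), one might map an architecture encoding to a scalar accuracy:

```python
import torch
import torch.nn as nn

class AccuracyPredictor(nn.Module):
    """Map a fixed-size architecture encoding to a predicted accuracy in [0, 1].
    A plain surrogate model; the encoding scheme and meta-learning are out of scope."""
    def __init__(self, encoding_dim=32, hidden=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(encoding_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1), nn.Sigmoid())

    def forward(self, arch_encoding):
        return self.mlp(arch_encoding).squeeze(-1)

# Toy training step on random (encoding, accuracy) pairs.
predictor = AccuracyPredictor()
encodings, accs = torch.randn(64, 32), torch.rand(64)
loss = nn.functional.mse_loss(predictor(encodings), accs)
loss.backward()
```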
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive lower (and upper) bounds on the minimum eigenvalue of the Neural Tangent Kernel (NTK) in the (in)finite-width regime.
We show how the derived results can guide NAS to select top-performing architectures, even without any training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
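A training-free score in this spirit can be computed from the empirical NTK Gram matrix of per-sample gradients; the sketch below uses a toy scalar-output network and is not the paper's exact estimator:

```python
import torch

def ntk_min_eigenvalue(net, x):
    """Minimum eigenvalue of the empirical NTK Gram matrix J J^T, where J holds
    per-sample gradients of the (scalar) network output at initialization."""
    grads = []
    for i in range(x.size(0)):
        net.zero_grad()
        net(x[i:i + 1]).sum().backward()
        grads.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
    jac = torch.stack(grads)   # (n_samples, n_params)
    ntk = jac @ jac.t()        # empirical NTK Gram matrix
    return torch.linalg.eigvalsh(ntk).min().item()

net = torch.nn.Sequential(torch.nn.Flatten(),
                          torch.nn.Linear(3 * 8 * 8, 16),
                          torch.nn.ReLU(),
                          torch.nn.Linear(16, 1))
print(ntk_min_eigenvalue(net, torch.randn(8, 3, 8, 8)))
```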
- Improve Ranking Correlation of Super-net through Training Scheme from One-shot NAS to Few-shot NAS [13.390484379343908]
We propose a step-by-step super-net training scheme that moves from one-shot NAS to few-shot NAS.
In this scheme, we first train the super-net in a one-shot way, and then disentangle its weights.
Our method ranks 4th place in Track 1 of the CVPR 2022 3rd Lightweight NAS Challenge.
arXiv Detail & Related papers (2022-06-13T04:02:12Z)
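The disentangling step, like the sub-supernets of few-shot NAS further down this list, can be pictured as cloning the one-shot weights and restricting each clone to a slice of the candidate operations. A hypothetical sketch (the `allowed_choices` attribute is invented for illustration):

```python
import copy

def split_supernet(supernet, choices, num_splits=2):
    """Clone a trained one-shot supernet into `num_splits` sub-supernets,
    each inheriting the weights but searching only a subset of `choices`
    (the candidate operations on the split edge)."""
    groups = [choices[i::num_splits] for i in range(num_splits)]
    sub_supernets = []
    for group in groups:
        sub = copy.deepcopy(supernet)  # warm-start from the one-shot weights
        sub.allowed_choices = group    # hypothetical attribute restricting the space
        sub_supernets.append(sub)
    return sub_supernets               # each is then fine-tuned separately

# Toy usage: split 6 candidate ops across 2 sub-supernets.
# subs = split_supernet(trained_supernet, ["conv3x3", "conv1x1", "skip",
#                                          "avgpool", "maxpool", "none"])
```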
- BaLeNAS: Differentiable Architecture Search via the Bayesian Learning Rule [95.56873042777316]
Differentiable Architecture Search (DARTS) has received massive attention in recent years, mainly because it significantly reduces the computational cost.
This paper formulates neural architecture search as a distribution learning problem by relaxing the architecture weights into Gaussian distributions.
We demonstrate how the differentiable NAS benefits from Bayesian principles, enhancing exploration and improving stability.
arXiv Detail & Related papers (2021-11-25T18:13:42Z)
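A minimal sketch of the Gaussian relaxation: each architecture logit gets a learnable mean and log-std and is sampled with the reparameterization trick (BaLeNAS additionally optimizes these via the Bayesian learning rule, which is omitted here):

```python
import torch
import torch.nn as nn

class GaussianArchParams(nn.Module):
    """Architecture weights relaxed into Gaussians: sample differentiable
    per-edge mixing weights via the reparameterization trick."""
    def __init__(self, num_edges=14, num_ops=8):
        super().__init__()
        self.mu = nn.Parameter(torch.zeros(num_edges, num_ops))
        self.log_sigma = nn.Parameter(torch.full((num_edges, num_ops), -3.0))

    def sample(self):
        eps = torch.randn_like(self.mu)
        logits = self.mu + self.log_sigma.exp() * eps  # reparameterization
        return torch.softmax(logits, dim=-1)           # mixing weights per edge

arch = GaussianArchParams()
weights = arch.sample()  # differentiable w.r.t. both mu and log_sigma
```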
- How Does Supernet Help in Neural Architecture Search? [3.8348281160758027]
We conduct a comprehensive analysis on five search spaces, including NAS-Bench-101, NAS-Bench-201, DARTS-CIFAR10, DARTS-PTB, and ProxylessNAS.
We find that weight sharing works well on some search spaces but fails on others.
Our work is expected to inspire future NAS researchers to better leverage the power of weight sharing.
arXiv Detail & Related papers (2020-10-16T08:07:03Z)
- Few-shot Neural Architecture Search [35.28010196935195]
We propose few-shot NAS, which uses multiple supernetworks, called sub-supernets, each covering a different region of the search space, to alleviate undesired co-adaptation.
With only up to 7 sub-supernets, few-shot NAS establishes new state-of-the-art results: on ImageNet, it finds models that reach 80.5% top-1 accuracy at 600 MFLOPS and 77.5% top-1 accuracy at 238 MFLOPS.
arXiv Detail & Related papers (2020-06-11T22:36:01Z)
- DSNAS: Direct Neural Architecture Search without Parameter Retraining [112.02966105995641]
We propose a new, task-specific and end-to-end problem definition for NAS.
We propose DSNAS, an efficient differentiable NAS framework that simultaneously optimizes architecture and parameters with a low-biased Monte Carlo estimate.
DSNAS successfully discovers networks with comparable accuracy (74.4%) on ImageNet in 420 GPU hours, reducing the total time by more than 34%.
arXiv Detail & Related papers (2020-02-21T04:41:47Z)
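Single-path differentiable sampling of this kind is commonly built on a straight-through Gumbel-softmax, sketched below; DSNAS's own estimator differs in detail, so this shows only the underlying trick:

```python
import torch
import torch.nn.functional as F

# 14 edges, 8 candidate ops; the logits are the architecture parameters.
arch_logits = torch.zeros(14, 8, requires_grad=True)

# Forward: one-hot sample per edge (a single path); backward: soft
# Gumbel-softmax gradients flow through the straight-through estimator.
one_hot = F.gumbel_softmax(arch_logits, tau=1.0, hard=True)

op_outputs = torch.randn(14, 8)            # stand-in for per-op edge outputs
edge_out = (one_hot * op_outputs).sum(-1)  # only the sampled op contributes
edge_out.sum().backward()                  # gradients reach arch_logits in the same pass
print(arch_logits.grad.shape)
```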
- NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search [55.12928953187342]
We propose NAS-Bench-201, an extension of NAS-Bench-101 with a different search space, results on multiple datasets, and more diagnostic information.
NAS-Bench-201 has a fixed search space and provides a unified benchmark for almost any up-to-date NAS algorithm.
We provide additional diagnostic information, such as fine-grained loss and accuracy, which can inspire new designs of NAS algorithms.
arXiv Detail & Related papers (2020-01-02T05:28:26Z)
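For reference, querying the benchmark typically goes through the `nas_201_api` package; the sketch below reflects my recollection of that API, so the file name and method signatures should be checked against the official repository:

```python
# Requires the nas-bench-201 package and the downloaded benchmark file.
from nas_201_api import NASBench201API

# The .pth file name is the commonly distributed one; treat it as a placeholder.
api = NASBench201API("NAS-Bench-201-v1_1-096897.pth")

# NAS-Bench-201 architecture strings encode the 6 cell edges and their ops.
arch = ("|nor_conv_3x3~0|+|nor_conv_3x3~0|avg_pool_3x3~1|"
        "+|skip_connect~0|nor_conv_3x3~1|skip_connect~2|")
index = api.query_index_by_arch(arch)       # architecture string -> benchmark index
info = api.get_more_info(index, "cifar10")  # losses, accuracies, diagnostics
print(info)
```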