Efficient Few-Shot Neural Architecture Search by Counting the Number of Nonlinear Functions
- URL: http://arxiv.org/abs/2412.14678v1
- Date: Thu, 19 Dec 2024 09:31:53 GMT
- Title: Efficient Few-Shot Neural Architecture Search by Counting the Number of Nonlinear Functions
- Authors: Youngmin Oh, Hyunju Lee, Bumsub Ham
- Abstract summary: We introduce a novel few-shot NAS method that exploits the number of nonlinear functions to split the search space.
Our method is efficient, since it does not require comparing gradients of a supernet to split the space.
In addition, we have found that dividing the space allows us to reduce the channel dimensions required for each supernet.
- Score: 29.76210308781724
- License:
- Abstract: Neural architecture search (NAS) enables finding the best-performing architecture from a search space automatically. Most NAS methods exploit an over-parameterized network (i.e., a supernet) containing all possible architectures (i.e., subnets) in the search space. However, the subnets that share the same set of parameters are likely to have different characteristics, interfering with each other during training. To address this, few-shot NAS methods have been proposed that divide the space into a few subspaces and employ a separate supernet for each subspace to limit the extent of weight sharing. They achieve state-of-the-art performance, but the computational cost increases accordingly. We introduce in this paper a novel few-shot NAS method that exploits the number of nonlinear functions to split the search space. To be specific, our method divides the space such that each subspace consists of subnets with the same number of nonlinear functions. Our splitting criterion is efficient, since it does not require comparing gradients of a supernet to split the space. In addition, we have found that dividing the space allows us to reduce the channel dimensions required for each supernet, which enables training multiple supernets in an efficient manner. We also introduce a supernet-balanced sampling (SBS) technique, sampling several subnets at each training step, to train different supernets evenly within a limited number of training steps. Extensive experiments on standard NAS benchmarks demonstrate the effectiveness of our approach. Our code is available at https://cvlab.yonsei.ac.kr/projects/EFS-NAS.
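To make the splitting criterion and the supernet-balanced sampling (SBS) step concrete, here is a minimal Python sketch. It is an illustration only, not the released implementation: the operation names, the tuple encoding of subnets, and the round-robin sampling routine are assumptions made for this example.

```python
import random
from collections import defaultdict

# Toy search space: a subnet is a tuple of operation names, one per layer.
# Which operations count as "nonlinear" is an assumption made for this sketch.
NONLINEAR_OPS = {"conv3x3_relu", "conv5x5_relu", "mbconv_swish"}
LINEAR_OPS = {"skip", "conv1x1", "avg_pool"}


def count_nonlinear(subnet):
    """Splitting criterion: the number of nonlinear functions in a subnet."""
    return sum(op in NONLINEAR_OPS for op in subnet)


def split_search_space(subnets):
    """Group subnets so that each subspace shares the same nonlinearity count.

    Each subspace would then be handled by its own (channel-reduced) supernet.
    """
    subspaces = defaultdict(list)
    for subnet in subnets:
        subspaces[count_nonlinear(subnet)].append(subnet)
    return subspaces


def sbs_sample(subspaces, num_samples):
    """Supernet-balanced sampling (illustrative): cycle through the subspaces
    so that every supernet is visited evenly across training steps."""
    keys = sorted(subspaces)
    return [random.choice(subspaces[keys[i % len(keys)]]) for i in range(num_samples)]


if __name__ == "__main__":
    all_ops = sorted(NONLINEAR_OPS | LINEAR_OPS)
    # Enumerate a tiny 3-layer space for demonstration.
    space = [(a, b, c) for a in all_ops for b in all_ops for c in all_ops]
    subspaces = split_search_space(space)
    print({k: len(v) for k, v in subspaces.items()})  # subspace sizes per count
    print(sbs_sample(subspaces, num_samples=4))       # one subnet per supernet
```

Because the criterion only counts operations, no supernet gradients have to be compared when deciding the split, which is where the efficiency gain over gradient-based splitting schemes comes from.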
Related papers
- HEP-NAS: Towards Efficient Few-shot Neural Architecture Search via Hierarchical Edge Partitioning [8.484729345263153]
One-shot methods have advanced the field of neural architecture search (NAS) by adopting a weight-sharing strategy to reduce search costs, but sharing weights among subnets can compromise the accuracy of performance estimation.
Few-shot methods alleviate this issue by splitting the entire supernet edge by edge into individual sub-supernets.
We introduce HEP-NAS, a hierarchy-wise partition algorithm designed to further enhance accuracy.
arXiv Detail & Related papers (2024-12-14T07:42:56Z)
- SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape [14.550053893504764]
We introduce a "sub-one-shot" paradigm that serves as a bridge between zero-shot and one-shot NAS.
In sub-one-shot NAS, the supernet is trained using only a small subset of the training data, a phase we refer to as "warm-up".
We present SiGeo, a proxy founded on a novel theoretical framework that connects the supernet warm-up with the efficacy of the proxy.
arXiv Detail & Related papers (2023-11-22T05:25:24Z)
- NAS-LID: Efficient Neural Architecture Search with Local Intrinsic Dimension [37.04463309816036]
One-shot neural architecture search (NAS) substantially improves search efficiency by training one supernet to estimate the performance of every possible child architecture.
Experiments on NAS-Bench-201 indicate that NAS-LID achieves superior performance with better efficiency.
arXiv Detail & Related papers (2022-11-23T08:08:17Z)
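NAS-LID, summarized above, characterizes architectures via the local intrinsic dimension (LID) of their features. To make the quantity concrete, here is a generic MLE-style LID estimator (in the spirit of the Levina-Bickel estimator) applied to synthetic activations; it is shown only as background and is not the paper's exact procedure.

```python
import numpy as np


def lid_mle(features, k=10):
    """MLE-style local intrinsic dimension estimate per sample, computed from
    the distances to each sample's k nearest neighbours."""
    dists = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)               # ignore self-distances
    knn = np.sort(dists, axis=1)[:, :k]           # k nearest-neighbour distances
    r_k = knn[:, -1:]                             # distance to the k-th neighbour
    ratios = np.maximum(knn[:, :-1], 1e-12) / r_k
    return -1.0 / np.mean(np.log(ratios), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(256, 3))            # 3-D latent structure
    acts = latent @ rng.normal(size=(3, 64))      # embedded in 64-D activations
    print(float(np.median(lid_mle(acts))))        # roughly recovers dimension 3
```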
- Generalization Properties of NAS under Activation and Skip Connection Search [66.8386847112332]
We study the generalization properties of Neural Architecture Search (NAS) under a unifying framework.
We derive the lower (and upper) bounds of the minimum eigenvalue of the Neural Tangent Kernel (NTK) under the (in)finite-width regime.
We show how the derived results can guide NAS to select the top-performing architectures, even in the case without training.
arXiv Detail & Related papers (2022-09-15T12:11:41Z)
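The entry above reasons about architectures through the minimum eigenvalue of the NTK. As a rough illustration of the quantity itself (not the paper's bounds or selection rule), the empirical NTK of a tiny scalar-output model can be assembled from per-sample gradients; the toy models and random data below are assumptions for this sketch.

```python
import torch
import torch.nn as nn


def min_ntk_eigenvalue(model, inputs):
    """Minimum eigenvalue of the empirical NTK for a scalar-output model.

    Rows of the Jacobian are per-sample gradients of the output w.r.t. all
    parameters; the empirical NTK is J @ J.T.
    """
    params = [p for p in model.parameters() if p.requires_grad]
    rows = []
    for x in inputs:                       # one sample at a time
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    jac = torch.stack(rows)                # (N, num_params)
    ntk = jac @ jac.T                      # (N, N) empirical NTK
    return torch.linalg.eigvalsh(ntk).min().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Two toy "architectures": with and without a nonlinear activation.
    relu_net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
    linear_net = nn.Sequential(nn.Linear(16, 32), nn.Linear(32, 1))
    data = torch.randn(8, 16)
    print(min_ntk_eigenvalue(relu_net, data))
    print(min_ntk_eigenvalue(linear_net, data))
```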
- Improve Ranking Correlation of Super-net through Training Scheme from One-shot NAS to Few-shot NAS [13.390484379343908]
We propose a step-by-step super-net training scheme that moves from one-shot NAS to few-shot NAS.
In this scheme, we first train the super-net in a one-shot way, and then disentangle its weights.
Our method ranks 4th place in the CVPR2022 3rd Lightweight NAS Challenge Track1.
arXiv Detail & Related papers (2022-06-13T04:02:12Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
The proposed gradient-matching-based splitting significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
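For contrast with the counting-based criterion of the main paper, gradient matching decides splits by comparing how candidate operations on a shared edge pull the shared weights. The sketch below illustrates that idea under simplifying assumptions (a single shared linear layer, two candidate operations, and an MSE loss); it is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def grad_wrt_shared(shared, op, x, y):
    """Gradient of the loss w.r.t. the shared weights when `op` is active."""
    loss = F.mse_loss(op(shared(x)), y)
    return torch.autograd.grad(loss, shared.weight)[0].reshape(-1)


if __name__ == "__main__":
    torch.manual_seed(0)
    shared = nn.Linear(8, 8)                   # weights shared by all candidates
    candidates = {"relu": nn.ReLU(), "identity": nn.Identity()}
    x, y = torch.randn(32, 8), torch.randn(32, 8)

    grads = {name: grad_wrt_shared(shared, op, x, y) for name, op in candidates.items()}
    sim = F.cosine_similarity(grads["relu"], grads["identity"], dim=0).item()

    # A low (or negative) similarity suggests the two candidates conflict and
    # should be placed in separate sub-supernets when the supernet is split.
    print(f"gradient cosine similarity: {sim:.3f}")
```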
- An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves performance competitive with complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z)
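The observation above, that plain random search is competitive once the super-net is trained well, boils down to a very small search loop. The sketch below shows such a loop with a stand-in scorer; the operation list, layer count, and `evaluate_with_supernet` function are hypothetical placeholders rather than any paper's code.

```python
import random

# Hypothetical per-layer candidate operations for a 6-layer search space.
OPS = ["conv3x3", "conv5x5", "skip", "avg_pool"]
NUM_LAYERS = 6


def evaluate_with_supernet(subnet):
    """Stand-in for validation accuracy measured with inherited super-net
    weights; a real run would forward the subnet on held-out data."""
    return random.Random(hash(subnet)).random()   # fake score, stable per run


def random_search(num_trials=100):
    best_subnet, best_score = None, float("-inf")
    for _ in range(num_trials):
        subnet = tuple(random.choice(OPS) for _ in range(NUM_LAYERS))
        score = evaluate_with_supernet(subnet)
        if score > best_score:
            best_subnet, best_score = subnet, score
    return best_subnet, best_score


if __name__ == "__main__":
    random.seed(0)
    print(random_search())
```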
- Pi-NAS: Improving Neural Architecture Search by Reducing Supernet Training Consistency Shift [128.32670289503025]
Recently proposed neural architecture search (NAS) methods co-train billions of architectures in a supernet and estimate their potential accuracy.
However, the ranking correlation between the architectures' predicted accuracy and their actual capability is often poor, which creates a dilemma for existing NAS methods.
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
We address these two shifts simultaneously using a nontrivial supernet-Pi model, called Pi-NAS.
arXiv Detail & Related papers (2021-08-22T09:08:48Z)
- GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet [63.96959854429752]
GreedyNAS is easy to follow, and experimental results on the ImageNet dataset indicate that it achieves better Top-1 accuracy under the same search space and FLOPs or latency level.
By searching on a larger space, our GreedyNAS can also obtain new state-of-the-art architectures.
arXiv Detail & Related papers (2020-03-25T06:54:10Z)
- NAS-Bench-201: Extending the Scope of Reproducible Neural Architecture Search [55.12928953187342]
We propose an extension to NAS-Bench-101: NAS-Bench-201 with a different search space, results on multiple datasets, and more diagnostic information.
NAS-Bench-201 has a fixed search space and provides a unified benchmark for almost any up-to-date NAS algorithms.
We provide additional diagnostic information, such as fine-grained loss and accuracy, which can inspire new designs of NAS algorithms.
arXiv Detail & Related papers (2020-01-02T05:28:26Z)
This list is automatically generated from the titles and abstracts of the papers on this site.