Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
- URL: http://arxiv.org/abs/2503.10740v1
- Date: Thu, 13 Mar 2025 17:07:04 GMT
- Title: Subnet-Aware Dynamic Supernet Training for Neural Architecture Search
- Authors: Jeimin Jeon, Youngmin Oh, Junghyup Lee, Donghyeon Baek, Dohyung Kim, Chanho Eom, Bumsub Ham,
- Abstract summary: N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. Supernet training is biased towards low-complexity subnets (unfairness), and the momentum update is noisy. We present a dynamic supernet training technique that addresses these problems by adjusting the training strategy adaptively to the complexities of the subnets.
- Score: 34.085718250054136
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: N-shot neural architecture search (NAS) exploits a supernet containing all candidate subnets for a given search space. The subnets are typically trained with a static training strategy (e.g., using the same learning rate (LR) scheduler and optimizer for all subnets). This, however, does not consider that individual subnets have distinct characteristics, leading to two problems: (1) the supernet training is biased towards the low-complexity subnets (unfairness); (2) the momentum update in the supernet is noisy (noisy momentum). We present a dynamic supernet training technique that addresses these problems by adjusting the training strategy adaptively to each subnet. Specifically, we introduce a complexity-aware LR scheduler (CaLR) that controls the decay ratio of the LR adaptively to the complexity of each subnet, which alleviates the unfairness problem. We also present a momentum separation technique (MS) that groups subnets with similar structural characteristics and uses a separate momentum for each group, avoiding the noisy momentum problem. Our approach is applicable to various N-shot NAS methods at marginal cost, while improving the search performance drastically. We validate the effectiveness of our approach on various search spaces (e.g., NAS-Bench-201, MobileNet spaces) and datasets (e.g., CIFAR-10/100, ImageNet).
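To make the two ideas in the abstract concrete, here is a minimal, hypothetical sketch (not the authors' released code): a cosine-style scheduler whose decay is slowed for higher-complexity subnets, and an SGD-like update that keeps a separate momentum buffer per subnet group. The complexity measure, the grouping rule, and the exact decay form are assumptions made for illustration only.

```python
# Minimal sketch of (1) a complexity-aware LR schedule and (2) momentum
# separation per subnet group. The decay form, complexity normalization,
# and grouping are assumptions, not the paper's exact formulation.
import math
import torch


class ComplexityAwareLR:
    """Cosine-style schedule whose decay is slowed for complex subnets."""

    def __init__(self, base_lr, min_lr, total_steps):
        self.base_lr, self.min_lr, self.total_steps = base_lr, min_lr, total_steps

    def lr(self, step, complexity):
        # `complexity` is assumed to be normalized to [0, 1] (e.g., FLOPs ratio).
        # Higher complexity slows the effective progress, so complex subnets
        # keep a larger LR for longer.
        progress = (step / self.total_steps) * (1.0 - 0.5 * complexity)
        return self.min_lr + 0.5 * (self.base_lr - self.min_lr) * (1 + math.cos(math.pi * progress))


class GroupedMomentumSGD:
    """SGD update that keeps one momentum buffer per subnet group."""

    def __init__(self, params, momentum=0.9, num_groups=4):
        self.params = list(params)
        self.momentum = momentum
        self.buffers = {g: [torch.zeros_like(p) for p in self.params] for g in range(num_groups)}

    def step(self, lr, group_id):
        # Only the buffer of the sampled subnet's group is read and updated,
        # so gradients of structurally different subnets do not mix.
        for p, buf in zip(self.params, self.buffers[group_id]):
            if p.grad is None:
                continue
            buf.mul_(self.momentum).add_(p.grad)
            p.data.add_(buf, alpha=-lr)
```

In an actual N-shot NAS pipeline, the sampled subnet's normalized complexity and its group id would be derived from the sampled architecture encoding at each training step; the names used above are placeholders.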
Related papers
- Efficient Few-Shot Neural Architecture Search by Counting the Number of Nonlinear Functions [29.76210308781724]
We introduce a novel few-shot NAS method that exploits the number of nonlinear functions to split the search space. Our method is efficient, since it does not require comparing gradients of a supernet to split the space. In addition, we have found that dividing the space allows us to reduce the channel dimensions required for each supernet.
arXiv Detail & Related papers (2024-12-19T09:31:53Z) - Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Boosting Residual Networks with Group Knowledge [75.73793561417702]
Recent research understands the residual networks from a new perspective of the implicit ensemble model.
Previous methods such as stochastic depth and stimulative training have further improved the performance of the residual network by sampling and training its subnets.
We propose a group knowledge based training framework for boosting the performance of residual networks.
arXiv Detail & Related papers (2023-08-26T05:39:57Z) - ShiftNAS: Improving One-shot NAS via Probability Shift [1.3537414663819973]
We propose ShiftNAS, a method that can adjust the sampling probability based on the complexity of networks.
We evaluate our approach on multiple visual network models, including convolutional neural networks (CNNs) and vision transformers (ViTs).
Experimental results on ImageNet show that ShiftNAS can improve the performance of one-shot NAS without additional consumption.
arXiv Detail & Related papers (2023-07-17T07:53:23Z) - Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural search frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z) - Improve Ranking Correlation of Super-net through Training Scheme from One-shot NAS to Few-shot NAS [13.390484379343908]
We propose a step-by-step training super-net scheme from one-shot NAS to few-shot NAS.
In the training scheme, we firstly train super-net in a one-shot way, and then we disentangle the weights of super-net.
Our method ranks 4th place in the CVPR2022 3rd Lightweight NAS Challenge Track1.
arXiv Detail & Related papers (2022-06-13T04:02:12Z) - An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z) - Prioritized Subnet Sampling for Resource-Adaptive Supernet Training [136.6591624918964]
We propose Prioritized Subnet Sampling to train a resource-adaptive supernet, termed PSS-Net.
Experiments on ImageNet using MobileNetV1/V2 show that our PSS-Net can well outperform state-of-the-art resource-adaptive supernets.
arXiv Detail & Related papers (2021-09-12T04:43:51Z) - Understanding and Accelerating Neural Architecture Search with Training-Free and Theory-Grounded Metrics [117.4281417428145]
This work targets designing a principled and unified training-free framework for Neural Architecture Search (NAS).
NAS has been explosively studied to automate the discovery of top-performer neural networks, but suffers from heavy resource consumption and often incurs search bias due to truncated training or approximations.
We present a unified framework to understand and accelerate NAS, by disentangling the "TEG" characteristics (trainability, expressivity, generalization) of searched networks.
arXiv Detail & Related papers (2021-08-26T17:52:07Z)