CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot
NAS
- URL: http://arxiv.org/abs/2207.07868v1
- Date: Sat, 16 Jul 2022 07:45:17 GMT
- Title: CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot
NAS
- Authors: Zixuan Zhou and Xuefei Ning and Yi Cai and Jiashu Han and Yiping Deng
and Yuhan Dong and Huazhong Yang and Yu Wang
- Abstract summary: One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency.
Previous studies reveal that one-shot performance estimations of architectures might not be well correlated with their performances in stand-alone training.
We propose Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively.
- Score: 19.485514022334844
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One-shot Neural Architecture Search (NAS) has been widely used to discover
architectures due to its efficiency. However, previous studies reveal that
one-shot performance estimations of architectures might not be well correlated
with their performances in stand-alone training because of the excessive
sharing of operation parameters (i.e., large sharing extent) between
architectures. Thus, recent methods construct even more over-parameterized
supernets to reduce the sharing extent. But these improved methods introduce a
large number of extra parameters and thus cause an undesirable trade-off
between the training costs and the ranking quality. To alleviate the above
issues, we propose to apply Curriculum Learning On Sharing Extent (CLOSE) to
train the supernet both efficiently and effectively. Specifically, we train the
supernet with a large sharing extent (an easier curriculum) at the beginning
and gradually decrease the sharing extent of the supernet (a harder
curriculum). To support this training strategy, we design a novel supernet
(CLOSENet) that decouples the parameters from operations to realize a flexible
sharing scheme and adjustable sharing extent. Extensive experiments demonstrate
that CLOSE can obtain a better ranking quality across different computational
budget constraints than other one-shot supernets, and is able to discover
superior architectures when combined with various search strategies. Code is
available at https://github.com/walkerning/aw_nas.
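As a rough illustration of the idea described in the abstract, the sketch below (not the authors' CLOSENet implementation; names such as SharedConvPool and sharing_schedule, and the linear schedule itself, are assumptions made purely for illustration) shows how operation parameters can be decoupled into a shared pool whose number of distinct groups grows during training, so that the sharing extent starts large (easier curriculum) and is gradually reduced (harder curriculum). The actual code is available in the aw_nas repository linked above.

```python
# A minimal, illustrative sketch of curriculum learning on the sharing extent.
# This is NOT the authors' CLOSENet code: SharedConvPool, num_groups, and the
# linear schedule are stand-ins chosen for illustration only.
import torch
import torch.nn as nn


class SharedConvPool(nn.Module):
    """A pool of convolution weights shared among candidate operations,
    decoupled from the operations themselves."""

    def __init__(self, max_groups, channels, kernel_size=3):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size, padding=kernel_size // 2)
            for _ in range(max_groups)
        )
        # Start with the largest sharing extent: every operation maps to group 0.
        self.num_groups = 1

    def set_num_groups(self, num_groups):
        """Reduce the sharing extent by activating more distinct parameter groups."""
        self.num_groups = min(num_groups, len(self.convs))

    def forward(self, x, op_index):
        # Map a candidate operation to one of the currently active groups.
        group = op_index % self.num_groups
        return self.convs[group](x)


def sharing_schedule(epoch, total_epochs, max_groups):
    """A simple linear curriculum: few groups (heavy sharing) early in training,
    many groups (little sharing) late. The real CLOSE schedule may differ."""
    frac = epoch / max(total_epochs - 1, 1)
    return max(1, int(round(1 + frac * (max_groups - 1))))


if __name__ == "__main__":
    pool = SharedConvPool(max_groups=4, channels=8)
    x = torch.randn(2, 8, 16, 16)
    total_epochs = 10
    for epoch in range(total_epochs):
        pool.set_num_groups(sharing_schedule(epoch, total_epochs, max_groups=4))
        # In a real supernet step, a sampled architecture would pick op_index per edge.
        out = pool(x, op_index=3)
    print("final number of parameter groups:", pool.num_groups)
```

In this sketch, increasing num_groups over the epochs plays the role of moving from the easier curriculum (heavy sharing) to the harder one (less sharing) that the paper describes.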
Related papers
- The devil is in discretization discrepancy. Robustifying Differentiable NAS with Single-Stage Searching Protocol [2.4300749758571905]
Gradient-based methods suffer from discretization error, which can severely damage the process of obtaining the final architecture.
We introduce a novel single-stage searching protocol, which is not reliant on decoding a continuous architecture.
Our results demonstrate that this approach outperforms other DNAS methods by achieving 75.3% in the searching stage on the Cityscapes validation dataset.
arXiv Detail & Related papers (2024-05-26T15:44:53Z)
- Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach [57.175488207316654]
We propose a novel concept of Supernet Shifting, a refined search strategy combining architecture searching with supernet fine-tuning.
We show that Supernet Shifting can transfer the supernet to a new dataset.
Comprehensive experiments show that our method has better order-preserving ability and can find a dominating architecture.
arXiv Detail & Related papers (2024-03-18T00:13:41Z)
- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z)
- Improving Differentiable Architecture Search via Self-Distillation [20.596850268316565]
Differentiable Architecture Search (DARTS) is a simple yet efficient Neural Architecture Search (NAS) method.
We propose Self-Distillation Differentiable Neural Architecture Search (SD-DARTS) to alleviate the discretization gap.
arXiv Detail & Related papers (2023-02-11T08:58:55Z)
- Prior-Guided One-shot Neural Architecture Search [11.609732776776982]
We present Prior-Guided One-shot NAS (PGONAS) to strengthen the ranking correlation of supernets.
Our PGONAS ranks 3rd place in the supernet track of the CVPR 2022 second lightweight NAS challenge.
arXiv Detail & Related papers (2022-06-27T14:19:56Z)
- Generalizing Few-Shot NAS with Gradient Matching [165.5690495295074]
One-Shot methods train one supernet to approximate the performance of every architecture in the search space via weight-sharing.
Few-Shot NAS reduces the level of weight-sharing by splitting the One-Shot supernet into multiple separated sub-supernets.
The proposed method significantly outperforms its Few-Shot counterparts while surpassing previous comparable methods in terms of the accuracy of derived architectures.
arXiv Detail & Related papers (2022-03-29T03:06:16Z)
- An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z)
- Cream of the Crop: Distilling Prioritized Paths For One-Shot Neural Architecture Search [60.965024145243596]
One-shot weight sharing methods have recently drawn great attention in neural architecture search due to high efficiency and competitive performance.
To alleviate the shortcomings of weight sharing, we present a simple yet effective architecture distillation method.
We introduce the concept of prioritized path, which refers to the architecture candidates exhibiting superior performance during training.
Since the prioritized paths are changed on the fly depending on their performance and complexity, the final obtained paths are the cream of the crop.
arXiv Detail & Related papers (2020-10-29T17:55:05Z)
- How to Train Your Super-Net: An Analysis of Training Heuristics in Weight-Sharing NAS [64.50415611717057]
We show that some commonly-used baselines for super-net training negatively impact the correlation between super-net and stand-alone performance.
Our code and experiments set a strong and reproducible baseline that future works can build on.
arXiv Detail & Related papers (2020-03-09T17:34:32Z)