How to Train Your Super-Net: An Analysis of Training Heuristics in
Weight-Sharing NAS
- URL: http://arxiv.org/abs/2003.04276v2
- Date: Wed, 17 Jun 2020 13:42:15 GMT
- Title: How to Train Your Super-Net: An Analysis of Training Heuristics in
Weight-Sharing NAS
- Authors: Kaicheng Yu and Rene Ranftl and Mathieu Salzmann
- Abstract summary: We show that some commonly-used heuristics for super-net training negatively impact the correlation between super-net and stand-alone performance.
Our code and experiments set a strong and reproducible baseline that future works can build on.
- Score: 64.50415611717057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weight sharing promises to make neural architecture search (NAS) tractable
even on commodity hardware. Existing methods in this space rely on a diverse
set of heuristics to design and train the shared-weight backbone network,
a.k.a. the super-net. Since heuristics and hyperparameters substantially vary
across different methods, a fair comparison between them can only be achieved
by systematically analyzing the influence of these factors. In this paper, we
therefore provide a systematic evaluation of the heuristics and hyperparameters
that are frequently employed by weight-sharing NAS algorithms. Our analysis
uncovers that some commonly-used heuristics for super-net training negatively
impact the correlation between super-net and stand-alone performance, and
evidences the strong influence of certain hyperparameters and architectural
choices. Our code and experiments set a strong and reproducible baseline that
future works can build on.
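The abstract's key quantity, the correlation between super-net and stand-alone performance, is easiest to see in code. Below is a minimal, hedged sketch of single-path super-net training with weight sharing and of a Kendall-tau rank correlation between super-net scores and stand-alone accuracies; the ChoiceBlock and SuperNet classes, layer sizes, sampling scheme, and dummy data are illustrative assumptions, not the paper's actual search space or training protocol.

```python
# Minimal sketch (not the authors' exact protocol): a tiny weight-sharing
# super-net trained with single-path uniform sampling, plus the rank
# correlation between super-net scores and stand-alone accuracies that the
# abstract refers to. All layer sizes and data here are illustrative.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.stats import kendalltau

class ChoiceBlock(nn.Module):
    """One searchable layer: candidate ops share input/output shape."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # candidate 0
            nn.Conv2d(channels, channels, 5, padding=2),  # candidate 1
            nn.Identity(),                                # candidate 2 (skip)
        ])

    def forward(self, x, choice):
        return F.relu(self.ops[choice](x))

class SuperNet(nn.Module):
    def __init__(self, channels=16, num_blocks=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(ChoiceBlock(channels) for _ in range(num_blocks))
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        x = F.relu(self.stem(x))
        for block, choice in zip(self.blocks, arch):
            x = block(x, choice)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

def sample_arch(num_blocks=4, num_choices=3):
    """Single-path uniform sampling: one candidate op per block."""
    return [random.randrange(num_choices) for _ in range(num_blocks)]

def train_step(supernet, optimizer, images, labels):
    """One shared-weight update on a randomly sampled sub-network."""
    arch = sample_arch(len(supernet.blocks))
    optimizer.zero_grad()
    loss = F.cross_entropy(supernet(images, arch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def rank_correlation(supernet_scores, standalone_scores):
    """Kendall tau between super-net estimates and stand-alone accuracies."""
    tau, _ = kendalltau(supernet_scores, standalone_scores)
    return tau

if __name__ == "__main__":
    net = SuperNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.05, momentum=0.9)
    # A dummy batch stands in for a real training loader.
    images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    for step in range(10):
        train_step(net, opt, images, labels)
    # In practice, the scores below come from evaluating sampled architectures
    # with inherited super-net weights vs. training them from scratch.
    print(rank_correlation([0.62, 0.58, 0.71, 0.66], [0.91, 0.89, 0.94, 0.92]))
```

A high Kendall tau means the super-net ranks architectures the way stand-alone training would, which is the property the paper's analysis targets.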
Related papers
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
- Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many networks using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
arXiv Detail & Related papers (2023-12-03T04:20:02Z)
- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z)
- SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks [25.465917853812538]
We present an empirical evaluation on methods for sharing parameters in isotropic networks.
We propose a weight sharing strategy to generate a family of models with better overall efficiency.
arXiv Detail & Related papers (2022-07-21T00:16:05Z)
- CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS [19.485514022334844]
One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency.
Previous studies reveal that one-shot performance estimates of architectures might not correlate well with their performance in stand-alone training.
We propose Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively.
arXiv Detail & Related papers (2022-07-16T07:45:17Z)
- An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search is competitive with complex state-of-the-art NAS algorithms when the super-net is properly trained; a minimal sketch of such super-net-guided random search appears after this list.
arXiv Detail & Related papers (2021-10-04T02:18:44Z)
- Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap [90.93522795555724]
Neural architecture search (NAS) has attracted increasing attention in both academia and industry.
Weight-sharing methods were proposed in which exponentially many architectures share weights in the same super-network.
This paper provides a literature review on NAS, in particular the weight-sharing methods.
arXiv Detail & Related papers (2020-08-04T11:57:03Z)
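As referenced in the entry on "An Analysis of Super-Net Heuristics in Weight-Sharing NAS" above, plain random search becomes competitive once the super-net is properly trained. As a hedged illustration only, the sketch below shows what super-net-guided random search could look like for a super-net exposing the same forward(images, arch) interface as the earlier sketch; the helper names, sample budget, and validation loader are assumptions, not an API from any of the listed papers.

```python
# Hedged sketch of super-net-guided random search: sample architectures
# uniformly, score each with weights inherited from a trained super-net,
# and keep the best-scoring one for stand-alone retraining. Helper names
# and the validation loader are placeholders, not a specific library API.
import random
import torch

@torch.no_grad()
def evaluate_with_supernet(supernet, arch, val_loader):
    """Validation accuracy of one candidate using inherited super-net weights."""
    supernet.eval()
    correct = total = 0
    for images, labels in val_loader:
        preds = supernet(images, arch).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def random_search(supernet, val_loader, num_blocks, num_choices, num_samples=100):
    """Uniform random search over the discrete search space."""
    best_arch, best_acc = None, -1.0
    for _ in range(num_samples):
        arch = [random.randrange(num_choices) for _ in range(num_blocks)]
        acc = evaluate_with_supernet(supernet, arch, val_loader)
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc
```

The selected architecture would then be retrained from scratch; how faithfully the inherited-weight scores rank candidates is exactly the correlation issue the main paper analyzes.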