How to Train Your Super-Net: An Analysis of Training Heuristics in
Weight-Sharing NAS
- URL: http://arxiv.org/abs/2003.04276v2
- Date: Wed, 17 Jun 2020 13:42:15 GMT
- Title: How to Train Your Super-Net: An Analysis of Training Heuristics in
Weight-Sharing NAS
- Authors: Kaicheng Yu and Rene Ranftl and Mathieu Salzmann
- Abstract summary: We show that some commonly-used heuristics for super-net training negatively impact the correlation between super-net and stand-alone performance.
Our code and experiments set a strong and reproducible baseline that future works can build on.
- Score: 64.50415611717057
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Weight sharing promises to make neural architecture search (NAS) tractable
even on commodity hardware. Existing methods in this space rely on a diverse
set of heuristics to design and train the shared-weight backbone network,
a.k.a. the super-net. Since heuristics and hyperparameters substantially vary
across different methods, a fair comparison between them can only be achieved
by systematically analyzing the influence of these factors. In this paper, we
therefore provide a systematic evaluation of the heuristics and hyperparameters
that are frequently employed by weight-sharing NAS algorithms. Our analysis
uncovers that some commonly-used heuristics for super-net training negatively
impact the correlation between super-net and stand-alone performance, and
evidences the strong influence of certain hyperparameters and architectural
choices. Our code and experiments set a strong and reproducible baseline that
future works can build on.
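The abstract's key quantity, the correlation between super-net and stand-alone performance, is easiest to see in code. Below is a minimal, hedged sketch of single-path super-net training with weight sharing and of a Kendall-tau rank correlation between super-net scores and stand-alone accuracies; the ChoiceBlock and SuperNet classes, layer sizes, sampling scheme, and dummy data are illustrative assumptions, not the paper's actual search space or training protocol.

```python
# Minimal sketch (not the authors' exact protocol): a tiny weight-sharing
# super-net trained with single-path uniform sampling, plus the rank
# correlation between super-net scores and stand-alone accuracies that the
# abstract refers to. All layer sizes and data here are illustrative.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F
from scipy.stats import kendalltau

class ChoiceBlock(nn.Module):
    """One searchable layer: candidate ops share input/output shape."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),  # candidate 0
            nn.Conv2d(channels, channels, 5, padding=2),  # candidate 1
            nn.Identity(),                                # candidate 2 (skip)
        ])

    def forward(self, x, choice):
        return F.relu(self.ops[choice](x))

class SuperNet(nn.Module):
    def __init__(self, channels=16, num_blocks=4, num_classes=10):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 3, padding=1)
        self.blocks = nn.ModuleList(ChoiceBlock(channels) for _ in range(num_blocks))
        self.head = nn.Linear(channels, num_classes)

    def forward(self, x, arch):
        x = F.relu(self.stem(x))
        for block, choice in zip(self.blocks, arch):
            x = block(x, choice)
        x = x.mean(dim=(2, 3))  # global average pooling
        return self.head(x)

def sample_arch(num_blocks=4, num_choices=3):
    """Single-path uniform sampling: one candidate op per block."""
    return [random.randrange(num_choices) for _ in range(num_blocks)]

def train_step(supernet, optimizer, images, labels):
    """One shared-weight update on a randomly sampled sub-network."""
    arch = sample_arch(len(supernet.blocks))
    optimizer.zero_grad()
    loss = F.cross_entropy(supernet(images, arch), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

def rank_correlation(supernet_scores, standalone_scores):
    """Kendall tau between super-net estimates and stand-alone accuracies."""
    tau, _ = kendalltau(supernet_scores, standalone_scores)
    return tau

if __name__ == "__main__":
    net = SuperNet()
    opt = torch.optim.SGD(net.parameters(), lr=0.05, momentum=0.9)
    # A dummy batch stands in for a real training loader.
    images, labels = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
    for step in range(10):
        train_step(net, opt, images, labels)
    # In practice, the scores below come from evaluating sampled architectures
    # with inherited super-net weights vs. training them from scratch.
    print(rank_correlation([0.62, 0.58, 0.71, 0.66], [0.91, 0.89, 0.94, 0.92]))
```

A high Kendall tau means the super-net ranks architectures the way stand-alone training would, which is the property the paper's analysis targets.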
Related papers
- DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions [121.05720140641189]
We develop a family of models with the distilling neural architecture (DNA) techniques.
Our proposed DNA models can rate all architecture candidates, as opposed to previous works that can only access a sub-search space using heuristic algorithms.
Our models achieve state-of-the-art top-1 accuracy of 78.9% and 83.6% on ImageNet for a mobile convolutional network and a small vision transformer, respectively.
arXiv Detail & Related papers (2024-03-02T22:16:47Z)
- Learning to Compose SuperWeights for Neural Parameter Allocation Search [61.078949532440724]
We show that our approach can generate parameters for many networks using the same set of weights.
This enables us to support tasks like efficient ensembling and anytime prediction.
arXiv Detail & Related papers (2023-12-03T04:20:02Z)
- Mixture-of-Supernets: Improving Weight-Sharing Supernet Training with Architecture-Routed Mixture-of-Experts [55.470959564665705]
Weight-sharing supernets are crucial for performance estimation in cutting-edge neural architecture search (NAS) frameworks.
The proposed method attains state-of-the-art (SoTA) performance in NAS for fast machine translation models.
It excels in NAS for building memory-efficient task-agnostic BERT models.
arXiv Detail & Related papers (2023-06-08T00:35:36Z)
- SPIN: An Empirical Evaluation on Sharing Parameters of Isotropic Networks [25.465917853812538]
We present an empirical evaluation on methods for sharing parameters in isotropic networks.
We propose a weight sharing strategy to generate a family of models with better overall efficiency.
arXiv Detail & Related papers (2022-07-21T00:16:05Z)
- CLOSE: Curriculum Learning On the Sharing Extent Towards Better One-shot NAS [19.485514022334844]
One-shot Neural Architecture Search (NAS) has been widely used to discover architectures due to its efficiency.
Previous studies reveal that one-shot performance estimates of architectures might not correlate well with their performance in stand-alone training.
We propose Curriculum Learning On Sharing Extent (CLOSE) to train the supernet both efficiently and effectively.
arXiv Detail & Related papers (2022-07-16T07:45:17Z)
- An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search is competitive with complex state-of-the-art NAS algorithms when the super-net is properly trained; a minimal sketch of such super-net-guided random search appears after this list.
arXiv Detail & Related papers (2021-10-04T02:18:44Z)
- Weight-Sharing Neural Architecture Search: A Battle to Shrink the Optimization Gap [90.93522795555724]
Neural architecture search (NAS) has attracted increasing attention in both academia and industry.
Weight-sharing methods were proposed in which exponentially many architectures share weights in the same super-network.
This paper provides a literature review on NAS, in particular the weight-sharing methods.
arXiv Detail & Related papers (2020-08-04T11:57:03Z)
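As referenced in the entry on "An Analysis of Super-Net Heuristics in Weight-Sharing NAS" above, plain random search becomes competitive once the super-net is properly trained. As a hedged illustration only, the sketch below shows what super-net-guided random search could look like for a super-net exposing the same forward(images, arch) interface as the earlier sketch; the helper names, sample budget, and validation loader are assumptions, not an API from any of the listed papers.

```python
# Hedged sketch of super-net-guided random search: sample architectures
# uniformly, score each with weights inherited from a trained super-net,
# and keep the best-scoring one for stand-alone retraining. Helper names
# and the validation loader are placeholders, not a specific library API.
import random
import torch

@torch.no_grad()
def evaluate_with_supernet(supernet, arch, val_loader):
    """Validation accuracy of one candidate using inherited super-net weights."""
    supernet.eval()
    correct = total = 0
    for images, labels in val_loader:
        preds = supernet(images, arch).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return correct / total

def random_search(supernet, val_loader, num_blocks, num_choices, num_samples=100):
    """Uniform random search over the discrete search space."""
    best_arch, best_acc = None, -1.0
    for _ in range(num_samples):
        arch = [random.randrange(num_choices) for _ in range(num_blocks)]
        acc = evaluate_with_supernet(supernet, arch, val_loader)
        if acc > best_acc:
            best_arch, best_acc = arch, acc
    return best_arch, best_acc
```

The selected architecture would then be retrained from scratch; how faithfully the inherited-weight scores rank candidates is exactly the correlation issue the main paper analyzes.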