How Powerful are Performance Predictors in Neural Architecture Search?
- URL: http://arxiv.org/abs/2104.01177v1
- Date: Fri, 2 Apr 2021 17:57:16 GMT
- Title: How Powerful are Performance Predictors in Neural Architecture Search?
- Authors: Colin White, Arber Zela, Binxin Ru, Yang Liu, Frank Hutter
- Abstract summary: We give the first large-scale study of performance predictors by analyzing 31 techniques.
We show that certain families of predictors can be combined to achieve even better predictive power.
- Score: 43.86743225322636
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Early methods in the rapidly developing field of neural architecture search
(NAS) required fully training thousands of neural networks. To reduce this
extreme computational cost, dozens of techniques have since been proposed to
predict the final performance of neural architectures. Despite the success of
such performance prediction methods, it is not well-understood how different
families of techniques compare to one another, due to the lack of an
agreed-upon evaluation metric and optimization for different constraints on the
initialization time and query time. In this work, we give the first large-scale
study of performance predictors by analyzing 31 techniques ranging from
learning curve extrapolation, to weight-sharing, to supervised learning, to
"zero-cost" proxies. We test a number of correlation- and rank-based
performance measures in a variety of settings, as well as the ability of each
technique to speed up predictor-based NAS frameworks. Our results act as
recommendations for the best predictors to use in different settings, and we
show that certain families of predictors can be combined to achieve even better
predictive power, opening up promising research directions. Our code, featuring
a library of 31 performance predictors, is available at
https://github.com/automl/naslib.
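To make the evaluation setup concrete, here is a minimal Python sketch (not the NASLib API) of the kind of correlation- and rank-based measures the study uses, plus one naive way of combining two predictor families by averaging rank-normalized scores. All accuracies and predictor scores below are hypothetical placeholders, not results from the paper.
```python
# Minimal sketch: rank-based evaluation of performance predictors and a
# simple combination of two predictor families. Values are illustrative only.
import numpy as np
from scipy.stats import kendalltau, spearmanr, rankdata

# Hypothetical ground-truth validation accuracies for a sample of architectures.
true_acc = np.array([91.2, 93.5, 90.1, 94.0, 92.3, 89.8])

# Hypothetical scores from two different predictor families
# (e.g. a zero-cost proxy and a learning-curve extrapolator).
proxy_scores = np.array([0.42, 0.71, 0.39, 0.65, 0.58, 0.33])
lc_scores    = np.array([90.5, 93.1, 90.9, 93.8, 91.7, 90.2])

def evaluate(scores, targets):
    """Correlation- and rank-based measures of predictor quality."""
    tau, _ = kendalltau(scores, targets)
    rho, _ = spearmanr(scores, targets)
    return {"kendall_tau": tau, "spearman_rho": rho}

print("zero-cost proxy:", evaluate(proxy_scores, true_acc))
print("learning-curve :", evaluate(lc_scores, true_acc))

# One simple way to combine predictor families: average their rank-normalized scores.
combined = (rankdata(proxy_scores) + rankdata(lc_scores)) / 2.0
print("combined       :", evaluate(combined, true_acc))
```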
Related papers
- FR-NAS: Forward-and-Reverse Graph Predictor for Efficient Neural Architecture Search [10.699485270006601]
We introduce a novel Graph Neural Network (GNN) predictor for Neural Architecture Search (NAS).
This predictor renders neural architectures into vector representations by combining both the conventional and inverse graph views.
The experimental results showcase a significant improvement in prediction accuracy, with a 3% to 16% increase in Kendall-tau correlation.
arXiv Detail & Related papers (2024-04-24T03:22:49Z) - AIO-P: Expanding Neural Performance Predictors Beyond Image Classification [22.743278613519152]
We propose a novel All-in-One Predictor (AIO-P) to pretrain neural predictors on architecture examples.
AIO-P can achieve Mean Absolute Error (MAE) and Spearman's Rank Correlation (SRCC) below 1% and above 0.5, respectively.
arXiv Detail & Related papers (2022-11-30T18:30:41Z) - Confidence-Nets: A Step Towards better Prediction Intervals for regression Neural Networks on small datasets [0.0]
We propose an ensemble method that attempts to estimate the uncertainty of predictions, increase their accuracy and provide an interval for the expected variation.
The proposed method is tested on various datasets, and a significant improvement in the performance of the neural network model is seen.
arXiv Detail & Related papers (2022-10-31T06:38:40Z) - Towards Theoretically Inspired Neural Initialization Optimization [66.04735385415427]
We propose a differentiable quantity, named GradCosine, with theoretical insights to evaluate the initial state of a neural network.
We show that both the training and test performance of a network can be improved by maximizing GradCosine under a norm constraint.
Generalizing the sample-wise analysis to the real batch setting, the resulting Neural Initialization Optimization (NIO) algorithm automatically finds a better initialization with negligible cost.
arXiv Detail & Related papers (2022-10-12T06:49:16Z) - Learning Predictions for Algorithms with Predictions [49.341241064279714]
We introduce a general design approach for algorithms that learn predictors.
We apply techniques from online learning to learn against adversarial instances, tune robustness-consistency trade-offs, and obtain new statistical guarantees.
We demonstrate the effectiveness of our approach at deriving learning algorithms by analyzing methods for bipartite matching, page migration, ski-rental, and job scheduling.
arXiv Detail & Related papers (2022-02-18T17:25:43Z) - RANK-NOSH: Efficient Predictor-Based Architecture Search via Non-Uniform Successive Halving [74.61723678821049]
We propose NOn-uniform Successive Halving (NOSH), a hierarchical scheduling algorithm that terminates the training of underperforming architectures early to avoid wasting budget.
We formulate predictor-based architecture search as learning to rank with pairwise comparisons (a generic pairwise-ranking objective is sketched after this list).
The resulting method, RANK-NOSH, reduces the search budget by 5x while achieving competitive or even better performance than previous state-of-the-art predictor-based methods on various spaces and datasets.
arXiv Detail & Related papers (2021-08-18T07:45:21Z) - Self-supervised Representation Learning for Evolutionary Neural Architecture Search [9.038625856798227]
Recently proposed neural architecture search (NAS) algorithms adopt neural predictors to accelerate the architecture search.
How to obtain a neural predictor with high prediction accuracy using a small amount of training data is a central problem to neural predictor-based NAS.
We devise two self-supervised learning methods to pre-train the architecture embedding part of neural predictors.
We achieve state-of-the-art performance on the NAS-Bench-101 and NAS-Bench-201 benchmarks when integrating the pre-trained neural predictors with an evolutionary NAS algorithm.
arXiv Detail & Related papers (2020-10-31T04:57:16Z) - FBNetV3: Joint Architecture-Recipe Search using Predictor Pretraining [65.39532971991778]
We present an accuracy predictor that scores architecture and training recipes jointly, guiding both sample selection and ranking.
We run fast evolutionary searches in just CPU minutes to generate architecture-recipe pairs for a variety of resource constraints.
FBNetV3 comprises a family of state-of-the-art compact neural networks that outperform both automatically and manually designed competitors.
arXiv Detail & Related papers (2020-06-03T05:20:21Z) - Large Batch Training Does Not Need Warmup [111.07680619360528]
Training deep neural networks using a large batch size has shown promising results and benefits many real-world applications.
In this paper, we propose a novel Complete Layer-wise Adaptive Rate Scaling (CLARS) algorithm for large-batch training.
Based on our analysis, we bridge the gap and provide theoretical insights into three popular large-batch training techniques.
arXiv Detail & Related papers (2020-02-04T23:03:12Z)
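As referenced in the RANK-NOSH entry above, the following is a minimal NumPy sketch of a generic "learning to rank with pairwise comparisons" objective for a performance predictor. It illustrates that framing only; it is not the RANK-NOSH implementation, and all values are placeholders.
```python
# A generic pairwise-comparison ranking objective for a predictor (illustrative only).
import numpy as np

def pairwise_ranking_loss(pred_scores, true_acc, margin=0.1):
    """Hinge loss over all architecture pairs: if architecture i truly
    outperforms architecture j, the predictor should score i higher than j
    by at least `margin`."""
    loss, n_pairs = 0.0, 0
    n = len(true_acc)
    for i in range(n):
        for j in range(n):
            if true_acc[i] > true_acc[j]:
                loss += max(0.0, margin - (pred_scores[i] - pred_scores[j]))
                n_pairs += 1
    return loss / max(n_pairs, 1)

# Hypothetical placeholder values.
true_acc = np.array([91.2, 93.5, 90.1, 94.0])
good_predictor = np.array([0.30, 0.70, 0.20, 0.90])   # preserves the true ordering
bad_predictor  = np.array([0.90, 0.20, 0.70, 0.30])   # inverts it
print(pairwise_ranking_loss(good_predictor, true_acc))  # low loss
print(pairwise_ranking_loss(bad_predictor, true_acc))   # high loss
```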
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.