Training BatchNorm Only in Neural Architecture Search and Beyond
- URL: http://arxiv.org/abs/2112.00265v1
- Date: Wed, 1 Dec 2021 04:09:09 GMT
- Title: Training BatchNorm Only in Neural Architecture Search and Beyond
- Authors: Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng and
Jian Tang
- Abstract summary: There has been no effort to understand why training BatchNorm only can find well-performing architectures with reduced supernet-training time.
We show that the train-BN-only supernet favors convolutions over other operators, causing unfair competition between architectures.
We propose a novel composite performance indicator to evaluate networks from three perspectives.
- Score: 17.21663067385715
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This work investigates the use of batch normalization in neural
architecture search (NAS). Specifically, Frankle et al. find that training
BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al.
claim that training BatchNorm only can speed up the training of the one-shot
NAS supernet by over ten times. Critically, there has been no effort to
understand 1) why training BatchNorm only can find well-performing
architectures with reduced supernet-training time, and 2) how the
train-BN-only supernet differs from the standard-train supernet. We begin by
showing that train-BN-only networks converge to the neural tangent kernel
regime and, in theory, obtain the same training dynamics as training all
parameters. Our proof supports the claim that training BatchNorm only on the
supernet succeeds with less training time. We then empirically show that the
train-BN-only supernet gives convolutions an advantage over other operators,
causing unfair competition between architectures. This is because only the
convolution operator has a BatchNorm layer attached. Through experiments, we
show that such unfairness makes the search algorithm prone to selecting
models dominated by convolutions. To solve this issue, we introduce fairness
into the search space by placing a BatchNorm layer on every operator.
However, we observe that the performance predictor of Chen et al. is
inapplicable to the new search space. To this end, we propose a novel
composite performance indicator that evaluates networks from three
perspectives derived from the theoretical properties of BatchNorm:
expressivity, trainability, and uncertainty. We demonstrate the effectiveness
of our approach on multiple NAS benchmarks (NAS-Bench-101, NAS-Bench-201) and
search spaces (the DARTS search space and the MobileNet search space).
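A minimal sketch of the train-BN-only setup discussed above, assuming a
standard PyTorch/torchvision model as a stand-in for the supernet (the
ResNet-18 backbone, learning rate, and optimizer below are illustrative
placeholders, not the authors' configuration):

    # Freeze every parameter, then re-enable only the BatchNorm affine
    # parameters (gamma/beta), so the optimizer updates BatchNorm only.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    model = models.resnet18(num_classes=10)  # placeholder network

    for p in model.parameters():
        p.requires_grad = False
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            if m.weight is not None:
                m.weight.requires_grad = True
            if m.bias is not None:
                m.bias.requires_grad = True

    # Hand only the trainable (BatchNorm) parameters to the optimizer.
    bn_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(bn_params, lr=0.1, momentum=0.9)

Under this setup, all convolution and linear weights stay at their random
initialization and only the per-channel scale and shift of each BatchNorm
layer are learned, which is what makes the supernet training so cheap.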
Related papers
- Neural Architecture Search via Two Constant Shared Weights Initialisations [0.0]
We present a zero-cost metric that correlates highly with the training set accuracy across the NAS-Bench-101, NAS-Bench-201 and NAS-Bench-NLP benchmark datasets.
Our method is easy to integrate within existing NAS algorithms and takes a fraction of a second to evaluate a single network.
arXiv Detail & Related papers (2023-02-09T02:25:38Z) - An Analysis of Super-Net Heuristics in Weight-Sharing NAS [70.57382341642418]
We show that simple random search achieves competitive performance to complex state-of-the-art NAS algorithms when the super-net is properly trained.
arXiv Detail & Related papers (2021-10-04T02:18:44Z) - Pi-NAS: Improving Neural Architecture Search by Reducing Supernet
Training Consistency Shift [128.32670289503025]
Recently proposed neural architecture search (NAS) methods co-train billions of architectures in a supernet and estimate their potential accuracy.
The ranking correlation between the architectures' predicted accuracy and their actual capability is poor, which leads to the dilemma of existing NAS methods.
We attribute this ranking correlation problem to the supernet training consistency shift, including feature shift and parameter shift.
We address these two shifts simultaneously using a nontrivial supernet-Pi model, called Pi-NAS.
arXiv Detail & Related papers (2021-08-22T09:08:48Z) - BN-NAS: Neural Architecture Search with Batch Normalization [116.47802796784386]
We present BN-NAS, neural architecture search with Batch Normalization, to accelerate neural architecture search (NAS).
BN-NAS can significantly reduce the time required by model training and evaluation in NAS.
arXiv Detail & Related papers (2021-08-16T23:23:21Z) - BossNAS: Exploring Hybrid CNN-transformers with Block-wisely
Self-supervised Neural Architecture Search [100.28980854978768]
We present Block-wisely Self-supervised Neural Architecture Search (BossNAS).
We factorize the search space into blocks and utilize a novel self-supervised training scheme, named ensemble bootstrapping, to train each block separately.
We also present HyTra search space, a fabric-like hybrid CNN-transformer search space with searchable down-sampling positions.
arXiv Detail & Related papers (2021-03-23T10:05:58Z) - Neural Architecture Search on ImageNet in Four GPU Hours: A
Theoretically Inspired Perspective [88.39981851247727]
We propose a novel framework called training-free neural architecture search (TE-NAS).
TE-NAS ranks architectures by analyzing the spectrum of the neural tangent kernel (NTK) and the number of linear regions in the input space.
We show that: (1) these two measurements imply the trainability and expressivity of a neural network; (2) they strongly correlate with the network's test accuracy (a minimal sketch of such an NTK-based score appears after this list).
arXiv Detail & Related papers (2021-02-23T07:50:44Z) - Neural Architecture Search without Training [8.067283219068832]
In this work, we examine the overlap of activations between datapoints in untrained networks.
We motivate how this can give a measure which is usefully indicative of a network's trained performance.
We incorporate this measure into a simple algorithm that allows us to search for powerful networks without any training in a matter of seconds on a single GPU.
arXiv Detail & Related papers (2020-06-08T14:53:56Z) - GreedyNAS: Towards Fast One-Shot NAS with Greedy Supernet [63.96959854429752]
GreedyNAS is easy to follow, and experimental results on the ImageNet dataset indicate that it can achieve better Top-1 accuracy under the same search space and FLOPs or latency level.
By searching on a larger space, our GreedyNAS can also obtain new state-of-the-art architectures.
arXiv Detail & Related papers (2020-03-25T06:54:10Z)
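The NTK-based trainability measurement used by TE-NAS above can be sketched
as an empirical-NTK condition number computed on a small batch; this is an
assumed, illustrative implementation, not the exact estimator from any of the
papers listed, and the placeholder network and input sizes are made up:

    # Trainability proxy in the spirit of TE-NAS: condition number of an
    # empirical NTK built from per-sample gradients on a small batch.
    import torch
    import torch.nn as nn

    def ntk_condition_number(model: nn.Module, x: torch.Tensor) -> float:
        """Empirical NTK: Theta[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
        params = [p for p in model.parameters() if p.requires_grad]
        grads = []
        for i in range(x.size(0)):
            out = model(x[i:i + 1]).sum()          # scalar output per sample
            g = torch.autograd.grad(out, params)
            grads.append(torch.cat([gi.reshape(-1) for gi in g]))
        G = torch.stack(grads)                      # (batch, num_params)
        ntk = G @ G.t()                             # (batch, batch) Gram matrix
        eig = torch.linalg.eigvalsh(ntk)            # NTK is symmetric PSD
        return (eig.max() / eig.min().clamp(min=1e-12)).item()

    # Example: score a tiny placeholder network on random CIFAR-sized inputs;
    # a smaller condition number is read as better trainability.
    net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10))
    score = ntk_condition_number(net, torch.randn(8, 3, 32, 32))

TE-NAS pairs such a trainability score with a count of linear regions as an
expressivity measure; the sketch above covers only the NTK half.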
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information listed above (including all summaries) and is not responsible for any consequences of its use.