Fugu-MT Paper Translation (Abstract): Training BatchNorm Only in Neural Architecture Search and Beyond

Paper abstract: Training BatchNorm Only in Neural Architecture Search and Beyond

arxiv url: http://arxiv.org/abs/2112.00265v1
Date: Wed, 1 Dec 2021 04:09:09 GMT
Status: Translation complete
Last updated in system: 2021-12-03 02:08:36.249174
Title: Training BatchNorm Only in Neural Architecture Search and Beyond
Title (reference translation): Training BatchNorm Only in Neural Architecture Search and Beyond
Authors: Yichen Zhu, Jie Du, Yuqin Zhu, Yi Wang, Zhicai Ou, Feifei Feng and Jian Tang
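Abstract summary: No prior work explains why training only BatchNorm can find well-performing architectures with reduced supernet training time. The authors show that the train-BN-only supernet favors convolutions over other operators, which causes unfair competition between architectures. They propose a novel composite performance indicator that evaluates networks from three perspectives.
Reference score (independently computed attention score): 17.21663067385715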
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This work investigates the usage of batch normalization in neural architecture search (NAS). Specifically, Frankle et al. find that training BatchNorm only can achieve nontrivial performance. Furthermore, Chen et al. claim that training BatchNorm only can speed up the training of the one-shot NAS supernet by over ten times. Critically, there is no effort to understand 1) why training BatchNorm only can find well-performing architectures with reduced supernet-training time, and 2) what the difference is between the train-BN-only supernet and the standard-train supernet. We begin by showing that train-BN-only networks converge to the neural tangent kernel regime and theoretically obtain the same training dynamics as training all parameters. Our proof supports the claim that training BatchNorm only on the supernet requires less training time. Then, we empirically show that the train-BN-only supernet gives convolutions an advantage over other operators, causing unfair competition between architectures. This is because only the convolution operator is attached to BatchNorm. Through experiments, we show that such unfairness makes the search algorithm prone to selecting models with convolutions. To solve this issue, we introduce fairness in the search space by placing a BatchNorm layer on every operator. However, we observe that the performance predictor in Chen et al. is inapplicable to the new search space. To this end, we propose a novel composite performance indicator to evaluate networks from three perspectives: expressivity, trainability, and uncertainty, derived from the theoretical properties of BatchNorm. We demonstrate the effectiveness of our approach on multiple NAS benchmarks (NAS-Bench-101, NAS-Bench-201) and search spaces (DARTS search space and MobileNet search space).
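Abstract (reference translation): This study examines the use of batch normalization in neural architecture search (NAS). In particular, Frankle et al. find that training only BatchNorm can achieve nontrivial performance. In addition, Chen et al. claim that training only BatchNorm can speed up training of the one-shot NAS supernet by more than ten times. Critically, no effort has been made to understand 1) why training only BatchNorm can find well-performing architectures with reduced supernet training time, and 2) how the train-BN-only supernet differs from the standard-train supernet. First, the authors show that train-BN-only networks converge to the neural tangent kernel regime and, in theory, obtain the same training dynamics as training all parameters. This proof supports the claim that training only BatchNorm on the supernet requires less training time. Next, they show empirically that the train-BN-only supernet favors convolutions over other operators, which causes unfair competition between architectures. The cause is that only the convolution operator has BatchNorm attached. Experiments show that this unfairness makes the search algorithm prone to selecting models with convolutions. To solve this problem, they introduce fairness into the search space by placing a BatchNorm layer on every operator. However, the performance predictor of Chen et al. is not applicable to the new search space. They therefore propose a novel composite performance indicator, derived from the theoretical properties of BatchNorm, that evaluates networks from three perspectives: expressivity, trainability, and uncertainty. The effectiveness of the approach is demonstrated on multiple NAS benchmarks (NAS-Bench-101, NAS-Bench-201) and search spaces (the DARTS search space and the MobileNet search space).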
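
The two ideas named in the abstract, training only the BatchNorm parameters and placing a BatchNorm layer on every candidate operator, can be illustrated with a minimal PyTorch sketch. This is not the authors' code: the helper names `with_batchnorm` and `freeze_all_but_batchnorm`, the three candidate operators, the channel count, and the toy loss are all assumptions chosen for illustration.

```python
import torch
from torch import nn


def with_batchnorm(op: nn.Module, channels: int) -> nn.Sequential:
    """Hypothetical helper: follow a candidate operator with its own BatchNorm."""
    return nn.Sequential(op, nn.BatchNorm2d(channels))


def freeze_all_but_batchnorm(model: nn.Module) -> None:
    """Hypothetical helper: leave only BatchNorm scale/shift parameters trainable."""
    # Freeze every parameter first.
    for p in model.parameters():
        p.requires_grad_(False)
    # Then re-enable the affine parameters (weight, bias) of each BatchNorm layer.
    for m in model.modules():
        if isinstance(m, nn.modules.batchnorm._BatchNorm):
            for p in m.parameters():
                p.requires_grad_(True)


channels = 8  # assumed toy width

# Assumed toy candidate set: every operator, not just the convolution, gets a BatchNorm.
candidate_ops = nn.ModuleList([
    with_batchnorm(nn.Conv2d(channels, channels, 3, padding=1, bias=False), channels),
    with_batchnorm(nn.AvgPool2d(3, stride=1, padding=1), channels),
    with_batchnorm(nn.Identity(), channels),
])
freeze_all_but_batchnorm(candidate_ops)

# Names of the parameters that will actually be updated.
trainable = [n for n, p in candidate_ops.named_parameters() if p.requires_grad]

# The optimizer only sees the BatchNorm parameters.
optimizer = torch.optim.SGD(
    [p for p in candidate_ops.parameters() if p.requires_grad], lr=0.1
)

# One toy update step; the loss is arbitrary and only exercises the gradient flow.
x = torch.randn(4, channels, 16, 16)
loss = sum(op(x).pow(2).mean() for op in candidate_ops)
loss.backward()
optimizer.step()
```

In this sketch the convolution weights stay at their random initialization and receive no gradient, while each operator's BatchNorm scale and shift are updated, so no operator is left without trainable parameters.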

Related papers