Fugu-MT Paper Translation (Abstract): Batch Group Normalization

Paper summary: Batch Group Normalization

arxiv url: http://arxiv.org/abs/2012.02782v2
Date: Wed, 9 Dec 2020 01:26:51 GMT
Status: Translation complete
Last updated in system: 2021-05-23 00:23:04.021169
Title: Batch Group Normalization
Title (reference translation): Batch Group Normalization
Authors: Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bo-Lin Lai, Pedro Esperanca, Guang-Zhong Yang, Zhenguo Li
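Abstract summary: Batch Normalization (BN) performs well at medium and large batch sizes. BN degrades at small batch sizes and saturates at extremely large batch sizes because of noisy/confused statistic calculation. BGN is proposed to solve the noisy/confused statistic calculation of BN at small/extremely large batch sizes.
Reference score (independently computed attention metric): 45.03388237812212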
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to train. Normalization is one of the effective solutions. Among previous normalization methods, Batch Normalization (BN) performs well at medium and large batch sizes and generalizes well to multiple vision tasks, but its performance degrades significantly at small batch sizes. In this paper, we find that BN also saturates at extremely large batch sizes, i.e., 128 images per worker (GPU), and propose that the degradation/saturation of BN at small/extremely large batch sizes is caused by noisy/confused statistic calculation. Hence, without adding new trainable parameters, using multiple-layer or multi-iteration information, or introducing extra computation, Batch Group Normalization (BGN) is proposed to solve the noisy/confused statistic calculation of BN at small/extremely large batch sizes by introducing the channel, height and width dimensions to compensate. The group technique from Group Normalization (GN) is used, and a hyper-parameter G controls the number of feature instances used for statistic calculation, so that the statistics are neither noisy nor confused across different batch sizes. We empirically demonstrate that BGN consistently outperforms BN, Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization (PN) across a wide spectrum of vision tasks, including image classification, Neural Architecture Search (NAS), adversarial learning, Few Shot Learning (FSL) and Unsupervised Domain Adaptation (UDA), indicating its good performance, robustness to batch size and wide generalizability. For example, when training ResNet-50 on ImageNet with a batch size of 2, BN achieves a Top1 accuracy of 66.512% while BGN achieves 76.096%, a notable improvement.
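Abstract (reference translation): Deep convolutional neural networks (DCNNs) are difficult and time-consuming to train, and normalization is one effective remedy. Among earlier normalization methods, Batch Normalization (BN) works well at medium and large batch sizes and carries over well to multiple vision tasks, but its performance drops markedly at small batch sizes. This paper finds that BN also saturates at extremely large batch sizes, namely 128 images per worker (GPU), and proposes that the degradation/saturation of BN at small/extremely large batch sizes is caused by noisy/confused statistic calculation. Batch Group Normalization (BGN) is therefore proposed: without adding new trainable parameters, using multiple-layer or multi-iteration information, or introducing extra computation, it resolves the noisy/confused statistic calculation of BN at small/extremely large batch sizes by bringing in the channel, height and width dimensions to compensate. It uses the group technique of Group Normalization (GN) and a hyper-parameter G to control the number of feature instances used for statistic calculation, so that the statistics are neither noisy nor confused at different batch sizes. The authors show empirically that BGN consistently outperforms BN, Instance Normalization (IN), Layer Normalization (LN), GN and Positional Normalization (PN) across a wide range of vision tasks, including image classification, Neural Architecture Search (NAS), adversarial learning, Few Shot Learning (FSL) and Unsupervised Domain Adaptation (UDA), which indicates good performance, stability with respect to batch size and broad generalizability. For example, when training ResNet-50 on ImageNet with a batch size of 2, BN reaches an accuracy of 66.512% while BGN reaches 76.096%.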
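
The mechanism described in the abstract (pooling the channel, height and width dimensions with the batch dimension, then splitting the pooled features into G groups for the statistic calculation) can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `batch_group_norm`, the NCHW layout, the order in which channel/height/width are flattened, the `eps` value, and the optional per-channel `gamma`/`beta` (as in BN) are all hypothetical choices made for illustration.

```python
import numpy as np


def batch_group_norm(x, num_groups, gamma=None, beta=None, eps=1e-5):
    """Illustrative sketch of the statistic calculation described in the abstract.

    x          : array of shape (N, C, H, W)  (layout is an assumption)
    num_groups : the hyper-parameter G; must divide C * H * W
    gamma/beta : optional per-channel scale and shift of shape (C,)
    """
    n, c, h, w = x.shape
    features = c * h * w
    if features % num_groups != 0:
        raise ValueError("num_groups must divide C * H * W")

    # Merge channel, height and width, then split the result into G groups.
    grouped = x.reshape(n, num_groups, features // num_groups)

    # One mean/variance per group, computed over the batch axis and over
    # every feature inside the group.
    mean = grouped.mean(axis=(0, 2), keepdims=True)
    var = grouped.var(axis=(0, 2), keepdims=True)

    normed = (grouped - mean) / np.sqrt(var + eps)
    normed = normed.reshape(n, c, h, w)

    # Optional per-channel affine transform (assumed, mirroring BN).
    if gamma is not None:
        normed = normed * gamma.reshape(1, c, 1, 1)
    if beta is not None:
        normed = normed + beta.reshape(1, c, 1, 1)
    return normed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(loc=3.0, scale=2.0, size=(2, 4, 3, 3))  # batch size 2
    y = batch_group_norm(x, num_groups=2)
    print(y.shape)
```

In this sketch each statistic is computed over N * (C * H * W / G) values, so a smaller G pools more features per statistic at small batch sizes and a larger G pools fewer at very large batch sizes, which is the role the abstract assigns to G.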

Related papers