Fugu-MT paper translation (abstract): Sharper analysis of sparsely activated wide neural networks with trainable biases

Paper summary: Sharper analysis of sparsely activated wide neural networks with trainable biases

arxiv url: http://arxiv.org/abs/2301.00327v1
Date: Sun, 1 Jan 2023 02:11:39 GMT
Status: Translation complete
Last updated in system: 2023-01-03 15:28:12.369440
Title: Sharper analysis of sparsely activated wide neural networks with trainable biases
Title (reference translation): Sharp analysis of sparsely activated wide neural networks with trainable biases
Authors: Hongru Yang, Ziyu Jiang, Ruizhe Zhang, Zhangyang Wang, Yingbin Liang
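Abstract summary: This work studies the training of one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime. Surprisingly, it is shown that the network after sparsification can converge as fast as the original network. Since the generalization bound depends on the smallest eigenvalue of the limiting NTK, this work further studies that eigenvalue.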
Reference score (independently computed attention measure): 103.85569570164404
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: This work studies training one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where, unlike in previous works, the networks' biases are trainable and are initialized to some constant rather than zero. The first set of results of this work characterizes the convergence of the network's gradient descent dynamics. Surprisingly, it is shown that the network after sparsification can achieve as fast convergence as the original network. The contribution over previous work is that not only is the bias allowed to be updated by gradient descent under our setting, but a finer analysis is also given such that the required width to ensure the network's closeness to its NTK is improved. Secondly, the networks' generalization bound after training is provided. A width-sparsity dependence is presented which yields sparsity-dependent localized Rademacher complexity and a generalization bound matching previous analysis (up to logarithmic factors). As a by-product, if the bias initialization is chosen to be zero, the width requirement improves the previous bound for the shallow networks' generalization. Lastly, since the generalization bound depends on the smallest eigenvalue of the limiting NTK and the bounds from previous works yield vacuous generalization, this work further studies the least eigenvalue of the limiting NTK. Surprisingly, while it is not shown that trainable biases are necessary, trainable bias helps to identify a nice data-dependent region where a much finer analysis of the NTK's smallest eigenvalue can be conducted, which leads to a much sharper lower bound than the previously known worst-case bound and, consequently, a non-vacuous generalization bound.
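Abstract (reference translation): This work studies the training of one-hidden-layer overparameterized ReLU networks via gradient descent in the neural tangent kernel (NTK) regime, where, unlike in previous works, the networks' biases are trainable and initialized to a constant rather than zero. The first set of results characterizes the convergence of the network's gradient descent dynamics. Surprisingly, it is shown that the network after sparsification can converge as fast as the original network. The contribution over previous work is that not only may the bias be updated by gradient descent in this setting, but a finer analysis is also given that improves the width required to ensure the network's closeness to its NTK. Second, a generalization bound for the network after training is provided. A width-sparsity dependence is presented that yields a sparsity-dependent localized Rademacher complexity and a generalization bound matching previous analysis (up to logarithmic factors). As a by-product, if the bias initialization is chosen to be zero, the width requirement improves on the previous bound for shallow networks' generalization. Lastly, since the generalization bound depends on the smallest eigenvalue of the limiting NTK and the bounds from previous works yield vacuous generalization, this work further studies the smallest eigenvalue of the limiting NTK. Surprisingly, although trainable biases are not shown to be necessary, they help identify a nice data-dependent region where a much finer analysis of the NTK's smallest eigenvalue can be conducted.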
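
The setting described in the abstract can be illustrated with a small numerical sketch. It is not the authors' code, and everything in it is an assumption made for illustration: the parameterization f(x) = (1/sqrt(m)) * sum_r a_r * ReLU(w_r . x - b_r) with fixed signs a_r, the sign convention on the bias, Gaussian weight initialization, unit-norm inputs, and the function names. Under those assumptions, a larger constant bias initialization leaves fewer neurons active per sample, and the finite-width NTK Gram matrix with respect to the weights and biases can be formed directly from the activation pattern.

```python
import numpy as np


def init_network(m, d, bias_init, rng):
    """Hypothetical init: Gaussian first-layer weights, every bias set to one constant."""
    W = rng.standard_normal((m, d))
    b = np.full(m, float(bias_init))
    return W, b


def activation_pattern(X, W, b):
    """Boolean (n, m) matrix: entry (i, r) is True when w_r . x_i - b_r > 0."""
    return (X @ W.T - b) > 0


def empirical_ntk(X, W, b):
    """Finite-width NTK Gram matrix with respect to (W, b).

    For the assumed parameterization, the gradient w.r.t. w_r is
    a_r / sqrt(m) * 1[active] * x and w.r.t. b_r is -a_r / sqrt(m) * 1[active].
    Since a_r**2 == 1, the Gram entry for samples i, j is
    (x_i . x_j + 1) * (number of neurons active on both) / m.
    """
    A = activation_pattern(X, W, b).astype(float)
    m = W.shape[0]
    return (X @ X.T + 1.0) * (A @ A.T) / m


def sparsity_and_min_eig(bias_init, n=20, d=10, m=4096, seed=0):
    """Return (fraction of active neurons, smallest NTK eigenvalue) on random unit-norm data."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    W, b = init_network(m, d, bias_init, rng)
    frac_active = activation_pattern(X, W, b).mean()
    min_eig = np.linalg.eigvalsh(empirical_ntk(X, W, b)).min()
    return frac_active, min_eig


if __name__ == "__main__":
    for B in (0.0, 0.5, 1.0):
        frac, lam = sparsity_and_min_eig(B)
        print(f"bias init {B:.1f}: active fraction {frac:.3f}, min NTK eigenvalue {lam:.4f}")
```

Running the script prints, for each bias initialization, the fraction of active neurons and the smallest eigenvalue of the finite-width Gram matrix. It only illustrates the quantities the abstract refers to and does not reproduce any bound from the paper.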

Related papers