Fugu-MT Paper Translation (Summary): Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Paper summary: Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

arxiv url: http://arxiv.org/abs/2605.08352v1
Date: Fri, 08 May 2026 18:02:37 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:49.585132
Title: Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit
Title (reference translation): Convergence analysis of neural networks trained by Newton's method in the overparameterized limit
Authors: Konstantin Riedl, Konstantinos Spiliopoulos, Justin Sirignano
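Abstract summary: We show that, in the infinite-width limit, the neural network training dynamics converge exponentially fast to the target data. The convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. We identify a scaling formula for selecting the regularization parameter and show that the parameter can vanish at a suitable rate as the number of hidden units grows.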
Reference score (attention score computed in-house): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the regularization parameter is selected appropriately, allowing Newton's method to converge more quickly for data with high-frequency components. Mathematical challenges that need to be addressed in our analysis include the implicit parameter update of the Newton method with a potentially indefinite Hessian matrix and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates deriving the training dynamics in the overparameterized limit as well as proving the convergence of the finite-width dynamics thereto. The analysis identifies a scaling formula for selecting the regularization parameter, which we show can vanish at a suitable rate as the number of hidden units becomes larger. We prove that, for sufficiently large numbers of hidden units, the regularized Hessian remains positive definite during training and the Newton updates for individual NN parameters converge to zero, showing that the model behaves as a linearization around the initialization.
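The abstract's point about an implicit update with a potentially indefinite Hessian can be made concrete with a generic damped Newton step, in which the direction d solves (H + eta*I) d = -grad. Adding eta*I raises every eigenvalue of H by eta, so the shifted matrix is positive definite exactly when eta exceeds -lambda_min(H). The minimal sketch below, in plain Python, uses a Cholesky factorization as the positive-definiteness check. It is not the paper's algorithm or its scaling formula for eta, and the function names (`cholesky`, `cholesky_solve`, `regularized_newton_step`) are hypothetical.

```python
import math


def cholesky(a):
    """Return lower-triangular L with a = L L^T; raise ValueError if a is not positive definite."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:  # a non-positive pivot means the matrix is not PD
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def cholesky_solve(L, b):
    """Solve (L L^T) x = b by forward then back substitution."""
    n = len(L)
    y = [0.0] * n
    for i in range(n):  # forward substitution: L y = b
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution: L^T x = y
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def regularized_newton_step(hessian, grad, eta):
    """Hypothetical helper: return d solving (H + eta*I) d = -grad.

    Raises ValueError when H + eta*I is not positive definite, i.e. when
    eta is too small to compensate for negative Hessian eigenvalues.
    """
    n = len(hessian)
    shifted = [[hessian[i][j] + (eta if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    L = cholesky(shifted)
    return cholesky_solve(L, [-g for g in grad])


# Indefinite Hessian with eigenvalues 2 and -0.5 (illustrative numbers only).
H = [[2.0, 0.0], [0.0, -0.5]]
g = [1.0, 1.0]
print("eta=1.0 step:", regularized_newton_step(H, g, eta=1.0))
try:
    regularized_newton_step(H, g, eta=0.1)
except ValueError as err:
    print("eta=0.1 rejected:", err)
```

With eta = 1.0 the shifted matrix is diag(3, 0.5) and the step is well defined; with eta = 0.1 the second pivot is negative and the factorization fails, which is the failure mode a suitably chosen regularization parameter is meant to rule out.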
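Abstract (reference translation): We develop a convergence analysis of the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are given, and in the infinite-width limit we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). This convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, which slows convergence for target data with high-frequency components. In contrast, when the regularization parameter is chosen appropriately, the eigenvalues of the NNTK are uniformly bounded below, so Newton's method can converge faster on data with high-frequency components. The mathematical challenges addressed in the analysis are the implicit parameter update of Newton's method with a potentially indefinite Hessian matrix, and the fact that the dimension of this linear system of equations tends to infinity as the NN width grows. This complicates both deriving the training dynamics in the overparameterized limit and proving that the finite-width dynamics converge to them. The analysis identifies a scaling formula for selecting the regularization parameter and shows that it can vanish at a suitable rate as the number of hidden units increases. We show that, for a sufficiently large number of hidden units, the regularized Hessian remains positive definite during training and the Newton updates of individual NN parameters converge to zero, so the model behaves as a linearization around its initialization.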
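The spectral-bias contrast in the abstract can be illustrated in the linearized (kernel) regime. Assume a squared loss, a Jacobian J, a kernel K = J J^T with eigenvalues lam_k, and a Gauss-Newton Hessian J^T J (an assumption: this drops the second-order term that can make the true Hessian indefinite). A gradient-descent step with learning rate lr then multiplies the k-th eigencomponent of the residual by 1 - lr*lam_k, whereas the regularized step d = -(J^T J + eta*I)^(-1) J^T r changes the outputs by -K (K + eta*I)^(-1) r and multiplies that component by 1 - lr*lam_k/(lam_k + eta). The sketch below compares the two on a hypothetical spectrum lam_k = 1/k^2, standing in for eigenvalues that accumulate at zero; it is a toy model, not the paper's NNTK, and `residual_per_mode` is a hypothetical name.

```python
def residual_per_mode(eigs, lr, steps, eta=None):
    """Remaining residual in each kernel eigenmode after `steps` updates, starting from 1.

    eta is None: gradient descent in the kernel regime, r_k <- (1 - lr*lam_k) * r_k.
    eta > 0:     regularized Gauss-Newton (assumed model),
                 r_k <- (1 - lr*lam_k/(lam_k + eta)) * r_k.
    """
    out = []
    for lam in eigs:
        rate = lam if eta is None else lam / (lam + eta)
        out.append((1.0 - lr * rate) ** steps)
    return out


# Hypothetical spectrum decaying like 1/k^2; large k plays the role of high frequency.
eigs = [1.0 / k**2 for k in range(1, 21)]
eta = 1e-4  # illustrative regularization, small compared with min(eigs)

gd = residual_per_mode(eigs, lr=0.5, steps=100)
newton = residual_per_mode(eigs, lr=0.5, steps=100, eta=eta)

print(f"GD:     slowest-mode residual = {max(gd):.3f}")
print(f"Newton: slowest-mode residual = {max(newton):.1e}")
# Effective Newton eigenvalues lam/(lam + eta) are bounded below by this value.
print(f"min effective Newton eigenvalue = {min(eigs) / (min(eigs) + eta):.4f}")
```

When eta is small compared with the smallest eigenvalue, every effective rate lam_k/(lam_k + eta) is close to 1, so all modes decay at nearly the same speed; if eta grows past the small eigenvalues, the Newton variant inherits the same slow high-frequency modes. In this toy model, that is one way to read the abstract's condition that the regularization parameter be selected appropriately.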

Related papers