Fugu-MT paper translation (abstract): Error whitening: Why Gauss-Newton outperforms Newton

Paper abstract: Error whitening: Why Gauss-Newton outperforms Newton

arxiv url: http://arxiv.org/abs/2605.11316v1
Date: Mon, 11 May 2026 23:07:25 GMT
Status: translation complete
Last updated in system: 2026-05-13 21:48:56.470365
Title: Error whitening: Why Gauss-Newton outperforms Newton
Title (reference translation): Error whitening: Why Gauss-Newton outperforms Newton
Authors: Maricela Best McKay, Nathan P. Lawrence, Brian Wetton, R. Bhushan Gopaluni
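Abstract summary: The paper shows that Gauss-Newton projects the Newton direction in function space onto the model's tangent space. It empirically demonstrates that Gauss-Newton optimizers follow the theoretically predicted function space dynamics and outperform Newton's method.
Reference score (independently computed attention score): 0.9216325369400603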
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The Gauss-Newton matrix is widely viewed as a positive semidefinite approximation of the Hessian, yet mounting empirical evidence shows that Gauss-Newton descent outperforms Newton's method. We adopt a function space perspective to analyze this phenomenon. We show that the generalized Gauss-Newton (GGN) matrix projects the Newton direction in function space onto the model's tangent space, while a Jacobian-only variant obtained by applying the least squares Gauss-Newton matrix to non-least squares losses projects the function space loss gradient onto this same tangent space. Both projections eliminate distortions from the model's parameterization. Specifically, the evolution of the prediction-target mismatch depends on the model's parameterization through the matrix $JJ^\top$ where $J$ is the Jacobian of the model with respect to its parameters. The projections effectively replace $JJ^\top$ with the identity. We call this effect error whitening. Once the parameterization is removed, the prediction-target mismatch evolves according to dynamics dictated by the structure of the loss and the projection produced by the optimizer. Error whitening is a special property of Gauss-Newton descent that rigorously distinguishes it from Newton's method. We empirically demonstrate that Gauss-Newton optimizers follow the theoretically predicted function space dynamics and outperform Newton's method, Adam, and Muon across case studies spanning supervised learning, physics-informed deep learning, and approximate dynamic programming.
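Abstract (reference translation): The Gauss-Newton matrix is widely regarded as a positive semidefinite approximation of the Hessian, yet empirical evidence shows that Gauss-Newton descent outperforms Newton's method. We adopt a function space perspective to analyze this phenomenon. We show that the generalized Gauss-Newton (GGN) matrix projects the Newton direction in function space onto the model's tangent space, whereas a Jacobian-only variant, obtained by applying the least squares Gauss-Newton matrix to non-least squares losses, projects the function space loss gradient onto the same tangent space. Both projections remove distortions caused by the model's parameterization. Specifically, the evolution of the prediction-target mismatch depends on the model's parameterization through the matrix $JJ^\top$. The projections effectively replace $JJ^\top$ with the identity. We call this effect error whitening. Once the parameterization is removed, the prediction-target mismatch evolves according to dynamics determined by the structure of the loss and the projection produced by the optimizer. Error whitening is a special property of Gauss-Newton descent that rigorously distinguishes it from Newton's method. We empirically demonstrate that Gauss-Newton optimizers follow the theoretically predicted function space dynamics and outperform Newton's method, Adam, and Muon in case studies spanning supervised learning, physics-informed deep learning, and approximate dynamic programming.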
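
The whitening claim in the abstract can be illustrated in the simplest setting, a plain least squares loss $L = \tfrac{1}{2}\|e\|^2$ with mismatch $e = f(\theta) - y$. This is a minimal sketch under assumptions that are not taken from the paper: a random matrix stands in for the Jacobian $J$, the sizes are hypothetical, and the pseudoinverse is used so that $J$ may have more columns than rows. To first order, a gradient step changes the mismatch by $-JJ^\top e$, while a Gauss-Newton step changes it by $-JJ^{+}e$, where $JJ^{+}$ is the orthogonal projector onto the range of $J$ and reduces to the identity when $J$ has full row rank.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: more parameters than outputs, so J has full row rank.
n_outputs, n_params = 3, 5
J = rng.normal(size=(n_outputs, n_params))  # stand-in for the model Jacobian
e = rng.normal(size=n_outputs)              # prediction-target mismatch f(theta) - y

# Gradient descent direction for L = 0.5 * ||e||^2 is -J^T e, so the
# first-order change in the mismatch is -J J^T e (shaped by the parameterization).
de_gradient = -J @ (J.T @ e)

# Gauss-Newton direction: -(J^T J)^+ J^T e, which equals -J^+ e.
step_gn = -np.linalg.pinv(J) @ e
de_gauss_newton = J @ step_gn

# Orthogonal projector onto the range of J in function space.
P = J @ np.linalg.pinv(J)

print("projector is identity:", np.allclose(P, np.eye(n_outputs)))
print("Gauss-Newton mismatch change equals -e:", np.allclose(de_gauss_newton, -e))
print("gradient mismatch change equals -e:", np.allclose(de_gradient, -e))
```

In this toy setting the Gauss-Newton update moves the mismatch straight toward zero regardless of how $J$ is scaled or rotated, whereas the gradient update is distorted by $JJ^\top$. The non-least squares losses and the Newton comparison discussed in the abstract are not covered by this sketch.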

Related papers