Fugu-MT paper translation (abstract): Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

Paper overview: Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

arxiv url: http://arxiv.org/abs/2606.20469v1
Date: Thu, 18 Jun 2026 16:48:18 GMT
Status: translation complete
System update date: 2026-06-19 18:23:39.996573
Title: Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima
Authors: Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta
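Abstract summary: Stochastic gradient descent (SGD) implicitly favors flat minima. Riemannian sharpness (SR) is proved to be invariant under smooth, function-preserving reparametrizations. Experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not.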
Reference score (independently computed attention score): 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A widely held intuition in deep learning is that stochastic gradient descent (SGD) implicitly favors flat minima and that flat minima generalize better, but standard Euclidean measures of flatness such as the trace or maximum eigenvalue of the loss Hessian are not invariant under reparametrizations that preserve the network function, which undermines the theoretical foundations of this narrative. In this study we resolve this issue by grounding flatness in the Riemannian geometry of the statistical manifold induced by the Fisher Information Matrix (FIM). We define Riemannian sharpness mathematically and prove that it is invariant under smooth, function-preserving reparametrizations, which directly addresses the critique of Dinh et al. in the paper "Sharp minima can generalize for deep nets". We note that this invariance is a property of the true FIM; the diagonal empirical estimator used in practice (and in all experiments below) inherits invariance only approximately, and exact invariance under arbitrary reparametrizations would require structured estimators such as K-FAC. We formalize the gradient noise of mini-batch SGD as having a covariance structure proportional to the FIM, derive the stationary distribution of the resulting stochastic differential equation, and then show that the probability mass is exponentially concentrated at Riemannian-flat minima. A PAC-Bayes generalization bound controlled explicitly by SR formally links this geometric bias to test performance. Our experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not, and that its scaling with $η/B$ matches the theoretical predictions. Together these results provide a rigorous, reparametrization-invariant account of why flat minima generalize.
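
The non-invariance of Euclidean sharpness mentioned in the abstract can be illustrated with a toy example. The sketch below is not taken from the paper: the two-parameter model f(x) = a * b * x, the loss, and the helper names (`loss`, `hessian_trace`, `rescale`) are all assumptions chosen for illustration. It shows only that the Hessian trace changes under a rescaling that leaves the function unchanged; it does not implement the paper's Riemannian sharpness, whose definition is not given in this abstract.

```python
# Toy illustration (hypothetical, not from the paper): the Euclidean Hessian
# trace at a minimum changes under a function-preserving rescaling.

def loss(a, b):
    """Squared error of the toy model f(x) = a * b * x against target slope 1."""
    return 0.5 * (a * b - 1.0) ** 2


def hessian_trace(a, b):
    """Trace of the Hessian of `loss` with respect to (a, b).

    d2L/da2 = b**2 and d2L/db2 = a**2, so the trace is a**2 + b**2.
    """
    return a ** 2 + b ** 2


def rescale(a, b, alpha):
    """Map (a, b) to (alpha * a, b / alpha); the product a * b is unchanged."""
    return alpha * a, b / alpha


a, b = 1.0, 1.0                       # a minimum: a * b == 1
a2, b2 = rescale(a, b, alpha=10.0)    # same function, different parameters

print("loss before / after:", loss(a, b), loss(a2, b2))
print("Hessian trace before / after:", hessian_trace(a, b), hessian_trace(a2, b2))
```

Both parameter settings represent the same function and reach the same loss, yet the Hessian trace grows from 2 to roughly 100. This is the kind of ambiguity a reparametrization-invariant sharpness measure is meant to remove.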

Related papers