Fugu-MT Paper Translation (Abstract): Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws

Paper overview: Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws

arxiv url: http://arxiv.org/abs/2512.10427v2
Date: Mon, 15 Dec 2025 06:45:10 GMT
Status: Translation complete
Last updated in system: 2025-12-16 15:10:29.232731
Title: Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws
Title (reference translation): Renormalizable Spectral-Shell Dynamics as the Origin of Neural Scaling Laws
Authors: Yizhou Zhang
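Abstract summary: We show that deep-network training follows a simple macroscopic structure despite highly nonlinear optimization dynamics. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{θ(t)}J_{θ(t)}^{\!*}$. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.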
Reference score (independently computed attention score): 2.779943773196378
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive such structure directly from gradient descent in function space. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{θ(t)}J_{θ(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled modewise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and external inter-shell interactions. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming an effective power-law spectral transport in a relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.
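Abstract (reference translation): Neural scaling laws and double-descent phenomena suggest that deep-network training obeys a simple macroscopic structure despite highly nonlinear optimization dynamics. We derive such structure directly from gradient descent in function space. For mean-squared error loss, the training error evolves as $\dot e_t=-M(t)e_t$ with $M(t)=J_{θ(t)}J_{θ(t)}^{\!*}$, a time-dependent self-adjoint operator induced by the network Jacobian. Using Kato perturbation theory, we obtain an exact system of coupled modewise ODEs in the instantaneous eigenbasis of $M(t)$. To extract macroscopic behavior, we introduce a logarithmic spectral-shell coarse-graining and track the quadratic error energy across shells. Microscopic interactions within each shell cancel identically at the energy level, so shell energies evolve only through dissipation and external inter-shell interactions. We formalize this via a \emph{renormalizable shell-dynamics} assumption, under which cumulative microscopic effects reduce to a controlled net flux across shell boundaries. Assuming an effective power-law spectral transport in a relevant resolution range, the shell dynamics admits a self-similar solution with a moving resolution frontier and explicit scaling exponents. This framework explains neural scaling laws and double descent, and unifies lazy (NTK-like) training and feature learning as two limits of the same spectral-shell dynamics.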
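
The relation $\dot e_t=-M(t)e_t$ with $M(t)=J_{θ(t)}J_{θ(t)}^{\!*}$ stated in the abstract can be checked numerically on a toy model. The sketch below is not the paper's code: the two-layer tanh network, the sizes, the base-2 shell width, and all function names (`init_params`, `forward`, `jacobian`, `gd_step`, `shell_energies`) are assumptions made for illustration. It takes one small gradient-descent step on the loss $\tfrac12\|e\|^2$ (a discrete stand-in for gradient flow), compares the observed change in the training error with $-M e$, and bins the error energy into logarithmic shells of the eigenvalues of $M$.

```python
import numpy as np

def init_params(rng, d_in, width):
    # Hypothetical toy model: f(x) = v . tanh(W x)
    W = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    v = rng.normal(size=width) / np.sqrt(width)
    return W, v

def forward(W, v, X):
    # Network outputs on all training inputs, shape (n,)
    return np.tanh(X @ W.T) @ v

def jacobian(W, v, X):
    # Jacobian of the n outputs w.r.t. the flattened parameters (W, v), shape (n, p)
    H = np.tanh(X @ W.T)                        # hidden activations, (n, width)
    dH = 1.0 - H**2                             # tanh derivative, (n, width)
    JW = (dH * v)[:, :, None] * X[:, None, :]   # df/dW, (n, width, d_in)
    return np.concatenate([JW.reshape(len(X), -1), H], axis=1)

def gd_step(W, v, X, y, lr):
    # One gradient-descent step on 0.5 * ||f(X) - y||^2
    e = forward(W, v, X) - y                    # training error before the step
    J = jacobian(W, v, X)
    g = J.T @ e                                 # parameter gradient
    gW = g[:W.size].reshape(W.shape)
    gv = g[W.size:]
    return W - lr * gW, v - lr * gv, e, J

def shell_energies(M, e, base=2.0):
    # Quadratic error energy per logarithmic shell of the spectrum of M.
    # Shell index s collects eigenvalues with base**s <= lambda < base**(s+1).
    lam, U = np.linalg.eigh(M)                  # M is symmetric PSD
    a = U.T @ e                                 # error coefficients in the eigenbasis
    lam = np.clip(lam, 1e-12, None)             # guard against log(0)
    shells = np.floor(np.log(lam) / np.log(base)).astype(int)
    energies = {}
    for s, ai in zip(shells, a):
        energies[int(s)] = energies.get(int(s), 0.0) + 0.5 * ai**2
    return energies

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(8, 3)), rng.normal(size=8)
    W, v = init_params(rng, d_in=3, width=16)
    lr = 1e-5
    W1, v1, e, J = gd_step(W, v, X, y, lr)
    M = J @ J.T                                 # finite-sample analogue of M(t)
    e1 = forward(W1, v1, X) - y
    # Relative mismatch between the finite-difference de/dt and -M e
    rel = np.linalg.norm((e1 - e) / lr + M @ e) / np.linalg.norm(M @ e)
    print("relative mismatch:", rel)
    print("shell energies:", shell_energies(M, e))
```

Because the eigenbasis is orthonormal, the shell energies sum to the total error energy $\tfrac12\|e\|^2$, so the binning only redistributes energy across resolution scales. Repeating the step in a loop and recomputing `M` each time would show how energy moves between shells as the Jacobian changes during training.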
