Fugu-MT Paper Translation (Abstract): Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

Paper overview: Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit

arxiv url: http://arxiv.org/abs/2605.24710v1
Date: Sat, 23 May 2026 19:26:25 GMT
Status: Translation complete
Last updated in system: 2026-05-26 19:50:18.340619
Title: Feature Learning in Wide Neural Networks under $μ$P: Identifiability and Sparse-Dictionary Decomposition of the Mean-Field Limit
Title (reference translation): Feature learning in wide neural networks under $μ$P: identifiability and sparse-dictionary decomposition of the mean-field limit
Authors: Akmal Xodarev
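Abstract summary: The paper proves global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P. It characterizes identifiability of the mean-field limit. It derives the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components.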
Reference score (independently computed attention score): 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has uniform-in-time squared-Wasserstein rate $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(σ, ρ)$. The proofs are self-contained except for standard results from $μ$P and mean-field Langevin theory.
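Abstract (reference translation): We establish four structural results for feature learning in wide two-layer neural networks under the Maximal Update Parametrization ($μ$P). First, we prove global existence and uniqueness of the mean-field limit of noisy gradient descent under $μ$P, identifying the maximal admissible weight $w^*$ on the moment sequence of the initialization as the reciprocal parameter-moment-growth boundary, and hence the largest weighted moment class propagated by the flow. The finite-particle approximation has a uniform-in-time squared-Wasserstein rate of $O(N^{-1})$. Second, we characterize identifiability of the mean-field limit: two admissible parameter measures induce the same network function in $L^2$ exactly when their active components agree modulo the finite-rank realization symmetry of the architecture. The orbit depth $D^*_{\mathrm{orb}}$ is separated from the moment-variety depth $D^*_{\mathrm{var}}$. Third, under the Barron-Hermite target condition, the active support of the long-time limit measure admits a sparse-dictionary decomposition: it is supported on at most $S^*$ atoms modulo finite-rank realization symmetry, with $S^*$ bounded by an explicit coefficient-threshold number. Fourth, we derive the total feature-learning-error decomposition into statistical, optimization, propagation-of-chaos, and sparse-residual components, with a target-dependent Hermite/Barron tail replacing any initialization-only residual. The four results are tied together by an architectural identity: the triple $(w^*, D^*_{\mathrm{orb}}, S^*)$ -- the maximal admissible weight, the orbit identifiability depth, and the sparse-dictionary depth at which the target is realizable -- is the natural learning cell of the architecture-data pair $(σ, ρ)$. The proofs are self-contained except for standard results from $μ$P and mean-field Langevin theory.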
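For readers unfamiliar with the setting, the finite-particle system whose $N \to \infty$ limit the abstract refers to can be sketched as noisy gradient descent on a two-layer network with $1/N$ output scaling. This is only an illustrative sketch under assumptions not stated in the abstract: the tanh activation, the squared loss, the weight-decay term, the step size, and every name below (`noisy_gd_two_layer`, `temperature`, and so on) are hypothetical choices, not the paper's parametrization or method.

```python
import numpy as np

def noisy_gd_two_layer(X, y, n_particles=256, steps=500, lr=0.05,
                       temperature=1e-3, weight_decay=1e-3, seed=0):
    """Noisy GD on f(x) = (1/N) * sum_i a_i * tanh(w_i . x) (assumed model)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((n_particles, d))   # inner weights, one row per particle
    a = rng.standard_normal(n_particles)        # outer weights
    noise_scale = np.sqrt(2.0 * lr * temperature)
    losses = []
    for _ in range(steps):
        H = np.tanh(X @ W.T)                    # (n, N) hidden activations
        resid = H @ a / n_particles - y         # (n,) prediction error
        losses.append(0.5 * np.mean(resid ** 2))
        # Per-particle gradients rescaled by N (mean-field time scale).
        grad_a = H.T @ resid / n
        grad_W = (resid[:, None] * (1.0 - H ** 2) * a[None, :]).T @ X / n
        # Gradient step with weight decay plus isotropic Gaussian noise.
        a = a - lr * (grad_a + weight_decay * a) \
            + noise_scale * rng.standard_normal(a.shape)
        W = W - lr * (grad_W + weight_decay * W) \
            + noise_scale * rng.standard_normal(W.shape)
    # Record the loss after the final update as well.
    resid = np.tanh(X @ W.T) @ a / n_particles - y
    losses.append(0.5 * np.mean(resid ** 2))
    return W, a, losses

# Usage on synthetic data (hypothetical single-neuron target).
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((200, 3))
y = np.tanh(X @ np.array([1.0, -0.5, 0.25]))
W, a, losses = noisy_gd_two_layer(X, y)
```

The rows of `W` together with the entries of `a` form the empirical parameter measure; the abstract's $O(N^{-1})$ rate concerns how that empirical measure approaches its mean-field limit as `n_particles` grows, which this sketch does not attempt to verify.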


Related papers