Fugu-MT Paper Translation (Abstract): FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer

Paper abstract: FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer

arxiv url: http://arxiv.org/abs/2601.21750v1
Date: Thu, 29 Jan 2026 14:05:04 GMT
Status: Translation complete
System update date: 2026-01-30 16:22:49.877038
Title: FISMO: Fisher-Structured Momentum-Orthogonalized Optimizer
Authors: Chenrui Xu, Wenjing Yan, Ying-Jun Angela Zhang
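Abstract summary: We introduce FISMO, which incorporates anisotropic curvature information through Fisher information geometry. FISMO achieves better training efficiency and final performance than established baselines.
Reference score (independently computed attention metric): 30.184978506988767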
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Training large-scale neural networks requires solving nonconvex optimization where the choice of optimizer fundamentally determines both convergence behavior and computational efficiency. While adaptive methods like Adam have long dominated practice, the recently proposed Muon optimizer achieves superior performance through orthogonalized momentum updates that enforce isotropic geometry with uniform singular values. However, this strict isotropy discards potentially valuable curvature information encoded in gradient spectra, motivating optimization methods that balance geometric structure with adaptivity. We introduce FISMO (Fisher-Structured Momentum-Orthogonalized) optimizer, which generalizes isotropic updates to incorporate anisotropic curvature information through Fisher information geometry. By reformulating the optimizer update as a trust-region problem constrained by a Kronecker-factored Fisher metric, FISMO achieves structured preconditioning that adapts to local loss landscape geometry while maintaining computational tractability. We establish convergence guarantees for FISMO in stochastic nonconvex settings, proving an $\mathcal{O}(1/\sqrt{T})$ rate for the expected squared gradient norm with explicit characterization of variance reduction through mini-batching. Empirical evaluation on image classification and language modeling benchmarks demonstrates that FISMO achieves superior training efficiency and final performance compared to established baselines.
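The abstract contrasts an isotropic orthogonalized momentum update (uniform singular values) with an anisotropic update shaped by a Kronecker-factored metric. The sketch below illustrates that contrast under stated assumptions and is not the paper's algorithm: `orthogonalize` uses an exact SVD for clarity (practical implementations typically approximate this step iteratively), and `preconditioned_update`, together with its `left` and `right` factors, is a hypothetical construction that solves one possible trust-region problem with a Kronecker-structured constraint.

```python
import numpy as np


def orthogonalize(m):
    """Replace every singular value of m with 1, keeping its singular vectors."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def inv_sqrt_spd(a):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, q = np.linalg.eigh(a)
    return (q / np.sqrt(w)) @ q.T


def preconditioned_update(m, left, right):
    """Hypothetical anisotropic variant (an assumption, not the paper's method).

    Maximizes <m, d> subject to ||left^(1/2) d right^(1/2)||_op <= 1, where
    left and right are symmetric positive-definite Kronecker factors.
    """
    l_is, r_is = inv_sqrt_spd(left), inv_sqrt_spd(right)
    return l_is @ orthogonalize(l_is @ m @ r_is) @ r_is


# Illustrative usage with a random stand-in for a momentum matrix.
rng = np.random.default_rng(0)
momentum = rng.standard_normal((4, 3))
isotropic = orthogonalize(momentum)
same_as_isotropic = preconditioned_update(momentum, np.eye(4), np.eye(3))
```

With identity factors the hypothetical update reduces to the plain orthogonalized one, which is the sense in which a structured metric can generalize the isotropic case.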
