Fugu-MT Paper Translation (Abstract): OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality

Paper overview: OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality

arxiv url: http://arxiv.org/abs/2606.08783v1
Date: Sun, 07 Jun 2026 18:59:24 GMT
Status: Translation complete
System updated: 2026-06-09 14:42:06.445949
Title: OptMuon: Closed-Loop Orthogonalized Momentum Methods for Stochastic Optimization with Zero-Noise Optimality
Title (reference translation): OptMuon: Closed-Loop Orthogonalized Momentum Methods for Zero-Noise Optimization
Authors: Ganzhao Yuan
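Abstract summary: Closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors. OptMuon-A achieves the noise-adaptive rate \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/2}T^{-1/4})\) under average smoothness, while OptMuon-I achieves \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/3}T^{-1/3})\) under individual smoothness.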
Reference score (independently computed attention score): 23.28384210732827
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalized methods are typically paired with constant or open-loop magnitude rules, and therefore do not explicitly calibrate their update magnitudes from the observed optimization trajectory. Motivated by the closed-loop perspective behind Lipschitz-free and noise-adaptive methods, we propose OptMuon, a family of adaptive momentum orthogonalization methods for stochastic nonconvex optimization. OptMuon combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule, so that the update magnitude is determined by the observed gradient and momentum history rather than by a prescribed Lipschitz-dependent rule. The schedule does not use the smoothness constant, the variance level, or the bounded-gradient constant in parameter selection, and its running-maximum correction prevents isolated gradient spikes from causing excessive coefficient collapse. Under lower-boundedness, unbiased stochastic gradients with bounded variance, smoothness, and an almost-sure bounded stochastic-gradient condition, we prove two complementary guarantees. OptMuon-A achieves the noise-adaptive rate \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/2}T^{-1/4})\) under average smoothness, while OptMuon-I achieves \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/3}T^{-1/3})\) under individual smoothness. In the zero-noise regime, both bounds automatically reduce to a nearly optimal deterministic first-order rate \(\tilde{\mathcal O}(T^{-1/2})\) without manual hyperparameter retuning. These results show that closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors.
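Abstract (reference translation): Orthogonalized momentum updates, as used in Muon-style optimizers, have recently shown strong empirical stability in large-scale deep learning. However, existing orthogonalization methods are usually paired with constant or open-loop magnitude rules, so they do not explicitly calibrate the update magnitude from the observed optimization trajectory. Inspired by the closed-loop perspective behind Lipschitz-free and noise-adaptive methods, we propose OptMuon, a family of adaptive momentum orthogonalization methods for stochastic nonconvex optimization. OptMuon combines Muon-style polar-factor directions with a trajectory-dependent AdaGrad-Norm-type coefficient schedule, so that the update magnitude is determined by the observed gradient and momentum history rather than by a prescribed Lipschitz-dependent rule. The schedule does not use the smoothness constant, the variance level, or the bounded-gradient constant in parameter selection, and its running-maximum correction prevents isolated gradient spikes from causing excessive coefficient collapse. Under lower-boundedness, unbiased stochastic gradients with bounded variance, smoothness, and an almost-sure bounded stochastic-gradient condition, two complementary guarantees are proved. OptMuon-A achieves the noise-adaptive rate \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/2}T^{-1/4})\) under average smoothness, while OptMuon-I achieves \(\tilde{\mathcal O}(T^{-1/2}+σ^{1/3}T^{-1/3})\) under individual smoothness. In the zero-noise regime, both bounds automatically reduce to the nearly optimal deterministic first-order rate \(\tilde{\mathcal O}(T^{-1/2})\) without manual hyperparameter retuning. These results show that closed-loop scalar adaptation can be combined with Muon-style momentum orthogonalization while retaining noise adaptivity and zero-noise optimality up to logarithmic factors.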
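
Illustrative sketch (not part of the paper): the abstract describes an update whose direction is the polar factor of the momentum and whose magnitude is a scalar computed from the observed gradient history. The NumPy code below is one minimal way to read that description, under stated assumptions. The function names, the `state` dictionary, the parameters `alpha`, `beta`, and `eps`, and the exact form of the coefficient are hypothetical. The paper's actual schedule and its running-maximum correction are not specified in the abstract and are not reproduced here. An exact SVD is used for the polar factor purely for clarity.

```python
import numpy as np


def polar_factor(m: np.ndarray) -> np.ndarray:
    """Return the orthogonal polar factor U @ Vt of m, from a thin SVD."""
    u, _, vt = np.linalg.svd(m, full_matrices=False)
    return u @ vt


def adaptive_orthogonalized_step(w, grad, state, alpha=1.0, beta=0.9, eps=1e-8):
    """One illustrative step. The schedule below is an assumption, not the paper's rule."""
    # Momentum buffer: exponential moving average of the stochastic gradients.
    state["m"] = beta * state["m"] + (1.0 - beta) * grad
    # AdaGrad-Norm-type accumulator: running sum of squared gradient norms.
    state["acc"] += float(np.sum(grad * grad))
    # Scalar coefficient from the observed history only; no smoothness
    # constant or variance level appears in it.
    coeff = alpha / np.sqrt(eps + state["acc"])
    # Direction comes from the polar factor, magnitude from the scalar.
    return w - coeff * polar_factor(state["m"]), coeff


# Usage on a toy quadratic f(W) = 0.5 * ||W - target||_F^2 with exact gradients.
rng = np.random.default_rng(0)
target = rng.standard_normal((4, 3))
w = np.zeros((4, 3))
state = {"m": np.zeros((4, 3)), "acc": 0.0}
for _ in range(50):
    w, coeff = adaptive_orthogonalized_step(w, w - target, state)
```

Because the accumulator only grows, the coefficient in this sketch never increases. A single large gradient can therefore shrink it sharply, which is the kind of coefficient collapse the abstract says the running-maximum correction is meant to limit.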
