Fugu-MT Paper Translation (Abstract): MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

Paper abstract: MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

arxiv url: http://arxiv.org/abs/2606.17526v1
Date: Tue, 16 Jun 2026 05:10:29 GMT
Status: Translation complete
System update date: 2026-06-17 17:15:32.281107
Title: MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization
Authors: Da Chang, Ganzhao Yuan
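Abstract summary: We propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step sizes to a selected fixed proportion of parameters. MGUP integrates seamlessly with AdamW, Lion, and Muon.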
Reference score (independently computed attention score): 21.42805615044331
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step sizes to the rest. As a nearly plug-and-play module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.
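The selective-update idea in the abstract (a larger step size for a fixed proportion of parameters, a smaller but non-zero step size for the rest) can be illustrated with a minimal sketch. The abstract does not state how parameters are selected; the rule below, which ranks coordinates by the element-wise product of momentum and gradient, is an assumption suggested only by the paper's title. The function names, the `fraction`, `large`, and `small` parameters, and the plain momentum-SGD base step are all hypothetical and stand in for the actual MGUP-AdamW, MGUP-Lion, and MGUP-Muon updates, which may differ.

```python
def selective_step_scales(momentum, grad, fraction=0.5, large=1.0, small=0.1):
    """Return one step-size multiplier per coordinate.

    Assumed rule (not taken from the paper): the coordinates with the largest
    momentum * gradient product get `large`; all others get `small`.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    if small <= 0.0:
        raise ValueError("small must be positive so every coordinate still moves")
    n = len(grad)
    k = max(1, int(fraction * n))  # fixed number of coordinates selected per step
    scores = [m * g for m, g in zip(momentum, grad)]
    # Indices ordered from strongest to weakest momentum-gradient agreement.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    selected = set(order[:k])
    return [large if i in selected else small for i in range(n)]


def momentum_sgd_step(params, grad, momentum, lr=0.1, beta=0.9, **scale_kwargs):
    """One momentum-SGD step with per-coordinate selective scaling.

    Momentum SGD is used here only as a simple stand-in base optimizer.
    """
    new_momentum = [beta * m + (1.0 - beta) * g for m, g in zip(momentum, grad)]
    scales = selective_step_scales(new_momentum, grad, **scale_kwargs)
    new_params = [p - lr * s * m for p, s, m in zip(params, scales, new_momentum)]
    return new_params, new_momentum


if __name__ == "__main__":
    params = [0.0, 0.0, 0.0, 0.0]
    momentum = [1.0, -1.0, 0.5, 0.0]
    grad = [1.0, 1.0, 1.0, 1.0]
    print(selective_step_scales(momentum, grad))
    print(momentum_sgd_step(params, grad, momentum))
```

Keeping `small` strictly positive mirrors the abstract's point that unselected parameters still receive a non-zero step, in contrast to schemes that freeze them entirely.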

Related papers