Fugu-MT Paper Translation (Abstract): LiMuon: Light and Fast Muon Optimizer for Large Models

Paper overview: LiMuon: Light and Fast Muon Optimizer for Large Models

arxiv url: http://arxiv.org/abs/2509.14562v1
Date: Thu, 18 Sep 2025 02:49:27 GMT
Status: Translation complete
Last updated in system: 2025-09-19 17:26:53.036029
Title: LiMuon: Light and Fast Muon Optimizer for Large Models
Title (reference translation): LiMuon: A Lightweight and Fast Muon Optimizer for Large Models
Authors: Feihu Huang, Yuning Luo, Songcan Chen
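Abstract summary: We propose LiMuon, a light and fast Muon optimizer for training large models. LiMuon requires less memory than the current Muon and its variants. We prove that LiMuon has a sample complexity of $O(\epsilon^{-3})$ under the generalized smooth condition.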
Reference score (independently computed attention score): 45.11415579822849
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large models have recently been widely applied in artificial intelligence, so efficient training of large models has received widespread attention. More recently, the Muon optimizer was designed specifically for the matrix-structured parameters of large models. Although some works have begun to study the Muon optimizer, the existing Muon and its variants still suffer from high sample complexity or high memory cost for large models. To fill this gap, we propose a light and fast Muon (LiMuon) optimizer for training large models, which builds on the momentum-based variance-reduced technique and randomized Singular Value Decomposition (SVD). Our LiMuon optimizer requires less memory than the current Muon and its variants. Moreover, we prove that our LiMuon has a lower sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of non-convex stochastic optimization under the smooth condition. Existing convergence analyses of the Muon optimizer mainly rely on the strict Lipschitz smoothness assumption, whereas some artificial intelligence tasks, such as training large language models (LLMs), do not satisfy this condition. We also prove that our LiMuon optimizer has a sample complexity of $O(\epsilon^{-3})$ under the generalized smooth condition. Numerical experimental results on training DistilGPT2 and ViT models verify the efficiency of our LiMuon optimizer.
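Abstract (reference translation): In recent years, large models have been widely applied in artificial intelligence, and efficient training of large models has therefore attracted broad attention. More recently, the Muon optimizer has been designed specifically for the matrix-structured parameters of large models. Although several studies of the Muon optimizer have begun, the existing Muon and its variants still suffer from high sample complexity or high memory usage on large models. To fill this gap, we propose a light and fast Muon (LiMuon) optimizer for training large models, based on a momentum-based variance-reduction technique and randomized singular value decomposition (SVD). Our LiMuon optimizer uses less memory than the current Muon and its variants. Furthermore, we prove that LiMuon has a lower sample complexity of $O(\epsilon^{-3})$ for finding an $\epsilon$-stationary solution of non-convex stochastic optimization under the smooth condition. Existing convergence analyses of the Muon optimizer mainly rely on the strict Lipschitz smoothness assumption, but artificial intelligence tasks such as training large language models (LLMs) do not satisfy this condition. We also show that the LiMuon optimizer has a sample complexity of $O(\epsilon^{-3})$ under the generalized smooth condition. Training experiments on DistilGPT2 and ViT models verify the efficiency of the LiMuon optimizer.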
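
The abstract names randomized SVD as one of LiMuon's two building blocks but does not say how it is applied. As background only, the following is a minimal sketch of a generic randomized SVD (random range finding followed by a small exact SVD), written with NumPy. It is not the paper's algorithm; the function name `randomized_svd` and the parameters `oversample`, `n_iter`, and `seed` are hypothetical choices made for this illustration.

```python
import numpy as np

def randomized_svd(A, rank, oversample=5, n_iter=2, seed=0):
    """Generic randomized truncated SVD (illustrative, not the LiMuon method)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    # Sketch width: target rank plus a small oversampling margin.
    k = min(rank + oversample, m, n)
    # Sample the column space of A with a Gaussian test matrix.
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k)))
    # Power iterations (with re-orthonormalization) sharpen the subspace.
    for _ in range(n_iter):
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    # Exact SVD of the small k x n projected matrix.
    B = Q.T @ A
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    # Keep only the leading `rank` singular triplets.
    return U[:, :rank], s[:rank], Vt[:rank]

# Usage: factor a 64 x 32 matrix that has exact rank 4.
rng = np.random.default_rng(1)
M = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 32))
U, s, Vt = randomized_svd(M, rank=4)
M_approx = (U * s) @ Vt  # rank-4 reconstruction of M
```

Storing the rank-r factors of an m x n matrix takes r(m + n + 1) numbers instead of mn, which is the general reason low-rank factorizations can reduce memory for matrix-shaped quantities.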

Related papers