Fugu-MT Paper Translation (Abstract): HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

Paper overview: HTMuon: Improving Muon via Heavy-Tailed Spectral Correction

arxiv url: http://arxiv.org/abs/2603.10067v1
Date: Tue, 10 Mar 2026 02:12:24 GMT
Status: Translation complete
Last updated in system: 2026-03-12 16:22:32.608633
Title: HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
Title (reference translation): HTMuon: Improving Muon via Heavy-Tailed Spectral Correction
Authors: Tianyu Pang, Yujie Fang, Zihang Liu, Shenyang Deng, Lei Hsiung, Shuhua Yu, Yaoqing Yang
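Abstract summary: Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes training along noise-dominated directions. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines.
Reference score (independently computed attention score): 33.68909424458072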
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Muon has recently shown promising results in LLM training. In this work, we study how to further improve Muon. We argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes the training along noise-dominated directions. Motivated by the Heavy-Tailed Self-Regularization (HT-SR) theory, we propose HTMuon. HTMuon preserves Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines and can also serve as a plug-in on top of existing Muon variants. For example, on LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to $0.98$ compared to Muon. We further theoretically show that HTMuon corresponds to steepest descent under the Schatten-$q$ norm constraint and provide convergence analysis in smooth non-convex settings. The implementation of HTMuon is available at https://github.com/TDCSZ327/HTmuon.
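Abstract (reference translation): Muon has recently shown promising results in LLM training. This work studies how to improve Muon further. The authors argue that Muon's orthogonalized update rule suppresses the emergence of heavy-tailed weight spectra and over-emphasizes training along noise-dominated directions. Motivated by HT-SR theory, they propose HTMuon. HTMuon retains Muon's ability to capture parameter interdependencies while producing heavier-tailed updates and inducing heavier-tailed weight spectra. Experiments on LLM pretraining and image classification show that HTMuon consistently improves performance over state-of-the-art baselines and can also serve as a plug-in on top of existing Muon variants. For example, in LLaMA pretraining on the C4 dataset, HTMuon reduces perplexity by up to 0.98 compared to Muon. The authors further show theoretically that HTMuon corresponds to steepest descent under a Schatten-$q$ norm constraint, and they provide a convergence analysis in smooth non-convex settings. The implementation of HTMuon is available at https://github.com/TDCSZ327/HTmuon.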
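
The abstract's claim that HTMuon corresponds to steepest descent under a Schatten-$q$ norm constraint can be illustrated with a standard duality computation. The sketch below is not taken from the paper: the symbols $G$ (a gradient or momentum matrix), $U$, $V$, $\sigma_i$, and the conjugate exponent $p$ are assumptions introduced here for illustration, and how HTMuon actually parameterizes or approximates the update is not stated in the abstract.

```latex
% Assumed notation: G = U \,\mathrm{diag}(\sigma_1,\dots,\sigma_r)\, V^\top is a
% compact SVD with \sigma_i > 0, and \|X\|_{S_q} = (\sum_i \sigma_i(X)^q)^{1/q}.
% Take 1 < q < \infty and 1/p + 1/q = 1.
\begin{align*}
  \Delta^\star
    &= \arg\min_{\|\Delta\|_{S_q} \le 1} \langle G, \Delta \rangle
     = -\,\frac{U \,\mathrm{diag}\!\left(\sigma_1^{\,p-1},\dots,\sigma_r^{\,p-1}\right) V^\top}
               {\|G\|_{S_p}^{\,p-1}}, \\
  \langle G, \Delta^\star \rangle
    &= -\,\frac{\sum_i \sigma_i^{\,p}}{\|G\|_{S_p}^{\,p-1}}
     = -\,\|G\|_{S_p}, \\
  \|\Delta^\star\|_{S_q}^{\,q}
    &= \frac{\sum_i \sigma_i^{\,(p-1)q}}{\|G\|_{S_p}^{\,(p-1)q}}
     = \frac{\sum_i \sigma_i^{\,p}}{\|G\|_{S_p}^{\,p}} = 1
     \qquad \text{since } (p-1)\,q = p .
\end{align*}
% Limiting cases:
%   q \to \infty \ (p \to 1): \Delta^\star \to -U V^\top, the orthogonalized direction.
%   q = 2 \ (p = 2):          \Delta^\star = -G / \|G\|_F, the normalized gradient.
```

Under these assumptions, the update's singular values are proportional to $\sigma_i^{1/(q-1)}$. As $q \to \infty$ they flatten to the all-ones spectrum of an orthogonalized update, while for finite $q > 2$ directions with larger singular values keep proportionally larger weight, which is one way to read the abstract's phrase "heavier-tailed updates".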


Related papers