Fugu-MT Paper Translation (Summary): MuCon: Clipped Muon Updates for LLM Training

Paper summary: MuCon: Clipped Muon Updates for LLM Training

arxiv url: http://arxiv.org/abs/2605.26459v1
Date: Tue, 26 May 2026 02:16:17 GMT
Status: Translation complete
Last updated (in system): 2026-05-27 17:51:41.576817
Title: MuCon: Clipped Muon Updates for LLM Training
Authors: Albert Yi
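Abstract summary: Muon-style optimizers replace a matrix-valued momentum or preconditioned update $B = U \operatorname{diag}(σ_1,\ldots,σ_r) V^\top$ with its canonical partial polar factor $\operatorname{Pol}(B) = U V^\top$. MuCon applies singular-value clipping to the same Muon matrix: $D^{\mathrm{MuCon}}_τ(B) = \operatorname{MClip}_τ(B) = U \operatorname{diag}\bigl(\min\{σ_i,τ\}\bigr) V^\top$.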
Reference score (site-computed attention score): 0.0
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Muon-style optimizers take a matrix-valued momentum or preconditioned update $B = U \operatorname{diag}(σ_1,\ldots,σ_r) V^\top$ and replace it with its canonical partial polar factor $\operatorname{Pol}(B) = U V^\top$, which maps every nonzero singular value to one. MuCon is the clipped-Muon variant studied here: it applies singular-value clipping to the same Muon matrix, $D^{\mathrm{MuCon}}_τ(B) = \operatorname{MClip}_τ(B) = U \operatorname{diag}\bigl(\min\{σ_i,τ\}\bigr) V^\top, \qquad τ > 0$. Thus $\operatorname{MClip}_τ$ denotes the mathematical clipping operator, while MuCon denotes the optimizer primitive that substitutes this clipped direction for Muon's polar direction. The Muon/MuCon scaling parameterization used in this work, called $\text{SpectralP}$, is the hidden-matrix scaling recipe under which polar Muon or clipped MuCon directions are applied. The map $\operatorname{MClip}_τ$ is the Frobenius projection onto the spectral-norm ball $\{X : \|X\|_2 \le τ\}$: it leaves singular values at or below $τ$ unchanged and modifies only the violating singular directions. This paper asks when the MuCon clipping step can be approximated without a full dense SVD. We record two exact identities (a polar/absolute-value formula, and a scalar-root formulation that leads to a rational Newton filter for the clipped positive-semidefinite factor) and identify the numerical obstruction common to both: singular values near the threshold make sign decisions and rational solves ill-conditioned. Matrix-function methods are therefore useful only when paired with stable polar/square-root primitives or explicit regularization near the clipping boundary.
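
The following is a minimal NumPy sketch of the operators defined in the abstract. It assumes real dense matrices and uses a dense SVD as the reference. The helper names (`polar`, `mclip_svd`, `sym_fn`, `mclip_polar_abs`) are hypothetical. `mclip_polar_abs` is one polar/absolute-value form, derived here from the scalar identity min(σ, τ) = (σ + τ − |σ − τ|)/2. It is not necessarily the exact identity recorded in the paper, and it evaluates matrix functions with `eigh`, so it only checks the algebra and is not an SVD-free method.

```python
import numpy as np


def polar(B):
    """Partial polar factor Pol(B) = U V^T over the numerically nonzero singular values."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    tol = max(B.shape) * np.finfo(float).eps * (s[0] if s.size else 0.0)
    r = int(np.count_nonzero(s > tol))  # numerical rank
    return U[:, :r] @ Vt[:r, :]


def mclip_svd(B, tau):
    """Reference MClip_tau(B) = U diag(min(sigma_i, tau)) V^T via a dense SVD."""
    U, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (U * np.minimum(s, tau)) @ Vt  # scales column i of U by min(s_i, tau)


def sym_fn(S, f):
    """Apply a scalar function f to a symmetric matrix S through its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * f(w)) @ Q.T


def mclip_polar_abs(B, tau):
    """MClip_tau(B) = Pol(B) (|B| + tau I - ||B| - tau I|) / 2, with |B| = (B^T B)^{1/2}.

    Based on min(s, tau) = (s + tau - |s - tau|) / 2 applied to the eigenvalues
    of |B|. Null directions of B map to (0 + tau - tau) / 2 = 0, as required.
    """
    I = np.eye(B.shape[1])
    abs_B = sym_fn(B.T @ B, lambda w: np.sqrt(np.clip(w, 0.0, None)))
    abs_shift = sym_fn(abs_B - tau * I, np.abs)  # the "sign decision" lives here
    return polar(B) @ (0.5 * (abs_B + tau * I - abs_shift))


rng = np.random.default_rng(0)
B = rng.standard_normal((6, 4))
s = np.linalg.svd(B, compute_uv=False)
tau = float(np.median(s))  # clips the two largest of the four (distinct) singular values

C = mclip_svd(B, tau)
print("singular values before:", np.round(s, 4))
print("singular values after: ", np.round(np.linalg.svd(C, compute_uv=False), 4))
print("identity matches SVD:  ", np.allclose(C, mclip_polar_abs(B, tau)))
```

Two limiting cases follow from the definition. If τ exceeds every singular value, MClip leaves B unchanged. If τ is below every nonzero singular value, MClip returns τ·Pol(B), a rescaled Muon direction. In an SVD-free setting, |B|, Pol(B), and the matrix absolute value of |B| − τI would have to come from iterative polar or square-root primitives instead of `eigh`. The last of these is where singular values close to τ cause trouble.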
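
Computing ||B| − τI| without an SVD requires deciding the sign of each gap σ_i − τ. The sketch below suggests why singular values near τ are the hard case. It uses the classical scalar Newton iteration for the sign function, x ← (x + 1/x)/2. This is a standard textbook technique and not the paper's rational Newton filter. The sketch counts iterations as the normalized gap (σ − τ)/scale shrinks. The function name `newton_sign`, the threshold, the normalizing scale, and the tolerance are all illustrative assumptions.

```python
def newton_sign(x, tol=1e-12, max_iter=200):
    """Scalar Newton iteration for sign(x): x <- (x + 1/x) / 2.

    Returns (sign, steps). The iterate keeps the sign of x and converges
    quadratically to +1 or -1 once |x| is close to 1.
    """
    if x == 0.0:
        raise ValueError("sign(0) is undefined; the iteration would divide by zero")
    for step in range(max_iter):
        if abs(abs(x) - 1.0) <= tol:
            return (1.0 if x > 0 else -1.0), step
        x = 0.5 * (x + 1.0 / x)
    raise RuntimeError("Newton sign iteration did not converge")


tau, scale = 1.0, 2.0  # hypothetical threshold and spectral-norm bound used to normalize gaps
for sigma in (2.0, 1.5, 1.01, 1.0001, 1.000001, 0.999999):
    sgn, steps = newton_sign((sigma - tau) / scale)
    print(f"sigma={sigma:<9} gap={sigma - tau:+.1e} sign={sgn:+.0f} steps={steps}")
```

For a tiny gap, the first step overshoots to about 1/(2|x|) and the iterate then roughly halves each step. The iteration count therefore grows like log2(1/|gap|), and an exact tie σ = τ has no defined sign. In a matrix iteration, each step of this kind needs an inverse or a rational solve whose conditioning degrades as eigenvalues of |B| − τI approach zero. This matches the obstruction described in the abstract.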

Related papers