Fugu-MT paper translation (abstract): Muon is Not That Special: Random or Inverted Spectra Work Just as Well

Paper summary: Muon is Not That Special: Random or Inverted Spectra Work Just as Well

arxiv url: http://arxiv.org/abs/2605.11181v1
Date: Mon, 11 May 2026 19:42:48 GMT
Status: translation complete
Last updated in system: 2026-05-13 21:48:56.391799
Title: Muon is Not That Special: Random or Inverted Spectra Work Just as Well
Title (reference translation): Muon is not special: random or inverted spectra work just as well
Authors: Zakhar Shumaylov, Nathaël Da Costa, Peter Zaika, Bálint Mucsányi, Alex Massucco, Yoav Gelberg, Carola-Bibiane Schönlieb, Yarin Gal, Philipp Hennig
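Abstract summary: We demonstrate that precise geometric structure is not the key factor affecting performance. We introduce Freon, a family of optimizers based on Schatten (quasi-)norms. We also introduce Kaon, an absurd optimizer that replaces singular values with random noise.
Reference score (attention score computed independently by this site): 50.969177887027115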
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, typically justified by similarities with second-order methods, and linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three contributions, demonstrating that precise geometric structure is not the key factor affecting optimization performance. First, we introduce Freon, a family of optimizers based on Schatten (quasi-)norms, powered by a novel, provably optimal QDWH-based iterative approximation. Freon naturally interpolates between SGD and Muon, while smoothly extrapolating into the quasi-norm regime. Empirically, the best-performing Schatten parameters for GPT-2 lie strictly within the quasi-norm regime, and thus cannot be represented by any unitarily invariant LMO. Second, noting that Freon performs well across a wide range of exponents, we introduce Kaon, an absurd optimizer that replaces singular values with random noise. Despite lacking any coherent geometric structure, Kaon matches Muon's performance and retains classical convergence guarantees, proving that strict adherence to a precise geometry is practically irrelevant. Third, having shown that geometry is not the primary driver of performance, we demonstrate it is instead controlled by two local quantities: alignment and descent potential. Ultimately, each optimizer must tune its step size around these two quantities. While their dynamics are difficult to predict a-priori, evaluating them within a stochastic random feature model yields a precise insight: Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality.
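Abstract (reference translation): The recent empirical success of the Muon optimizer has renewed interest in non-Euclidean optimization, which is typically justified by similarities with second-order methods and by linear minimization oracle (LMO) theory. In this paper, we challenge this geometric narrative through three contributions, showing that precise geometric structure is not the key factor affecting optimization performance. First, we introduce Freon, a family of optimizers based on Schatten (quasi-)norms. Freon naturally interpolates between SGD and Muon and extrapolates smoothly into the quasi-norm regime. Empirically, the best-performing Schatten parameters for GPT-2 lie within the quasi-norm regime and therefore cannot be represented by any unitarily invariant LMO. Second, noting that Freon performs well across a wide range of exponents, we introduce Kaon, an absurd optimizer that replaces singular values with random noise. Despite lacking any coherent geometric structure, Kaon matches Muon's performance and retains classical convergence guarantees, proving that strict adherence to a precise geometry is practically irrelevant. Third, having shown that geometry is not the primary driver of performance, we show that performance is instead controlled by two local quantities: alignment and descent potential. Ultimately, each optimizer must tune its step size around these two quantities. Although their dynamics are difficult to predict a priori, evaluating them within a stochastic random feature model yields a precise insight: Muon succeeds not by tracking an ideal global geometry, but by guaranteeing step-size optimality.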
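Illustrative sketch (not from the paper): the abstract describes updates that keep a gradient matrix's singular vectors and change only its singular values. The NumPy snippet below is a minimal sketch of that idea under stated assumptions. It uses an explicit SVD rather than the QDWH-based iteration the abstract mentions, and the names `spectral_update`, `power_spectrum`, `random_spectrum`, the exponent `alpha`, and the uniform noise distribution are all hypothetical choices made for illustration, not the authors' parameterization.

```python
import numpy as np

def spectral_update(grad, transform):
    """Keep the singular vectors of `grad`; replace its singular values by transform(s)."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    # Broadcasting scales column i of u by the i-th transformed singular value.
    return (u * transform(s)) @ vt

def power_spectrum(alpha, eps=1e-12):
    # Hypothetical parameterization: s -> s**alpha.
    # alpha = 1 returns the gradient itself (SGD-like direction),
    # alpha = 0 returns U V^T (the orthogonalized, Muon-like direction),
    # alpha < 0 inverts the ordering of the spectrum.
    return lambda s: np.maximum(s, eps) ** alpha

def random_spectrum(rng):
    # Assumption: uniform noise in [0, 1); the abstract does not state the distribution.
    return lambda s: rng.uniform(0.0, 1.0, size=s.shape)

rng = np.random.default_rng(0)
grad = rng.standard_normal((4, 3))  # stand-in for a weight-matrix gradient

sgd_like = spectral_update(grad, power_spectrum(1.0))
muon_like = spectral_update(grad, power_spectrum(0.0))
inverted = spectral_update(grad, power_spectrum(-0.5))
kaon_like = spectral_update(grad, random_spectrum(rng))

# Inner product <grad, update> equals sum_i s_i * f(s_i), so any non-negative
# spectrum keeps the update aligned with the gradient.
alignment = float(np.sum(grad * kaon_like))
```

In this sketch every non-negative replacement spectrum yields an update with non-negative inner product against the gradient, which gives one concrete reading of the "alignment" quantity the abstract refers to; the paper's own definitions may differ.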


Related papers