Fugu-MT Paper Translation (Abstract): Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

Paper overview: Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR

arxiv url: http://arxiv.org/abs/2605.19282v1
Date: Tue, 19 May 2026 03:00:26 GMT
Status: Translation complete
Last updated in system: 2026-05-20 15:03:09.08746
Title: Rethinking Muon Beyond Pretraining: Spectral Failures and High-Pass Remedies for VLA and RLVR
Authors: Chongyu Fan, Gaowen Liu, Mingyi Hong, Ramana Rao Kompella, Sijia Liu
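Abstract summary: Muon is a matrix-aware optimizer that enforces spectral gradient orthogonalization by driving the singular values of the momentum matrix toward 1. Pion is a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism.
Reference score (independently computed attention score): 37.36050721989701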
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Muon is a matrix-aware optimizer that leverages Newton-Schulz (NS) iterations to enforce spectral gradient orthogonalization by driving all singular values of the momentum matrix toward 1. While this uniform spectral whitening enhances exploration and outperforms AdamW in LLM pretraining, we show it could lead to fundamental limitations beyond pretraining in two regimes: (i) cross-modality vision-language-action (VLA) training, where inherently low-rank action-module gradients cause amplification of noisy tail directions, and (ii) reinforcement learning with verifiable rewards (RLVR), where low-SNR gradients and the need to preserve per-head specialization from prior training make whitening unstable. To address these challenges, we propose Pion, a drop-in replacement for Muon that preserves its computational efficiency while replacing uniform spectral whitening with a two-stage Promotion+Suppression mechanism, which we call the high-pass NS iteration. This design induces a sharp spectral high-pass effect, anchoring dominant singular values at 1 while suppressing noisy tail components toward 0, with controllable filter strength. To preserve pretrained per-head heterogeneity, Pion also supports a per-head mode that applies updates independently across attention heads via a simple reshape, at no extra cost. In VLA training on LIBERO and LIBERO-Plus, Pion consistently outperforms both baselines across l_1-regression (VLA-Adapter) and flow-matching (VLANeXt) architectures, e.g., reaching 100% success rate on LIBERO Object after 1,500 training steps with VLA-Adapter, vs. 97.0% for Muon and only 32.2% for AdamW. The advantage of Pion further extends to a real Franka Research 3 robot with a pi_0.5 backbone under the DROID setup on three grasp-and-place tasks. In RLVR post-training on Qwen3-1.7B/4B with GRPO and GMPO, Pion also outperforms AdamW on MATH and GSM8K while Muon collapses to zero.
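The orthogonalization step that the abstract attributes to Muon, and the per-head reshape, can be illustrated with a small NumPy sketch. This is not the authors' code. It assumes the classic cubic Newton-Schulz iteration (Muon implementations typically use tuned higher-order coefficients), the function names `newton_schulz_orthogonalize` and `per_head_orthogonalize` are hypothetical, and the layout in which each head owns a contiguous block of rows is an assumption. Pion's high-pass Promotion+Suppression iteration is not specified in the abstract, so it is not reproduced here.

```python
import numpy as np


def newton_schulz_orthogonalize(m, steps=30, eps=1e-7):
    """Drive all singular values of m toward 1 with the classic cubic NS iteration.

    Accepts one matrix (rows, cols) or a batch (..., rows, cols).
    The step count and eps are illustrative defaults, not values from the paper.
    """
    x = np.asarray(m, dtype=np.float64)
    # Divide by the Frobenius norm so every singular value lies in (0, 1],
    # which is inside the convergence region of the iteration.
    norm = np.linalg.norm(x, axis=(-2, -1), keepdims=True)
    x = x / (norm + eps)
    for _ in range(steps):
        # Each step maps a singular value s to 1.5*s - 0.5*s**3,
        # whose stable fixed point is s = 1 (uniform spectral whitening).
        xxt = x @ np.swapaxes(x, -1, -2)
        x = 1.5 * x - 0.5 * (xxt @ x)
    return x


def per_head_orthogonalize(m, num_heads, steps=30):
    """Orthogonalize each head's row block independently via a reshape.

    Assumes the rows of m are grouped head by head (hypothetical layout).
    """
    rows, cols = m.shape
    heads = m.reshape(num_heads, rows // num_heads, cols)
    return newton_schulz_orthogonalize(heads, steps).reshape(rows, cols)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    momentum = rng.standard_normal((8, 16))  # stand-in for a momentum matrix
    whitened = newton_schulz_orthogonalize(momentum)
    print(np.linalg.svd(whitened, compute_uv=False))
```

Because the iteration pushes every nonzero singular value to 1, small noisy tail directions are amplified to the same scale as dominant ones, which is the failure mode the abstract describes for low-rank and low-SNR gradients.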


Related papers