Fugu-MT Paper Translation (Abstract): Muon Learns More Robust and Transferable Features than Adam

Paper abstract: Muon Learns More Robust and Transferable Features than Adam

arxiv url: http://arxiv.org/abs/2606.09658v1
Date: Mon, 08 Jun 2026 15:42:54 GMT
Status: Translation complete
Last updated in system: 2026-06-09 14:42:07.476298
Title: Muon Learns More Robust and Transferable Features than Adam
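Title (reference translation): Muon Learns More Robust and Transferable Features than Adam
Authors: Tianyu Ruan, Fengzhuo Zhang, Shuche Wang, Shihua Zhang
Abstract summary: We show that features learned by Muon are consistently more robust than those learned by Adam and SGD. We also demonstrate that features learned by Muon transfer more effectively than those learned by Adam and SGD.
Reference score (independently computed attention score): 24.749527924478148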
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Despite its efficiency advantage over Adam and SGD, the feature-learning advantage of Muon remains unclear. This paper investigates Muon's feature-learning advantage through the lens of robustness and transferability. First, by evaluating pretrained models on corrupted images and texts, we show that features learned by Muon are consistently more robust than those learned by Adam and SGD across different architectures, including transformers and Convolutional Neural Networks (CNNs). Using trained layer-wise probes, we further show that this robustness advantage is reflected in larger logit margins across layers. Second, by training linear classifiers or fine-tuning full models from pretrained parameters on downstream tasks, we demonstrate that Muon-learned features transfer more effectively than those learned by Adam and SGD. This transferability advantage is further supported by the diversity of hidden states across layers, as measured by effective rank. Finally, in a representative classification problem with multi-component features, we prove that Muon attains larger margins and higher effective rank than Adam and SGD, providing theoretical support for our empirical findings.
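Abstract (reference translation): Muon has recently emerged as a state-of-the-art optimizer for pretraining Large Language Models (LLMs) and vision classifiers. Although it is more efficient than Adam and SGD, Muon's feature-learning advantage remains unclear. This paper examines that advantage through the lens of robustness and transferability. First, by evaluating pretrained models on corrupted images and texts, we show that features learned by Muon are consistently more robust than those learned by Adam and SGD across architectures, including transformers and Convolutional Neural Networks (CNNs). Using trained layer-wise probes, we further show that this robustness advantage is reflected in larger logit margins across layers. Second, by training linear classifiers or fine-tuning full models from pretrained parameters on downstream tasks, we demonstrate that Muon-learned features transfer more effectively than those learned by Adam and SGD. This transferability advantage is further supported by the diversity of hidden states across layers, as measured by effective rank. Finally, in a representative classification problem with multi-component features, we prove that Muon attains larger margins and higher effective rank than Adam and SGD, which provides theoretical support for our empirical findings.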
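
The abstract relies on two quantities, logit margin and effective rank, without defining them. The sketch below shows one common reading of each, as an illustration only: it assumes the entropy-based effective rank (the exponential of the Shannon entropy of the normalized singular values) and a margin defined as the correct-class logit minus the largest competing logit. The paper's exact definitions are not given on this page and may differ. The function names effective_rank and logit_margin are hypothetical.

```python
import math


def effective_rank(singular_values):
    """Entropy-based effective rank (an assumed definition, not taken from the paper).

    Normalizes the positive singular values into a distribution p and
    returns exp(H(p)), where H is the Shannon entropy in nats.
    """
    positive = [s for s in singular_values if s > 0]
    total = sum(positive)
    if total == 0:
        return 0.0  # no positive singular values: treat the rank as zero
    probs = [s / total for s in positive]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def logit_margin(logits, label):
    """Correct-class logit minus the largest other logit (assumed definition)."""
    best_other = max(v for i, v in enumerate(logits) if i != label)
    return logits[label] - best_other


# Illustrative inputs only; these numbers do not come from the paper.
flat_spectrum = [1.0, 1.0, 1.0, 1.0]    # energy spread evenly over 4 directions
peaked_spectrum = [5.0, 0.0, 0.0, 0.0]  # energy in a single direction
print(effective_rank(flat_spectrum), effective_rank(peaked_spectrum))
print(logit_margin([2.0, 0.5, -1.0], label=0))
```

Under these assumptions, an evenly spread spectrum gives an effective rank close to the number of directions, a spectrum concentrated in one direction gives an effective rank of 1, and a positive margin means the correct class wins. The singular values of a hidden-state matrix H could be obtained with, for example, numpy.linalg.svd(H, compute_uv=False).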

Related papers