Fugu-MT paper translation (abstract): Can Muon Fine-tune Adam-Pretrained Models?

Paper summary: Can Muon Fine-tune Adam-Pretrained Models?

arxiv url: http://arxiv.org/abs/2605.10468v1
Date: Mon, 11 May 2026 12:34:20 GMT
Status: Translation complete
Last updated in system: 2026-05-12 23:28:50.808922
Title: Can Muon Fine-tune Adam-Pretrained Models?
Title (reference translation): Can Muon Fine-tune Adam-Pretrained Models?
Authors: Xingyu Qu, Peigeng Huang, Samuel Horvath
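Abstract summary: Most open models are pretrained with Adam, and naively switching to Muon for fine-tuning degrades performance because of an optimizer mismatch. We provide evidence that the mismatch disrupts pretrained knowledge and that this disruption scales with update strength. Across language and vision tasks, LoRA reduces the performance gap between Adam and Muon observed under full fine-tuning.
Reference score (independently computed attention score): 0.5735035463793009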
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Muon has emerged as an efficient alternative to Adam for pretraining, yet remains underused for fine-tuning. A key obstacle is that most open models are pretrained with Adam, and naively switching to Muon for fine-tuning leads to degraded performance due to an optimizer mismatch. We investigate this mismatch through controlled experiments and relate it to the distinct implicit biases of Adam and Muon. We provide evidence that the mismatch disrupts pretrained knowledge, and that this disruption scales with update strength. This leads us to hypothesize that constraining updates should mitigate the mismatch. We validate this with LoRA: across language and vision tasks, LoRA reduces the performance gap between Adam and Muon observed under full fine-tuning. Studies on LoRA rank, catastrophic forgetting, and LoRA variants further confirm that mismatch severity correlates with update strength. These results shed light on how optimizer mismatch affects fine-tuning and how it can be mitigated. Our code is available at https://github.com/XingyuQu/muon-finetune.
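Abstract (reference translation): Muon has emerged as an efficient alternative to Adam for pretraining, but it remains underused for fine-tuning. The key obstacle is that most open models are pretrained with Adam, so naively switching to Muon for fine-tuning degrades performance because of an optimizer mismatch. We investigate this mismatch through controlled experiments and relate it to the distinct implicit biases of Adam and Muon. We provide evidence that the mismatch disrupts pretrained knowledge and that this disruption scales with update strength. This leads us to hypothesize that constraining updates should mitigate the mismatch. We validate this with LoRA: across language and vision tasks, LoRA reduces the performance gap between Adam and Muon observed under full fine-tuning. Studies on LoRA rank, catastrophic forgetting, and LoRA variants further confirm that mismatch severity correlates with update strength. These results shed light on how optimizer mismatch affects fine-tuning and how it can be mitigated. Our code is available at https://github.com/XingyuQu/muon-finetune.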
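The contrast the abstract draws between unconstrained and constrained updates can be illustrated with a small sketch. It is not taken from the authors' repository, and every function name and matrix value in it is a hypothetical chosen for illustration. It assumes only two well-known facts: Muon orthogonalizes a matrix-shaped update (the classic cubic Newton-Schulz iteration is used here in place of Muon's tuned variant), and a LoRA update is a product of two thin factors, so its rank is bounded by the LoRA rank. How the paper itself measures "update strength" is not specified in this abstract.

```python
# Illustrative sketch only: pure-Python matrices, hypothetical names and values.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the semi-orthogonal polar factor of g.

    Uses the classic cubic iteration X <- 1.5 X - 0.5 X X^T X, which drives
    every nonzero singular value of X toward 1. Assumes g has full column rank.
    """
    frob = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / frob for v in row] for row in g]  # singular values now lie in (0, 1]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rx, rq)] for rx, rq in zip(x, xxtx)]
    return x

def lora_delta(b, a):
    """Weight change implied by LoRA factors B (d_out x r) and A (r x d_in)."""
    return matmul(b, a)

# Full fine-tuning, Muon-style: the 3x2 update is orthogonalized, so it moves
# every direction of the weight matrix with equal (unit) strength.
grad = [[3.0, 1.0], [0.0, 2.0], [1.0, -1.0]]
update = newton_schulz_orthogonalize(grad)
gram = matmul(transpose(update), update)  # close to the 2x2 identity

# LoRA with rank r = 1: whatever the optimizer does to B and A, the resulting
# weight change B @ A can never exceed rank 1.
b_factor = [[1.0], [2.0], [0.5]]
a_factor = [[0.5, -1.0]]
delta = lora_delta(b_factor, a_factor)

print("U^T U for the orthogonalized update:", gram)
print("LoRA weight change:", delta)
```

The orthogonalized update has all singular values equal to one, while the LoRA weight change is confined to a rank-one subspace. This is one simple sense in which a low-rank parameterization constrains the update.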

Related papers