Fugu-MT Paper Translation (Abstract): An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

Paper overview: An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants

arxiv url: http://arxiv.org/abs/2510.09827v1
Date: Fri, 10 Oct 2025 19:57:49 GMT
Status: Translation complete
System update date: 2025-10-14 18:06:29.64077
Title: An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants
Title (reference translation): An Exploration of Non-Euclidean Gradient Descent: Muon and its Many Variants
Authors: Michael Crawshaw, Chirag Modi, Mingrui Liu, Robert M. Gower
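Abstract summary: Muon is sensitive to the choice of learning rate, whereas a new variant we call MuonMax is significantly more robust. We show how to combine any non-Euclidean gradient method with model-based momentum (known as Momo).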
Reference score (independently computed attention score): 38.56190531594778
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: To define a steepest descent method over a neural network, we need to choose a norm for each layer, a way to aggregate these norms across layers, and whether to use normalization. We systematically explore different alternatives for aggregating norms across layers, both formalizing existing combinations of Adam and the recently proposed Muon as a type of non-Euclidean gradient descent, and deriving new variants of the Muon optimizer. Through a comprehensive experimental evaluation of the optimizers within our framework, we find that Muon is sensitive to the choice of learning rate, whereas a new variant we call MuonMax is significantly more robust. We then show how to combine any non-Euclidean gradient method with model based momentum (known as Momo). The new Momo variants of Muon are significantly more robust to hyperparameter tuning, and often achieve a better validation score. Thus for new tasks, where the optimal hyperparameters are not known, we advocate for using Momo in combination with MuonMax to save on costly hyperparameter tuning.
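Abstract (reference translation): Defining a steepest descent method over a neural network requires choosing a norm for each layer, a way to aggregate these norms across layers, and whether to use normalization. We systematically examine different options for aggregating norms across layers, formalizing existing combinations of Adam and the recently proposed Muon as a type of non-Euclidean gradient descent and deriving new variants of the Muon optimizer. A comprehensive experimental evaluation of the optimizers within the framework shows that Muon is sensitive to the choice of learning rate, whereas a new variant called MuonMax is significantly more robust. We then show how to combine any non-Euclidean gradient method with model-based momentum (known as Momo). The new Momo variants of Muon are significantly more robust to hyperparameter tuning and often achieve a better validation score. For new tasks where the optimal hyperparameters are unknown, we therefore recommend combining Momo with MuonMax to save on costly hyperparameter tuning.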
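
The abstract names three design choices: a norm per layer, an aggregation across layers, and whether to normalize. The NumPy sketch below shows one textbook way those choices fit together. It assumes the spectral norm on each layer (the norm commonly associated with Muon) and a max over layers as the aggregate norm, whose dual norm is the sum of per-layer nuclear norms. The function names `spectral_lmo` and `steepest_descent_step` are hypothetical, an exact SVD stands in for whatever approximation a practical optimizer would use, and momentum is omitted. This is not claimed to be the paper's MuonMax or any other specific variant.

```python
import numpy as np

def spectral_lmo(grad):
    """Return the unit-spectral-norm direction D maximizing <grad, D>,
    together with the nuclear norm of grad (dual of the spectral norm)."""
    u, s, vt = np.linalg.svd(grad, full_matrices=False)
    return u @ vt, float(s.sum())

def steepest_descent_step(weights, grads, lr, normalize=False):
    """One steepest descent step w.r.t. the norm max_l ||W_l||_spectral.

    Assumption: max aggregation across layers, so the dual norm of the
    full gradient is the sum of the per-layer nuclear norms.
    """
    results = [spectral_lmo(g) for g in grads]
    dual_norm = sum(nuc for _, nuc in results)
    # Normalized: every layer moves by exactly lr in spectral norm.
    # Unnormalized: the step is additionally scaled by the dual norm.
    scale = lr if normalize else lr * dual_norm
    return [w - scale * d for w, (d, _) in zip(weights, results)]

# Toy two-layer example with random weights and gradients.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((3, 2))]
grads = [rng.standard_normal(w.shape) for w in weights]
new_weights = steepest_descent_step(weights, grads, lr=0.1, normalize=True)
```

With `normalize=True` each layer's update has spectral norm equal to `lr` regardless of gradient magnitude. With `normalize=False` the step length also scales with the aggregated dual norm of the gradient.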

Related papers