Fugu-MT Paper Translation (Abstract): Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

Paper summary: Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?

arxiv url: http://arxiv.org/abs/2606.23676v1
Date: Mon, 22 Jun 2026 17:58:52 GMT
Status: Translation complete
Last updated in system: 2026-06-24 17:12:41.691942
Title: Open Problem: Is AdamW Effective Under Heavy-Tailed Noise?
Authors: Dingzhi Yu, Hongyi Tao, Yuanyu Wan, Luo Luo, Lijun Zhang
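Abstract summary: AdamW is the de facto optimizer for training large language models. Recent work shows that sign-based optimizers such as Lion and Muon achieve sharp heavy-tailed rates. Can AdamW converge under the same heavy-tailed assumptions, or does its second-moment accumulator create a genuine obstruction?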
Reference score (independently computed attention score): 43.39716211464324
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: AdamW is the de facto optimizer for training large language models (LLMs), yet the theory behind it still lives mostly in finite-variance regimes. This is increasingly unsatisfying, as empirical evidence indicates that stochastic gradient noise in LLM pretraining is typically heavy-tailed. Recent work shows that sign-based optimizers such as Lion and Muon achieve sharp heavy-tailed rates, and that AdaGrad can also converge under heavy-tailed noise. However, no rigorous convergence theory for AdamW has yet been established in this regime. Can AdamW converge under the same heavy-tailed assumptions, or does its second-moment accumulator create a genuine obstruction? We formulate this as an open problem, prove a positive weighted-metric benchmark, and give a corridor lower-bound mechanism showing how denominator memory can hide large gradients.
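As a rough illustration of what "denominator memory" refers to, the sketch below runs a scalar version of the standard AdamW update and records the normalized direction m_hat / (sqrt(v_hat) + eps) at each step. It is not the paper's corridor construction. The helper names `adamw_step` and `normalized_directions`, the hyperparameter values, and the gradient sequences are all assumptions chosen for illustration, and the first-moment coefficient is set to 0 so that only the second-moment accumulator carries memory.

```python
import math

def adamw_step(x, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.01):
    """One scalar AdamW step in the standard bias-corrected form."""
    m = beta1 * m + (1 - beta1) * g        # first-moment EMA
    v = beta2 * v + (1 - beta2) * g * g    # second-moment EMA (denominator memory)
    m_hat = m / (1 - beta1 ** t)           # bias correction
    v_hat = v / (1 - beta2 ** t)
    direction = m_hat / (math.sqrt(v_hat) + eps)
    x = x - lr * (direction + weight_decay * x)  # decoupled weight decay
    return x, m, v, direction

def normalized_directions(grads, beta1=0.0, beta2=0.999):
    """Run AdamW on a gradient sequence and return each step's direction."""
    x, m, v = 0.0, 0.0, 0.0
    out = []
    for t, g in enumerate(grads, start=1):
        x, m, v, d = adamw_step(x, g, m, v, t, beta1=beta1, beta2=beta2)
        out.append(d)
    return out

# Hypothetical gradient sequences: constant gradients vs. one early spike.
calm = normalized_directions([1.0] * 11)
spiky = normalized_directions([1000.0] + [1.0] * 10)
print("last direction without spike:", calm[-1])
print("last direction after spike:  ", spiky[-1])
```

In this toy setting the direction stays close to 1 when every gradient equals 1, but after a single large gradient the accumulator keeps the denominator inflated, so later gradients of the same size produce much smaller steps. A sign-based update would, by contrast, have magnitude 1 at every step regardless of the earlier spike.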
