Fugu-MT Paper Translation (Abstract): Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

Paper overview: Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes

arxiv url: http://arxiv.org/abs/2605.06152v2
Date: Tue, 12 May 2026 10:56:27 GMT
Status: Translation complete
Last updated in system: 2026-05-13 18:21:06.728855
Title: Grokking or Glitching? How Low-Precision Drives Slingshot Loss Spikes
Authors: Liu Hanqing, Jianjun Cao, Yuanze Li, Zijian Zhou
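Abstract summary: Deep neural networks exhibit periodic loss spikes during unregularized long-term training. This paper shows that the phenomenon is a result of floating-point arithmetic precision limits. The authors show that this mechanism explains the rapid norm growth before a Slingshot spike.
Reference score (attention score computed independently by the site): 4.886486588387005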
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep neural networks exhibit periodic loss spikes during unregularized long-term training, a phenomenon known as the "Slingshot Mechanism." Existing work usually attributes this to intrinsic optimization dynamics, but its triggering mechanism remains unclear. This paper proves that this phenomenon is a result of floating-point arithmetic precision limits. As training enters a high-confidence stage, the difference between the correct-class logit and the other logits may exceed the absorption-error threshold. Then during backpropagation, the gradient of the correct class is rounded exactly to zero, while the gradients of the incorrect classes remain nonzero. This breaks the zero-sum constraint of gradients across classes and introduces a systematic drift in the parameter update of the classifier layer. We prove that this drift forms a positive feedback loop with the feature, causing the global classifier mean and the global feature mean to grow exponentially. We call this mechanism Numerical Feature Inflation (NFI). This mechanism explains the rapid norm growth before a Slingshot spike, the subsequent reappearance of gradients, and the resulting loss spike. We further show that NFI is not equivalent to an observed loss spike: in more practical tasks, partial absorption may not produce visible spikes, but it can still break the zero-sum constraint and drive rapid growth of parameter norms. Our results reinterpret Slingshot as a numerical dynamic of finite-precision training, and provide a testable explanation for abnormal parameter growth and logit divergence in late-stage training.
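The absorption step described in the abstract can be reproduced in a few lines. The following is a minimal sketch, not the authors' code: it assumes the standard softmax cross-entropy gradient (softmax probabilities minus the one-hot label) and uses NumPy with hypothetical logits `[20, 0, 0]`. The function name `softmax_ce_grad` is made up for illustration. In float32 the softmax denominator rounds to exactly 1, so the correct-class gradient becomes exactly zero while the other gradients stay positive; in float64 the gradients still sum to approximately zero.

```python
import numpy as np

def softmax_ce_grad(logits, label, dtype):
    """Gradient of softmax cross-entropy w.r.t. the logits, computed in `dtype`."""
    z = np.asarray(logits, dtype=dtype)
    e = np.exp(z - z.max())          # numerically stable softmax numerator
    p = e / e.sum()                  # in low precision, tiny terms are absorbed by the 1.0
    onehot = np.zeros_like(p)
    onehot[label] = 1
    return p - onehot                # in exact arithmetic the entries sum to zero

# Hypothetical high-confidence logits: class 0 is correct and leads by 20.
logits = [20.0, 0.0, 0.0]

grad32 = softmax_ce_grad(logits, 0, np.float32)
grad64 = softmax_ce_grad(logits, 0, np.float64)

print("float32 grad:", grad32, "sum:", grad32.sum())
print("float64 grad:", grad64, "sum:", grad64.sum())
```

With these values, exp(-20) is about 2e-9, which is below half the float32 spacing just above 1.0 (about 6e-8), so adding it to 1.0 changes nothing. The float32 gradient therefore has a net positive sum, which is the broken zero-sum constraint the abstract refers to.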

Related papers