Fugu-MT Paper Translation (Abstract): Stability of Transformers under Layer Normalization

Paper overview: Stability of Transformers under Layer Normalization

arxiv url: http://arxiv.org/abs/2510.09904v1
Date: Fri, 10 Oct 2025 22:27:20 GMT
Status: Translation complete
System update date: 2025-10-14 18:06:29.682925
Title: Stability of Transformers under Layer Normalization
Title (reference translation): Stability of Transformers under Layer Normalization
Authors: Kelvin Kan, Xingjian Li, Benjamin J. Zhang, Tuhin Sahai, Stanley Osher, Krishna Kumar, Markos A. Katsoulakis
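Abstract summary: We study the stability of deep Transformers under different layer normalization placements. We derive explicit bounds on the growth of hidden states in trained Transformers. Our framework provides a principled way to sanity-check the stability of Transformers under new architectural modifications.
Reference score (independently computed attention measure): 7.235320241343618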
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Despite their widespread use, training deep Transformers can be unstable. Layer normalization, a standard component, improves training stability, but its placement has often been ad-hoc. In this paper, we conduct a principled study on the forward (hidden states) and backward (gradient) stability of Transformers under different layer normalization placements. Our theory provides key insights into the training dynamics: whether training drives Transformers toward regular solutions or pathological behaviors. For forward stability, we derive explicit bounds on the growth of hidden states in trained Transformers. For backward stability, we analyze how layer normalization affects the backpropagation of gradients, thereby explaining the training dynamics of each layer normalization placement. Our analysis also guides the scaling of residual steps in Transformer blocks, where appropriate choices can further improve stability and performance. Our numerical results corroborate our theoretical findings. Beyond these results, our framework provides a principled way to sanity-check the stability of Transformers under new architectural modifications, offering guidance for future designs.
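Abstract (reference translation): Although deep Transformers are widely used, training them can be unstable. Layer normalization, a standard component, improves training stability, but its placement is often ad hoc. This paper presents a principled study of the forward (hidden-state) and backward (gradient) stability of Transformers under different layer normalization placements. The theory gives insight into the training dynamics, namely whether training drives Transformers toward regular solutions or toward pathological behavior. For forward stability, the authors derive explicit bounds on the growth of hidden states in trained Transformers. For backward stability, they analyze how layer normalization affects the backpropagation of gradients, which explains the training dynamics of each placement. The analysis also guides the scaling of residual steps in Transformer blocks, where appropriate choices further improve stability and performance. Numerical results corroborate the theoretical findings. Beyond these results, the framework offers a principled way to sanity-check the stability of Transformers under new architectural modifications and provides guidance for future designs.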
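To make "layer normalization placement" and "scaling of residual steps" concrete, the following is a minimal pure-Python sketch, not the paper's method or experiments. It assumes the two commonly discussed placements (pre-LN, where the sublayer input is normalized, and post-LN, where normalization follows the residual addition), a residual step size `h`, and a hypothetical stand-in sublayer (`tanh` of a fixed random linear map) in place of attention or an MLP. All function names are illustrative.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def make_sublayer(dim, rng):
    """Hypothetical stand-in for attention/MLP: tanh of a fixed random linear map."""
    w = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
         for _ in range(dim)]

    def f(x):
        return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]

    return f


def pre_ln_block(x, f, h):
    # Pre-LN: normalize the sublayer input; the residual stream is left unnormalized.
    fx = f(layer_norm(x))
    return [xi + h * fi for xi, fi in zip(x, fx)]


def post_ln_block(x, f, h):
    # Post-LN: normalize after the residual addition.
    fx = f(x)
    return layer_norm([xi + h * fi for xi, fi in zip(x, fx)])


def norm(x):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(v * v for v in x))


def run(block, depth=12, dim=16, h=1.0, seed=0):
    """Apply `depth` blocks with residual step `h`; return the norm after each layer."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norms = [norm(x)]
    for _ in range(depth):
        x = block(x, make_sublayer(dim, rng), h)
        norms.append(norm(x))
    return norms


if __name__ == "__main__":
    for name, block in [("pre-LN", pre_ln_block), ("post-LN", post_ln_block)]:
        for h in (1.0, 0.25):
            norms = run(block, h=h)
            print(f"{name:8s} h={h:<5} first={norms[0]:.3f} last={norms[-1]:.3f}")
```

In this toy setting the sublayer output is bounded (each component of `tanh` lies in [-1, 1]), so the pre-LN hidden-state norm can grow by at most `h * sqrt(dim)` per layer, while post-LN resets the norm to roughly `sqrt(dim)` after every block. This is only an elementary illustration of how placement and step size interact; the bounds derived in the paper concern trained Transformers and are not reproduced here.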

Related papers