Fugu-MT Paper Translation (Abstract): Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

Paper summary: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep

arxiv url: http://arxiv.org/abs/2601.19895v2
Date: Fri, 30 Jan 2026 03:44:40 GMT
Status: Translation complete
Last updated in system: 2026-02-02 14:22:45.22131
Title: Post-LayerNorm Is Back: Stable, ExpressivE, and Deep
Title (reference translation): Post-LayerNorm Is Back: Stable, Expressive, and Deep
Authors: Chen Chen, Lai Wei,
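Abstract summary: Large language model (LLM) scaling is hitting a wall. Widening models yields diminishing returns, and extending context length does not improve fundamental expressivity. The paper revisits the Post-LayerNorm (Post-LN) formulation and shows that the central failure mode of Post-LN arises from the ResNet-style residual pathway. It presents Keel, a Post-LN Transformer that replaces this residual path with a Highway-style connection.
Reference score (independently computed attention metric): 6.007650558372649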
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Large language model (LLM) scaling is hitting a wall. Widening models yields diminishing returns, and extending context length does not improve fundamental expressivity. In contrast, depth scaling offers theoretically superior expressivity, yet current Transformer architectures struggle to train reliably at extreme depths. We revisit the Post-LayerNorm (Post-LN) formulation, whose instability at scale caused its replacement by Pre-LN in modern LLMs. We show that the central failure mode of Post-LN arises from the ResNet-style residual pathway, which introduces gradient vanishing in deep networks. We present Keel, a Post-LN Transformer that replaces this residual path with a Highway-style connection. This modification preserves the gradient flow through the residual branch, preventing signal vanishing from the top layers to the bottom. Unlike prior methods, Keel enables stable training at extreme depths without requiring specialized initialization or complex optimization tricks. Keel trains robustly at depths exceeding 1000 layers and consistently improves perplexity and depth-scaling characteristics over Pre-LN. These findings indicate that Post-LN, when paired with a Highway-style connection, provides a simple and effective foundation for building deeply scalable LLMs, opening the possibility for future infinite-depth architectures.
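Abstract (reference translation): Scaling of large language models (LLMs) is hitting a wall. Widening models yields diminishing returns, and extending the context length does not improve fundamental expressivity. By contrast, depth scaling offers theoretically superior expressivity, but current Transformer architectures struggle to train reliably at extreme depths. We revisit the Post-LayerNorm (Post-LN) formulation. We show that the central failure mode of Post-LN arises from the ResNet-style residual pathway. We present Keel, a Post-LN Transformer that replaces this residual path with a Highway-style connection. This modification preserves gradient flow through the residual branch and prevents the signal from vanishing between the top and bottom layers. Unlike prior methods, Keel enables stable training at extreme depths without special initialization or complex optimization tricks. Keel trains robustly at depths exceeding 1000 layers and consistently improves perplexity and depth-scaling characteristics over Pre-LN. These results suggest that Post-LN, combined with a Highway-style connection, provides a simple and effective foundation for building deeply scalable LLMs and opens up the possibility of future infinite-depth architectures.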
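
The abstract contrasts three ways of wiring a sublayer: Pre-LN, Post-LN with a ResNet-style residual, and Post-LN with a Highway-style connection. The following is a minimal sketch of those three wirings on a plain vector, not Keel's actual formulation, which the abstract does not specify. The toy `sublayer`, the scalar `gate_bias`, and all function names are assumptions made for illustration; the gating form `t * H(x) + (1 - t) * x` is the classic Highway network rule.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learned scale or shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def sublayer(x):
    """Hypothetical stand-in for an attention or feed-forward sublayer."""
    return [math.tanh(0.5 * v) for v in x]


def pre_ln_block(x):
    # Pre-LN: normalize the sublayer input, then add the untouched residual.
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def post_ln_block(x):
    # Post-LN: add the ResNet-style residual first, then normalize the sum.
    return layer_norm([a + b for a, b in zip(x, sublayer(x))])


def highway_post_ln_block(x, gate_bias=0.0):
    # Classic Highway gating: y = t * H(x) + (1 - t) * x.
    # Assumption: the gate t is a fixed scalar sigmoid(gate_bias) here;
    # a real model would learn it, and Keel's exact rule may differ.
    t = 1.0 / (1.0 + math.exp(-gate_bias))
    mixed = [t * h + (1.0 - t) * a for a, h in zip(x, sublayer(x))]
    return layer_norm(mixed)


if __name__ == "__main__":
    x = [1.0, -2.0, 0.5, 3.0]
    for block in (pre_ln_block, post_ln_block, highway_post_ln_block):
        h = x
        for _ in range(8):  # stack eight identical blocks
            h = block(h)
        print(block.__name__, [round(v, 3) for v in h])
```

In this sketch, a strongly negative `gate_bias` drives the gate toward zero, so the block reduces to normalizing its input and the sublayer is effectively bypassed. That carry behavior is the property Highway-style connections are known for; whether Keel parameterizes its connection this way is not stated in the abstract.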

Related papers