Fugu-MT Paper Translation (Overview): SNLP: Layer-Parallel Inference via Structured Newton Corrections

Paper overview: SNLP: Layer-Parallel Inference via Structured Newton Corrections

arxiv url: http://arxiv.org/abs/2605.17842v2
Date: Wed, 27 May 2026 15:46:55 GMT
Status: Translation complete
Last updated (in system): 2026-05-28 17:38:54.673295
Title: SNLP: Layer-Parallel Inference via Structured Newton Corrections
Authors: Ligong Han, Kai Xu, Hao Wang, Akash Srivastava,
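Abstract summary: We study whether the layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation. Structured Newton Layer Parallelism (SNLP) is a training and inference framework that replaces exact layer Jacobians with cheaper, architecture-induced surrogate dynamics.
Reference score (site-computed attention score): 22.126763421836966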
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Autoregressive language models execute Transformer layers sequentially, creating a latency bottleneck that is not removed by conventional tensor or pipeline parallelism. We study whether this layerwise dependency can be relaxed by treating the hidden-state trace across layers as the solution of a nonlinear residual equation and solving it with parallel Newton-style updates. While this view is principled, exact Newton corrections require expensive Jacobian-vector products and naive fixed-point iterations are unstable on trained Transformers. We introduce Structured Newton Layer Parallelism (SNLP), a training and inference framework that replaces exact layer Jacobians with cheap architecture-induced surrogate dynamics. In residual Transformers, this yields Identity Newton (IDN), where the correction reduces to a prefix-sum-like update; in mHC-style architectures, HC Newton (HCN) uses the model's residual mixing matrix. We also study SNLP-aware training, including pretraining regularization and direct SNLP-forward SFT. Experiments on Nanochat-scale Transformers show that SNLP exposes a practical speed-quality frontier: on 0.5B models, it reaches up to 2.58x wall-clock speedup, and a less aggressive configuration reaches 1.40x speedup without increasing PPL. The useful tradeoff comes from the biased finite-iteration computation induced by IDN/HCN rather than exact recovery of the sequential trace. We further show that SNLP-forward SFT can preserve downstream task accuracy, and that SNLP can serve as a drafter for self-speculative decoding while a sequential verifier preserves output correctness.
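The residual-equation view in the abstract can be made concrete with a short derivation. This is a sketch under assumptions made here for illustration, not the paper's stated formulation: each layer is taken to compute h_l = A_l h_{l-1} + f_l(h_{l-1}) with h_0 = x, where f_l is the attention/MLP branch and A_l is the skip path (A_l = I in a plain residual Transformer; a residual mixing matrix in an mHC-style model, simplified here to a single stream), and the "cheap surrogate" is taken to mean dropping the branch Jacobian J_l from the Newton step.

```latex
% Assumed layer form: h_\ell = A_\ell h_{\ell-1} + f_\ell(h_{\ell-1}),\ h_0 = x,\ \mathbf{h} = (h_1,\dots,h_L)
\begin{aligned}
R_\ell(\mathbf{h}) &= h_\ell - A_\ell h_{\ell-1} - f_\ell(h_{\ell-1}) = 0, \qquad \ell = 1,\dots,L \\
\text{exact Newton:}\quad \delta_\ell - (A_\ell + J_\ell)\,\delta_{\ell-1} &= -R_\ell(\mathbf{h}), \qquad \delta_0 = 0,\quad J_\ell = \partial f_\ell / \partial h \,\big|_{h_{\ell-1}} \\
\text{surrogate } (J_\ell \approx 0):\quad \delta_\ell &= A_\ell\,\delta_{\ell-1} - R_\ell(\mathbf{h}) \\
\Rightarrow\quad h^{+}_\ell = h_\ell + \delta_\ell &= A_\ell\,h^{+}_{\ell-1} + f_\ell(h_{\ell-1}), \qquad h^{+}_0 = x \\
A_\ell = I:\quad h^{+}_\ell &= x + \sum_{k=1}^{\ell} f_k(h_{k-1})
\end{aligned}
```

Every branch output f_l(h_{l-1}) is evaluated at the current iterate, so all layers can run in parallel; only the linear recurrence couples them, and for A_l = I it collapses to the prefix-sum-like update the abstract describes for IDN. Because the system is block lower-triangular, t iterations make the first t layer states agree with the sequential pass, so L iterations recover it exactly in exact arithmetic, while stopping earlier yields a biased, finite-iteration result.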
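A toy NumPy sketch of the two update rules named in the abstract, assuming each layer computes h_l = A_l h_{l-1} + f_l(h_{l-1}) with a random tanh branch f_l, and A_l = I (IDN-style) or a near-identity matrix (a loose, single-stream stand-in for an HCN-style mixer; real hyper-connection models carry several residual streams). All helper names (`make_branches`, `make_mixers`, `sequential_trace`, `structured_newton`) are hypothetical, and the code illustrates the idea rather than reproducing the paper's implementation.

```python
import numpy as np


def make_branches(num_layers, dim, rng, scale=0.1):
    """Hypothetical toy residual branches f_l(h) = W2 tanh(W1 h)."""
    return [(rng.normal(scale=scale, size=(dim, dim)),
             rng.normal(scale=scale, size=(dim, dim)))
            for _ in range(num_layers)]


def make_mixers(num_layers, dim, rng, scale=0.05):
    """Hypothetical near-identity skip matrices A_l (assumed stand-in for a residual mixer)."""
    return [np.eye(dim) + rng.normal(scale=scale, size=(dim, dim))
            for _ in range(num_layers)]


def branch(weights, h):
    w1, w2 = weights
    return w2 @ np.tanh(w1 @ h)


def sequential_trace(branches, x, mixers=None):
    """Reference pass: h_l = A_l h_{l-1} + f_l(h_{l-1}), with A_l = I when mixers is None."""
    trace = [x]
    for l, w in enumerate(branches):
        h = trace[-1]
        skip = h if mixers is None else mixers[l] @ h
        trace.append(skip + branch(w, h))
    return np.stack(trace)  # shape (L + 1, dim); row 0 is the input x


def structured_newton(branches, x, num_iters, mixers=None):
    """Layer-parallel iteration with the branch Jacobians dropped from the Newton step."""
    num_layers = len(branches)
    trace = np.tile(x, (num_layers + 1, 1))  # initial guess: every layer state equals x
    for _ in range(num_iters):
        # Each branch reads only the previous iterate, so this is the part
        # that could run on all layers at once.
        outs = np.stack([branch(w, trace[l]) for l, w in enumerate(branches)])
        if mixers is None:
            # IDN-style update: x plus a prefix sum of the branch outputs.
            trace[1:] = x + np.cumsum(outs, axis=0)
        else:
            # HCN-style update: linear recurrence h+_l = A_l h+_{l-1} + f_l(h_{l-1}).
            # Written as a loop; it makes no branch calls and could use an associative scan.
            for l, a in enumerate(mixers):
                trace[l + 1] = a @ trace[l] + outs[l]
    return trace


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=16)
    branches = make_branches(8, 16, rng)
    mixers = make_mixers(8, 16, rng)
    for mix in (None, mixers):
        ref = sequential_trace(branches, x, mix)
        for k in (1, 2, 4, 8):
            err = np.abs(structured_newton(branches, x, k, mix) - ref).max()
            name = "IDN" if mix is None else "HCN"
            print(f"{name}  iters={k}  max |error| = {err:.2e}")
```

With 8 layers the error reaches rounding level at `iters=8`, because the toy system is lower-triangular; fewer iterations give the cheaper, biased approximation. Random untrained layers say nothing about the speed-quality numbers reported for trained models.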

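The abstract's last claim, that SNLP can act as a drafter for self-speculative decoding while a sequential verifier preserves output correctness, follows the usual greedy draft-then-verify pattern. A minimal sketch, assuming greedy decoding and two hypothetical callables: `draft_next` stands in for a cheap SNLP forward with few iterations, and `verify_next` for the exact sequential forward. The toy next-token rules below are invented for illustration.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(next_token: NextToken, prompt: List[int], num_new: int) -> List[int]:
    """Plain greedy decoding with a single model."""
    tokens = list(prompt)
    for _ in range(num_new):
        tokens.append(next_token(tokens))
    return tokens


def speculative_decode(draft_next: NextToken, verify_next: NextToken,
                       prompt: List[int], num_new: int, draft_len: int = 4) -> List[int]:
    """Greedy draft-then-verify loop; output equals greedy_decode(verify_next, ...)."""
    tokens = list(prompt)
    target = len(prompt) + num_new
    while len(tokens) < target:
        k = min(draft_len, target - len(tokens))
        # 1. Draft k tokens with the cheap approximate forward.
        draft: List[int] = []
        for _ in range(k):
            draft.append(draft_next(tokens + draft))
        # 2. Verify left to right: keep the agreeing prefix and replace the
        #    first disagreement with the verifier's own token.
        accepted: List[int] = []
        for tok in draft:
            expected = verify_next(tokens + accepted)
            accepted.append(expected)
            if tok != expected:
                break
        else:
            # Every draft was accepted: the verification pass also yields a bonus token.
            if len(tokens) + len(accepted) < target:
                accepted.append(verify_next(tokens + accepted))
        tokens.extend(accepted)
    return tokens


VOCAB = 50


def exact_next(tokens: List[int]) -> int:
    """Hypothetical stand-in for the sequential (verifier) forward."""
    return (7 * sum(tokens) + len(tokens)) % VOCAB


def approx_next(tokens: List[int]) -> int:
    """Hypothetical drafter that disagrees whenever the prefix length is a multiple of 5."""
    t = exact_next(tokens)
    return t if len(tokens) % 5 else (t + 1) % VOCAB


if __name__ == "__main__":
    prompt = [1, 2, 3]
    spec = speculative_decode(approx_next, exact_next, prompt, num_new=20)
    print(spec == greedy_decode(exact_next, prompt, num_new=20))
```

Every appended token is the verifier's greedy choice for its prefix, so the output matches plain greedy decoding with the verifier; the drafter only changes how many tokens each verification round accepts. A real system would score all drafted positions in one verifier forward pass instead of calling `verify_next` once per token as this sketch does.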
