Fugu-MT Paper Translation (Abstract): Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

Paper overview: Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences

arxiv url: http://arxiv.org/abs/2604.21100v1
Date: Wed, 22 Apr 2026 21:38:25 GMT
Status: Translation complete
Last updated in system: 2026-04-24 14:40:06.188435
Title: Preconditioned DeltaNet: Curvature-aware Sequence Modeling for Linear Recurrences
Authors: Neehal Tumma, Noel Loo, Daniela Rus
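Abstract summary: We introduce preconditioned variants of DeltaNet, GDN, and KDA together with efficient chunkwise parallel algorithms. The preconditioned delta-rule recurrences yield consistent performance improvements on synthetic recall benchmarks and language modeling at the 340M and 1B scale.
Reference score (independently computed attention score): 51.38664601405696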
License: http://creativecommons.org/licenses/by/4.0/
Abstract: To address the increasing long-context compute limitations of softmax attention, several subquadratic recurrent operators have been developed. This work includes models such as Mamba-2, DeltaNet, Gated DeltaNet (GDN), and Kimi Delta Attention (KDA). As the space of recurrences grows, a parallel line of work has arisen to taxonomize them. One compelling view is the test-time regression (TTR) framework, which interprets recurrences as performing online least squares updates that learn a linear map from the keys to values. Existing delta-rule recurrences can be seen as first-order approximations to this objective, but notably ignore the curvature of the least-squares loss during optimization. In this work, we address this by introducing preconditioning to these recurrences. Starting from the theory of online least squares, we derive equivalences between linear attention and the delta rule in the exactly preconditioned case. Next, we realize this theory in practice by proposing a diagonal approximation: this enables us to introduce preconditioned variants of DeltaNet, GDN, and KDA alongside efficient chunkwise parallel algorithms for computing them. Empirically, we find that our preconditioned delta-rule recurrences yield consistent performance improvements across synthetic recall benchmarks and language modeling at the 340M and 1B scale.
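The test-time regression view in the abstract can be made concrete with a small sketch. The code below is an illustration under stated assumptions, not the paper's algorithm: it treats the state `S` as a linear map from keys to values, takes one gradient step on the loss 0.5 * ||S k - v||^2 per token (the plain delta rule), and optionally rescales each key dimension by a diagonal preconditioner. The function names, the running sum of squared keys used as the diagonal curvature estimate, and the `eps` damping term are all hypothetical choices made for this example; gating, key normalization, and the chunkwise parallel form are omitted.

```python
def delta_rule_step(S, k, v, beta, precond=None):
    """One online least-squares step on 0.5 * ||S k - v||^2.

    S is a d_v x d_k list of lists, k has length d_k, v has length d_v.
    precond is an optional per-key-dimension scale (assumed diagonal
    preconditioner); None gives the plain first-order delta rule.
    """
    d_v, d_k = len(S), len(k)
    pred = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    err = [pred[i] - v[i] for i in range(d_v)]
    p = precond if precond is not None else [1.0] * d_k
    # Gradient w.r.t. S is err k^T; the preconditioner rescales its columns.
    return [[S[i][j] - beta * err[i] * k[j] * p[j] for j in range(d_k)]
            for i in range(d_v)]


def update_diag_precond(h, k, eps=1e-8):
    """Accumulate squared keys and return (new_h, inverse diagonal).

    Assumption for illustration: diag(sum of k k^T) stands in for the
    curvature of the least-squares loss along each key dimension.
    """
    h_new = [h[j] + k[j] * k[j] for j in range(len(k))]
    p = [1.0 / (h_new[j] + eps) for j in range(len(k))]
    return h_new, p


def run(keys, values, beta, use_precond, eps=1e-8):
    """Run the recurrence sequentially over a token stream, from S = 0."""
    d_v, d_k = len(values[0]), len(keys[0])
    S = [[0.0] * d_k for _ in range(d_v)]
    h = [0.0] * d_k
    for k, v in zip(keys, values):
        p = None
        if use_precond:
            h, p = update_diag_precond(h, k, eps)
        S = delta_rule_step(S, k, v, beta, p)
    return S


# Toy stream with badly scaled, axis-aligned keys; target map is W = [[1, 1]].
keys = [[10.0, 0.0], [0.0, 0.1]] * 5
values = [[k[0] + k[1]] for k in keys]

S_plain = run(keys, values, beta=0.01, use_precond=False)
S_pre = run(keys, values, beta=1.0, use_precond=True)
print("plain delta rule:", S_plain)
print("diagonal preconditioning:", S_pre)
```

In this toy stream the plain step size has to stay small to remain stable on the large-scale key, so the entry of `S` tied to the small-scale key barely moves, while the rescaled update fits both entries within a few tokens. Axis-aligned keys are the best case for a diagonal approximation; with correlated keys the diagonal only partly captures the curvature.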


Related papers