Fugu-MT Paper Translation (Abstract): Higher-order Linear Attention

Paper summary: Higher-order Linear Attention

arxiv url: http://arxiv.org/abs/2510.27258v1
Date: Fri, 31 Oct 2025 07:54:37 GMT
Status: translation complete
Last updated in system: 2025-11-03 17:52:16.02809
Title: Higher-order Linear Attention
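Title (reference translation): Higher-order Linear Attention
Authors: Yifan Zhang, Zhen Qin, Quanquan Gu
Abstract summary: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. This paper proposes Higher-order Linear Attention (HLA).
Reference score (site-computed attention metric): 59.92962330635185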
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives but are typically restricted to first-order or kernel-based approximations, which can limit expressivity. We introduce Higher-order Linear Attention (HLA), a causal, streaming mechanism that realizes higher interactions via compact prefix sufficient statistics. In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any $n \times n$ matrices. We give closed-form streaming identities, a strictly causal masked variant using two additional summaries, and a chunk-parallel training scheme based on associative scans that reproduces the activations of a serial recurrence exactly. We further outline extensions to third and higher orders. Collectively, these results position HLA as a principled, scalable building block that combines attention-like, data-dependent mixing with the efficiency of modern recurrent architectures. Project Page: https://github.com/yifanzhang-pro/HLA.
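Abstract (reference translation): The quadratic cost of scaled dot-product attention is a central obstacle to scaling autoregressive language models to long contexts. Linear-time attention and State Space Models (SSMs) provide scalable alternatives, but they are typically restricted to first-order or kernel-based approximations, which can limit expressivity. This paper proposes Higher-order Linear Attention (HLA). In the second-order case, HLA maintains a constant-size state and computes per-token outputs in linear time without materializing any $n \times n$ matrices. It provides a strictly causal masked variant using two additional summaries, and a chunk-parallel training scheme based on associative scans that exactly reproduces the activations of a serial recurrence. Extensions to third and higher orders are also outlined. Taken together, these results position HLA as a principled, scalable building block that combines attention-like, data-dependent mixing with the efficiency of modern recurrent architectures. Project Page: https://github.com/yifanzhang-pro/HLA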
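
The abstract states that the second-order case keeps a constant-size state and avoids any $n \times n$ matrix, but it does not give the formulas. As a rough illustration only, the sketch below assumes a simplified, unnormalized and unmasked second-order weight, w(t, i) = sum over j <= t of (q_t . k_j)(k_j . q_i), and shows that it can be computed from two running sums. The weight form, the function names, and the state names `S` and `C` are assumptions made for this example, not the paper's definitions.

```python
import random


def outer(a, b):
    """Outer product of two vectors as a list of rows."""
    return [[x * y for y in b] for x in a]


def add_inplace(M, N):
    """Accumulate matrix N into matrix M element-wise."""
    for row_m, row_n in zip(M, N):
        for j, val in enumerate(row_n):
            row_m[j] += val


def vec_mat(v, M):
    """Row vector v (length r) times matrix M (r x c)."""
    return [sum(v[i] * M[i][j] for i in range(len(v))) for j in range(len(M[0]))]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def streaming_second_order(Q, K, V):
    """One pass over the sequence; the state size does not depend on its length."""
    d, dv = len(Q[0]), len(V[0])
    S = [[0.0] * d for _ in range(d)]   # assumed state: running sum of k_i k_i^T
    C = [[0.0] * dv for _ in range(d)]  # assumed state: running sum of q_i v_i^T
    outputs = []
    for q, k, v in zip(Q, K, V):
        add_inplace(S, outer(k, k))
        add_inplace(C, outer(q, v))
        outputs.append(vec_mat(vec_mat(q, S), C))  # o_t = q_t^T S_t C_t
    return outputs


def naive_second_order(Q, K, V):
    """Reference that forms every pairwise weight explicitly (slow)."""
    outputs = []
    for t in range(len(Q)):
        out = [0.0] * len(V[0])
        for i in range(t + 1):
            w = sum(dot(Q[t], K[j]) * dot(K[j], Q[i]) for j in range(t + 1))
            for c in range(len(out)):
                out[c] += w * V[i][c]
        outputs.append(out)
    return outputs


def make_random_inputs(n=6, d=3, dv=2, seed=0):
    """Small random queries, keys, and values for a quick comparison."""
    rng = random.Random(seed)
    def rand(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]
    return rand(n, d), rand(n, d), rand(n, dv)
```

Calling `streaming_second_order(*make_random_inputs())` and `naive_second_order(*make_random_inputs())` should agree up to floating-point rounding. The streaming version touches each token once and stores only a d x d and a d x d_v matrix, which is the kind of prefix summary the abstract describes. The strictly causal masked variant and the chunk-parallel scan mentioned in the abstract are not covered here.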


Related papers