Fugu-MT Paper Translation (Abstract): LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport

Paper abstract: LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport

arxiv url: http://arxiv.org/abs/2509.23436v1
Date: Sat, 27 Sep 2025 18:11:09 GMT
Status: Translation complete
Last updated in system: 2025-09-30 22:32:19.225769
Title: LOTFormer: Doubly-Stochastic Linear Attention via Low-Rank Optimal Transport
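Authors: Ashkan Shahbazi, Chayne Thrash, Yikun Bai, Keaton Hamm, Navid NaderiAlizadeh, Soheil Kolouri
Abstract summary: We propose a principled attention mechanism that is simultaneously linear-time and doubly-stochastic. LOTFormer achieves state-of-the-art results on the Long Range Arena benchmark.
Reference score (attention metric computed independently by this site): 21.50165411149415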
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Transformers have proven highly effective across a wide range of modalities. However, the quadratic complexity of the standard softmax attention mechanism poses a fundamental barrier to scaling them to long context windows. A large body of work addresses this with linear attention, which reformulates attention as a kernel function and approximates it with finite feature maps to achieve linear-time computation. Orthogonal to computational scaling, most attention mechanisms -- both quadratic and linear -- produce row-normalized maps that can over-focus on a few tokens, degrading robustness and information flow. Enforcing doubly-stochastic attention alleviates this by balancing token participation across rows and columns, but existing doubly-stochastic attention mechanisms typically introduce substantial overhead, undermining scalability. We propose LOTFormer, a principled attention mechanism that is simultaneously linear-time and doubly-stochastic. Our approach exploits the connection between attention maps and transportation plans between query and key measures. The central idea is to constrain the transport plan to be low-rank by conditioning it on a learnable pivot measure with small support. Concretely, we solve two entropic optimal transport problems (queries $\to$ pivot and pivot $\to$ keys) and compose them into a conditional (glued) coupling. This yields an attention matrix that is provably doubly-stochastic, has rank at most $r \ll n$, and applies to values in $O(nr)$ time without forming the full $n \times n$ map. The pivot locations and masses are learned end-to-end. Empirically, LOTFormer achieves state-of-the-art results on the Long Range Arena benchmark, surpassing prior linear and transport-based attention methods in both accuracy and efficiency.
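
The construction described in the abstract (two entropic optimal transport problems glued through a small pivot measure) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the negative scaled dot-product cost, the uniform query and key measures, the plain Sinkhorn loop, and all function names (`sinkhorn`, `glued_plans`, `lot_attention`) are hypothetical choices made here. The pivot locations `Z` and masses `sigma` are fixed random arrays in this sketch, whereas the abstract says they are learned end-to-end.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=1.0, n_iters=500):
    """Entropic OT plan with row marginals a and (after convergence) column marginals b."""
    kernel = np.exp(-(cost - cost.min()) / eps)  # constant shift keeps entries in (0, 1]
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)   # match column marginals
        u = a / (kernel @ v)     # match row marginals
    return u[:, None] * kernel * v[None, :]

def glued_plans(Q, K, Z, sigma, eps=1.0):
    """Solve queries -> pivot and pivot -> keys (assumes equal numbers of queries and keys)."""
    n, d = Q.shape
    uniform = np.full(n, 1.0 / n)  # assumed uniform query/key measures
    # Assumed cost: negative scaled dot product, so similar vectors are cheap to match.
    gamma_qz = sinkhorn(-(Q @ Z.T) / np.sqrt(d), uniform, sigma, eps)  # (n, r)
    gamma_zk = sinkhorn(-(Z @ K.T) / np.sqrt(d), sigma, uniform, eps)  # (r, n)
    return gamma_qz, gamma_zk

def lot_attention(Q, K, V, Z, sigma, eps=1.0):
    """Apply the glued coupling to V without forming the n x n map."""
    n = Q.shape[0]
    gamma_qz, gamma_zk = glued_plans(Q, K, Z, sigma, eps)
    pivot_summary = (gamma_zk @ V) / sigma[:, None]  # (r, d_v): values pooled per pivot
    return n * (gamma_qz @ pivot_summary)            # (n, d_v); factor n rescales mass 1/n to 1

rng = np.random.default_rng(0)
n, r, d = 64, 4, 8
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))
Z = rng.standard_normal((r, d))                   # pivot locations (fixed here)
logits = rng.standard_normal(r)
sigma = np.exp(logits) / np.exp(logits).sum()     # pivot masses on the simplex
out = lot_attention(Q, K, V, Z, sigma)
print(out.shape)
```

In this sketch the dense map would be `n * gamma_qz @ (gamma_zk / sigma[:, None])`; its rows and columns each sum to one once Sinkhorn has converged, and its rank is at most `r` because it factors through the pivot. The fast path never builds that matrix, and the two products that apply it to `V` touch only `n x r` and `r x n` factors.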

Related papers