Fugu-MT 論文翻訳(概要): k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

論文の概要: k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

arxiv url: http://arxiv.org/abs/2604.03815v2
Date: Tue, 07 Apr 2026 19:22:13 GMT
ステータス: 翻訳完了
システム内更新日: 2026-04-09 14:06:05.002874
Title: k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS
Title（参考訳）: グラフ変換器におけるk-Maximum内積アテンションとGraphGPSの表現力
Authors: Jonas De Schouwer, Haitz Sáez de Ocáriz Borde, Xiaowen Dong,
Abstract要約: グラフ変換器のk-MIPアテンションを導入し、トップk操作によりクエリ毎に最も関連性の高いキーノードを選択する。これにより、線形メモリの複雑さと、すべての注意に比較して最大1桁の実用的なスピードアップが達成される。我々はk-MIP変換器が任意の精度で全アテンション変換器を近似できることを証明した。
参考スコア（独自算出の注目度）: 12.688538382869659
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Graph transformers have shown promise in overcoming limitations of traditional graph neural networks, such as oversquashing and difficulties in modeling long-range dependencies. However, their application to large-scale graphs is hindered by the quadratic memory and computational complexity of the all-to-all attention mechanism. Although alternatives such as linearized attention and restricted attention patterns have been proposed, these often degrade performance or limit expressive power. To better balance efficiency and effectiveness, we introduce k-Maximum Inner Product (k-MIP) attention for graph transformers. k-MIP attention selects the most relevant key nodes per query via a top-k operation, yielding a sparse yet flexible attention pattern. Combined with an attention score computation based on symbolic matrices, this results in linear memory complexity and practical speedups of up to an order of magnitude compared to all-to-all attention, enabling the processing of graphs with over 500k nodes on a single A100 GPU. We provide a theoretical analysis of expressive power, showing that k-MIP attention does not compromise the expressiveness of graph transformers: specifically, we prove that k-MIP transformers can approximate any full-attention transformer to arbitrary precision. In addition, we analyze the expressive power of the GraphGPS framework, in which we integrate our attention mechanism, and establish an upper bound on its graph distinguishing capability in terms of the S-SEG-WL test. Finally, we validate our approach on the Long Range Graph Benchmark, the City-Networks benchmark, and two custom large-scale inductive point cloud datasets, consistently ranking among the top-performing scalable graph transformers.
Abstract（参考訳）: グラフトランスフォーマーは、オーバーカッシングや長距離依存関係のモデリングの難しさなど、従来のグラフニューラルネットワークの制限を克服する可能性を示している。しかし、大規模グラフへのそれらの適用は、オール・ツー・オールの注意機構の二次記憶と計算の複雑さによって妨げられている。線形化アテンションや制限されたアテンションパターンなどの代替案が提案されているが、それらはしばしば性能を低下させるか、表現力を制限する。グラフ変換器におけるk-maximum Inner Product (k-MIP) の高効率化と効率向上を図る。 k-MIPアテンションは、トップk操作を介してクエリ毎に最も関連性の高いキーノードを選択し、スパースでフレキシブルなアテンションパターンを生成する。シンボル行列に基づくアテンションスコア計算と組み合わせることで、線形メモリの複雑さと、A100 GPU上で500k以上のノードを持つグラフの処理を可能にする全アテンションに比べて、最大1桁の実用的なスピードアップを実現している。我々は,k-MIP の注意がグラフ変換器の表現性を損なうものではないこと,具体的には,k-MIP 変換器が任意の完全アテンション変換器を任意の精度で近似できることを示す。さらに、注意機構を統合するグラフGPSフレームワークの表現力を分析し、S-SEG-WLテストによるグラフ識別能力の上限を確立する。最後に、Long Range Graph Benchmark、City-Networksベンチマーク、および2つのカスタムな大規模インダクティブポイントクラウドデータセットに対するアプローチを検証する。

論文の概要: k-Maximum Inner Product Attention for Graph Transformers and the Expressive Power of GraphGPS

関連論文リスト