Fugu-MT Paper Translation (Summary): HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

Paper summary: HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention

arxiv url: http://arxiv.org/abs/2603.28458v1
Date: Mon, 30 Mar 2026 13:59:51 GMT
Status: Translation complete
Last updated in system: 2026-03-31 23:18:45.430649
Title: HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
Title (reference translation): HISA: Efficient Hierarchical Indexing for Fine-Grained Sparse Attention
Authors: Yufei Xu, Fanxu Meng, Fan Jiang, Yuxuan Wang, Ruijie Zhou, Jiexi Wu, Zhixin Pan, Zhaohui Wang, Xiaojuan Tang, Wenjie Pei, Tongxuan Liu, Di Yin, Xing Sun, Muhan Zhang
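Abstract summary: HISA (Hierarchical Indexed Sparse Attention) is a drop-in replacement for the indexer. HISA transforms the search process from a flat token scan into a two-stage hierarchical procedure. On kernel-level benchmarks, HISA achieves a 2$\times$ speedup at 32K context length and 4$\times$ at 128K.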
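As a rough illustration of why a two-stage search helps, the sketch below counts how many indexer score evaluations one layer performs over a causal sequence, for a flat scan versus a coarse-then-fine search. It assumes fixed-size blocks of `block_size` tokens, one pooled score per block, and a fixed number `keep_blocks` of surviving blocks per query; these parameter names and the fixed-budget policy are hypothetical, not taken from the paper. The counts are operation tallies, not runtimes, and are not meant to reproduce the reported kernel speedups.

```python
from typing import Optional


def indexer_score_count(seq_len: int,
                        block_size: Optional[int] = None,
                        keep_blocks: Optional[int] = None) -> int:
    """Count indexer score evaluations summed over all queries of one layer.

    Flat scan (block_size is None): query t scores all t prefix tokens.
    Two-stage sketch (hypothetical parameters): query t scores one pooled
    representative per block, then every token inside the kept blocks
    (an upper bound when the last, partial block is among those kept).
    """
    if block_size is not None and keep_blocks is None:
        raise ValueError("keep_blocks is required when block_size is given")
    total = 0
    for t in range(1, seq_len + 1):  # query t sees a causal prefix of t tokens
        if block_size is None:
            total += t  # flat scan: one score per prefix token
        else:
            n_blocks = -(-t // block_size)           # ceil(t / block_size)
            kept = min(keep_blocks, n_blocks)        # blocks surviving the coarse filter
            total += n_blocks + min(t, kept * block_size)
    return total


flat = indexer_score_count(32768)
two_stage = indexer_score_count(32768, block_size=64, keep_blocks=32)
print(flat, two_stage, flat / two_stage)
```

Under these assumptions the per-query cost is about $L/B$ block scores plus at most `keep_blocks * block_size` token scores, so the total is still quadratic in $L$ but with the quadratic term divided by roughly the block size $B$, while the token-level stage stays bounded per query.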
Reference score (independently computed notability score): 62.79085204939384
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Token-level sparse attention mechanisms, exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by scoring every historical token for each query using a lightweight indexer, and then computing attention only over the selected subset. While the downstream sparse attention scales efficiently, the indexer still scans the entire prefix for every query, introducing an O($L^2$) per-layer bottleneck that becomes prohibitive as context length grows. We propose HISA (Hierarchical Indexed Sparse Attention), a drop-in replacement for the indexer that transforms the search process from a flat token scan into a two-stage hierarchical procedure. First, a block-level coarse filter scores pooled block representatives to prune irrelevant regions. Then, a token-level refinement applies the original indexer only within the remaining candidate blocks. HISA preserves the exact token-level top-k sparsity pattern required by the downstream Sparse MLA operator and requires no additional training. On kernel-level benchmarks, HISA achieves a 2$\times$ speedup at 32K context length and 4$\times$ at 128K. On Needle-in-a-Haystack and LongBench, we directly replace the indexer in DeepSeek-V3.2 with HISA, without any fine-tuning. HISA closely matches the original DSA in quality while significantly outperforming block-sparse baselines. Moreover, the token selection sets produced by HISA and the original DSA exhibit a mean IoU greater than 99%, indicating that the efficiency gains come with virtually no impact on selection fidelity.
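The two-stage search described in the abstract can be sketched in NumPy for a single query. This is a minimal illustration under stated assumptions, not the authors' kernel: a plain dot product stands in for the DSA indexer's score function, block representatives are mean-pooled keys (the abstract only says "pooled"), and `block_size` and `keep_blocks` are hypothetical parameter names. Both functions return exactly `k` token indices, mirroring the fixed token-level top-k budget the downstream Sparse MLA operator expects.

```python
import numpy as np


def flat_topk(q: np.ndarray, keys: np.ndarray, k: int) -> np.ndarray:
    """Flat indexer: score every prefix token, keep the k highest (sorted indices)."""
    scores = keys @ q                                  # stand-in indexer score per token
    top = np.argpartition(-scores, k - 1)[:k]          # k best, in arbitrary order
    return np.sort(top)


def hierarchical_topk(q: np.ndarray, keys: np.ndarray, k: int,
                      block_size: int, keep_blocks: int) -> np.ndarray:
    """Two-stage sketch: coarse block filter, then token-level top-k in kept blocks."""
    n = keys.shape[0]
    n_blocks = -(-n // block_size)                     # ceil(n / block_size)
    # Stage 1: mean-pool each block into one representative (assumed pooling) and score it.
    reps = np.stack([keys[b * block_size:(b + 1) * block_size].mean(axis=0)
                     for b in range(n_blocks)])
    m = min(keep_blocks, n_blocks)
    kept = np.argpartition(-(reps @ q), m - 1)[:m]     # indices of surviving blocks
    # Stage 2: rescore only the tokens inside the surviving blocks.
    cand = np.concatenate([np.arange(b * block_size, min((b + 1) * block_size, n))
                           for b in kept])
    if cand.size < k:
        raise ValueError("kept blocks hold fewer than k tokens; raise keep_blocks")
    top = cand[np.argpartition(-(keys[cand] @ q), k - 1)[:k]]
    return np.sort(top)


rng = np.random.default_rng(0)
keys = rng.standard_normal((4096, 64))                # 4096 prefix tokens, head dim 64
q = rng.standard_normal(64)
exact = flat_topk(q, keys, k=256)
approx = hierarchical_topk(q, keys, k=256, block_size=64, keep_blocks=16)
```

If `keep_blocks` covers every block, stage 2 rescores all tokens and the result equals `flat_topk`. With random keys the pooled representatives carry little signal, so the overlap between `exact` and `approx` here says nothing about the selection fidelity reported in the paper, which was measured on DeepSeek-V3.2.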
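Abstract (reference translation): Token-level sparse attention mechanisms, as exemplified by DeepSeek Sparse Attention (DSA), achieve fine-grained key selection by using a lightweight indexer to score every historical token for each query and then computing attention only over the selected subset. While the downstream sparse attention scales efficiently, the indexer scans the entire prefix for every query, introducing a per-layer O($L^2$) bottleneck that becomes prohibitive as context length grows. HISA (Hierarchical Indexed Sparse Attention) is a replacement for the indexer that converts the search from a flat token scan into a two-stage hierarchical procedure. First, a block-level coarse filter scores pooled block representatives to prune irrelevant regions. Then, a token-level refinement applies the original indexer only within the remaining candidate blocks. HISA preserves the exact token-level top-k sparsity pattern required by the downstream Sparse MLA operator and requires no additional training. On kernel-level benchmarks, HISA achieves a 2$\times$ speedup at 32K context length and 4$\times$ at 128K. On Needle-in-a-Haystack and LongBench, the indexer in DeepSeek-V3.2 is replaced directly with HISA. HISA closely matches the quality of the original DSA and significantly outperforms block-sparse baselines. Furthermore, the token selection sets produced by HISA and DSA show a mean IoU above 99%, indicating that the efficiency gains have almost no effect on selection fidelity.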
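The fidelity metric quoted in the abstract, IoU between token selection sets, is the size of the intersection divided by the size of the union, averaged over queries. A minimal sketch follows; the function names are hypothetical, and each argument is one query's collection of selected token indices.

```python
from typing import Iterable, Sequence


def selection_iou(a: Iterable[int], b: Iterable[int]) -> float:
    """Intersection over union of two token index sets (1.0 if both are empty)."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def mean_selection_iou(sets_a: Sequence[Iterable[int]],
                       sets_b: Sequence[Iterable[int]]) -> float:
    """Average per-query IoU between two selectors' outputs."""
    if len(sets_a) != len(sets_b) or len(sets_a) == 0:
        raise ValueError("need the same, non-zero number of queries on both sides")
    return sum(selection_iou(a, b) for a, b in zip(sets_a, sets_b)) / len(sets_a)


print(selection_iou([1, 2, 3, 4], [2, 3, 4, 5]))  # 3 shared / 5 total = 0.6
```

For two selections of the same size $k$ with overlap $o$, the IoU is $o / (2k - o)$, so a per-query IoU of 0.99 means the two selectors share about 99.5% of their $k$ tokens.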
