Fugu-MT Paper Translation (Summary): Screening Is Enough

Paper Summary: Screening Is Enough

arxiv url: http://arxiv.org/abs/2604.01178v2
Date: Mon, 06 Apr 2026 16:58:57 GMT
Status: Translation complete
Updated in system: 2026-04-07 15:49:18.460166
Title: Screening Is Enough
Authors: Ken M. Nakanishi
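Abstract summary: A core limitation of standard softmax attention is that it does not define a notion of absolute query–key relevance. Multiscreen is a language-model architecture built around a mechanism we call screening. Instead of attending to all keys, screening evaluates each key against an explicit threshold, discarding irrelevant keys and aggregating the remaining ones.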
Reference score (independently computed interest level): 0.5076419064097734
License: http://creativecommons.org/licenses/by/4.0/
Abstract: A core limitation of standard softmax attention is that it does not define a notion of absolute query–key relevance: attention weights are obtained by redistributing a fixed unit mass across all keys according to their relative scores. As a result, relevance is defined only relative to competing keys, and irrelevant keys cannot be explicitly rejected. We introduce Multiscreen, a language-model architecture built around a mechanism we call screening, which enables absolute query–key relevance. Instead of redistributing attention across all keys, screening evaluates each key against an explicit threshold, discarding irrelevant keys and aggregating the remaining keys, thereby removing global competition among keys. Across experiments, Multiscreen achieves comparable validation loss with approximately 40% fewer parameters than a Transformer baseline and enables stable optimization at substantially larger learning rates. It maintains strong performance in long-context perplexity and shows little to no degradation in retrieval performance well beyond the training context length. Notably, even at the training context length, a Multiscreen model with approximately 92% fewer parameters consistently outperforms a larger Transformer in retrieval accuracy. Finally, Multiscreen reduces inference latency by up to 3.2× at 100K context length.
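The abstract contrasts softmax attention, which redistributes a fixed unit mass across all keys, with screening, which tests each key against an explicit threshold. The following Python sketch illustrates that contrast on toy vectors. It is not the paper's implementation: the abstract does not give the screening formula, so the weight rule max(0, score - threshold), the helper names (`softmax_weights`, `screening_weights`, `aggregate`), and the threshold value 0.5 are assumptions chosen only for illustration.

```python
import math
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors (the query-key score)."""
    return sum(x * y for x, y in zip(a, b))


def softmax_weights(scores: Sequence[float]) -> List[float]:
    """Softmax attention: a fixed unit mass is redistributed over all keys,
    so each key's weight depends on every other key's score."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def screening_weights(scores: Sequence[float], threshold: float) -> List[float]:
    """Hypothetical screening rule (illustrative, not the paper's formula):
    each key is judged only against a fixed threshold. Keys at or below it
    are discarded (weight 0); the rest keep weight score - threshold.
    There is no normalization across keys, hence no competition."""
    return [s - threshold if s > threshold else 0.0 for s in scores]


def aggregate(weights: Sequence[float], values: Sequence[Sequence[float]]) -> List[float]:
    """Weighted sum of value vectors; all-zero weights give a zero vector."""
    out = [0.0] * len(values[0])
    for w, v in zip(weights, values):
        for i, x in enumerate(v):
            out[i] += w * x
    return out


if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[2.0, 0.0], [0.1, 1.0], [-1.0, 0.5]]
    values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    scores = [dot(query, k) for k in keys]  # [2.0, 0.1, -1.0]

    # Softmax: weights always sum to 1, and shifting every score by the
    # same constant leaves them unchanged, so relevance is only relative.
    soft = softmax_weights(scores)
    shifted = softmax_weights([s - 10.0 for s in scores])
    print(sum(soft), all(math.isclose(a, b) for a, b in zip(soft, shifted)))

    # Screening: only the key whose score exceeds 0.5 contributes.
    screen = screening_weights(scores, threshold=0.5)
    print(screen, aggregate(screen, values))  # -> [1.5, 0.0, 0.0] [1.5, 0.0]

    # If every score falls below the threshold, nothing is aggregated.
    print(aggregate(screening_weights([s - 10.0 for s in scores], 0.5), values))  # -> [0.0, 0.0]
```

The design choice the sketch highlights is the missing normalization step: under softmax an irrelevant key always receives some share of the unit mass, whereas under a threshold rule a key's contribution depends only on its own score, so a query with no relevant keys can aggregate nothing at all.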

Related Papers