Fugu-MT Paper Translation (Summary): Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few

Paper overview: Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few

arxiv url: http://arxiv.org/abs/2509.16875v3
Date: Wed, 05 Nov 2025 10:49:52 GMT
Status: Translation complete
Last updated in system: 2025-11-06 18:19:32.13419
Title: Towards Interpretable and Efficient Attention: Compressing All by Contracting a Few
Authors: Qishuai Wen, Zhiyuan Huang, Chun-Guang Li
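Abstract summary: This paper proposes a unified optimization objective that derives inherently interpretable and efficient attention mechanisms through algorithm unrolling. The work sheds light on the integration of interpretability and efficiency, as well as on a unified formula for attention mechanisms.
Reference score (independently computed level of interest): 9.017839019220817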
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Attention mechanisms have achieved significant empirical success in multiple fields, but their underlying optimization objectives remain unclear. Moreover, the quadratic complexity of self-attention has become increasingly prohibitive. Although interpretability and efficiency are two mutually reinforcing pursuits, prior work typically investigates them separately. In this paper, we propose a unified optimization objective that derives inherently interpretable and efficient attention mechanisms through algorithm unrolling. Specifically, we construct a gradient step of the proposed objective with a set of forward-pass operations of our Contract-and-Broadcast Self-Attention (CBSA), which compresses input tokens towards low-dimensional structures by contracting a few representatives of them. This novel mechanism not only scales linearly when the number of representatives is fixed, but also covers the instantiations of varied attention mechanisms when different sets of representatives are used. We conduct extensive experiments to demonstrate comparable performance to, and advantages over, black-box attention mechanisms on visual tasks. Our work sheds light on the integration of interpretability and efficiency, as well as on a unified formula for attention mechanisms.

Related papers

Transformers Learn Faster with Semantic Focus [57.97235825738412]
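We study sparse transformers from the perspectives of learnability and generalization. Input-dependent sparse attention models appear to converge faster and generalize better than standard attention models.
Paper reference translation (metadata) (2025-06-17T01:19:28Z)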
ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans [13.695885742446027]
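Self-attention can concentrate excessively on a few tokens during training, resulting in suboptimal information flow. We propose a novel, parallelizable doubly-stochastic attention mechanism based on sliced optimal transport. The method enforces double stochasticity without iterative Sinkhorn normalization, significantly improving efficiency.
Paper reference translation (metadata) (2025-02-11T21:20:48Z)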
FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
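We present a linearly scaling attention mechanism that maintains the full representation of the attention matrix without sparsification. The results suggest that the mechanism performs robustly and holds significant promise for the diverse applications in which self-attention is used.
Paper reference translation (metadata) (2024-02-12T18:59:39Z)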
Towards Robust Semantic Segmentation against Patch-based Attack via Attention Refinement [68.31147013783387]
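We observe that the attention mechanism is vulnerable to patch-based adversarial attacks. We propose a Robust Attention Mechanism (RAM) to improve the robustness of semantic segmentation models.
Paper reference translation (metadata) (2024-01-03T13:58:35Z)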
Why Does Little Robustness Help? A Further Step Towards Understanding Adversarial Transferability [23.369773251447636]
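Adversarial examples (AEs) for DNNs have been shown to be transferable. This paper takes a further step towards understanding adversarial transferability.
Paper reference translation (metadata) (2023-07-15T19:20:49Z)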
Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning [114.36124979578896]
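We design dynamic mechanisms using offline reinforcement learning algorithms. Our algorithm is based on the pessimism principle and requires only mild assumptions on the coverage of the offline dataset.
Paper reference translation (metadata) (2022-05-05T05:44:26Z)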
Attention that does not Explain Away [54.42960937271612]
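Models based on the Transformer architecture achieve better accuracy than models based on competing architectures on a large set of tasks. A unique feature of the Transformer is its universal application of a self-attention mechanism, which allows information to flow freely across arbitrary distances. We propose a doubly-normalized attention scheme that is simple to implement and provides theoretical guarantees for avoiding the "explaining away" effect.
Paper reference translation (metadata) (2020-09-29T21:05:39Z)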
Self-Attention Attribution: Interpreting Information Interactions Inside Transformer [89.21584915290319]
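We propose a self-attention attribution method to interpret the information interactions inside the Transformer. We show that the attribution results can be used as adversarial patterns to implement non-targeted attacks on BERT.
Paper reference translation (metadata) (2020-04-23T14:58:22Z)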
Is Attention All What You Need? -- An Empirical Investigation on Convolution-Based Active Memory and Self-Attention [7.967230034960396]
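We evaluate whether various active-memory mechanisms could replace self-attention in a Transformer. Experiments suggest that active memory alone achieves results comparable to self-attention for language modeling. On certain algorithmic tasks, active-memory mechanisms alone outperform both self-attention and a combination of the two.
Paper reference translation (metadata) (2019-12-27T02:01:13Z)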

The related papers list is generated automatically from the titles and abstracts of papers on this site.
