Fugu-MT Paper Translation (Abstract): DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

Paper summary: DAPE V2: Process Attention Score as Feature Map for Length Extrapolation

arxiv url: http://arxiv.org/abs/2410.04798v3
Date: Thu, 10 Oct 2024 06:31:26 GMT
Status: Translation complete
Last updated in system: 2024-11-02 01:58:00.988815
Title: DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Title (reference translation): DAPE V2: Process Attention Score as Feature Map for Length Extrapolation
Authors: Chuanyang Zheng, Yihang Gao, Han Shi, Jing Xiong, Jiankai Sun, Jingyao Li, Minbin Huang, Xiaozhe Ren, Michael Ng, Xin Jiang, Zhenguo Li, Yu Li
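Abstract summary: We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods used in computer vision. This insight, which can be adapted to various attention-related models, shows that the current Transformer architecture has potential for further evolution.
Reference score (independently computed attention level): 63.87956583202729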
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The attention mechanism is a fundamental component of the Transformer model, contributing to interactions among distinct tokens, in contrast to earlier feed-forward neural networks. In general, the attention scores are determined simply by the key-query products. However, this work's occasional trial (combining DAPE and NoPE) of including additional MLPs on attention scores without position encoding indicates that the classical key-query multiplication may limit the performance of Transformers. In this work, we conceptualize attention as a feature map and apply the convolution operator (for neighboring attention scores across different heads) to mimic the processing methods in computer vision. Specifically, the main contribution of this paper is identifying and interpreting the Transformer length extrapolation problem as a result of the limited expressiveness of the naive query and key dot product, and we successfully translate the length extrapolation issue into a well-understood feature map processing problem. The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution. Extensive experiments demonstrate that treating attention as a feature map and applying convolution as a processing method significantly enhances Transformer performance.
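Abstract (reference translation): The attention mechanism is a fundamental component of the Transformer model and, in contrast to earlier feed-forward neural networks, contributes to interactions among distinct tokens. In general, attention scores are determined simply by key-query products. However, this work's occasional trial (combining DAPE and NoPE), which adds MLPs on the attention scores without position encoding, indicates that classical key-query multiplication may limit Transformer performance. In this work, we conceptualize attention as a feature map and apply the convolution operator (over neighboring attention scores across different heads) to mimic the processing methods used in computer vision. Specifically, we identify and interpret the Transformer length extrapolation problem as a consequence of the limited expressiveness of the naive query-key dot product, and we translate the length extrapolation issue into a well-understood feature map processing problem. This insight, which can be adapted to various attention-related models, shows that the current Transformer architecture has potential for further evolution. Extensive experiments show that treating attention as a feature map and applying convolution as a processing method significantly improves Transformer performance.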
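The idea of processing attention scores as a feature map can be illustrated with a small sketch. The code below is not the authors' implementation: the kernel shape (`[H_out][H_in][K]`, a 1 x K window along the key axis), the residual connection, and the helper names `qk_scores`, `conv_over_scores`, and `causal_softmax` are all assumptions chosen for illustration. It treats each head's pre-softmax score map as one channel, mixes neighboring scores across heads with a convolution, and only then applies a causal softmax.

```python
import math

def qk_scores(q, k):
    """Scaled dot-product scores for one head; q and k are [T][d] lists."""
    scale = 1.0 / math.sqrt(len(q[0]))
    return [[scale * sum(a * b for a, b in zip(qi, kj)) for kj in k] for qi in q]

def conv_over_scores(scores, weight):
    """Residual 1 x K convolution over per-head score maps (illustrative only).

    scores: [H][T][T] raw attention scores; heads act as channels.
    weight: [H_out][H_in][K] hypothetical kernel; tap d reads key position j - d.
    """
    H, T, K = len(scores), len(scores[0]), len(weight[0][0])
    out = [[[0.0] * T for _ in range(T)] for _ in range(H)]
    for h in range(H):
        for i in range(T):
            for j in range(i + 1):  # causal region only: key j <= query i
                acc = scores[h][i][j]  # residual path keeps the plain QK score
                for g in range(H):
                    for d in range(K):
                        if j - d >= 0:  # left zero-padding along the key axis
                            acc += weight[h][g][d] * scores[g][i][j - d]
                out[h][i][j] = acc
    return out

def causal_softmax(scores):
    """Row-wise softmax over keys j <= i; future keys get probability 0."""
    probs = []
    for head in scores:
        rows = []
        for i, row in enumerate(head):
            m = max(row[: i + 1])
            exps = [math.exp(x - m) for x in row[: i + 1]]
            z = sum(exps)
            rows.append([e / z for e in exps] + [0.0] * (len(row) - i - 1))
        probs.append(rows)
    return probs

# Toy example: 2 heads, 3 tokens, head dimension 2 (values are arbitrary).
q = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]]
k = [[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
     [[1.0, 1.0], [0.0, 1.0], [2.0, 0.0]]]
scores = [qk_scores(qh, kh) for qh, kh in zip(q, k)]

zero_kernel = [[[0.0] * 2 for _ in range(2)] for _ in range(2)]
mix_kernel = [[[0.1, 0.05], [0.2, 0.0]],
              [[0.0, 0.1], [0.3, -0.1]]]

baseline = causal_softmax(conv_over_scores(scores, zero_kernel))  # plain attention
mixed = causal_softmax(conv_over_scores(scores, mix_kernel))      # conv-processed
```

With an all-zero kernel the sketch reduces to ordinary causal attention, which makes the convolution a strict extension of the key-query product in this toy setup. Because each tap only reads key positions at or before the current one, the convolution does not leak information from future tokens.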

Related papers

Attention Retrieves, MLP Memorizes: Disentangling Trainable Components in the Transformer [19.36946128510059]
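The Transformer architecture is central to the success of modern large language models. The self-attention mechanism is a core component of the Transformer, but we question how much, and which aspects, of the performance gains can be attributed to it.
Paper reference translation (metadata) (2025-06-01T18:42:39Z)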
Exploring Kernel Transformations for Implicit Neural Representations [57.2225355625268]
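Implicit neural representations (INRs) use neural networks to represent signals by mapping coordinates to their corresponding attributes. This work pioneers the exploration of the effect of kernel transformations of the input and output while leaving the model itself unchanged. A byproduct of our findings is a simple yet effective method that combines scale and shift to significantly boost INRs with negligible overhead.
Paper reference translation (metadata) (2025-04-07T04:43:50Z)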
FAST: Factorizable Attention for Speeding up Transformers [1.3637227185793512]
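We propose a linearly scaling attention mechanism that maintains the full representation of the attention matrix without sparsification. The results suggest that our attention mechanism performs robustly and holds significant promise for the diverse applications in which self-attention is used.
Paper reference translation (metadata) (2024-02-12T18:59:39Z)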
DAT++: Spatially Dynamic Vision Transformer with Deformable Attention [87.41016963608067]
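We propose the Deformable Attention Transformer (DAT++). DAT++ achieves state-of-the-art results on a range of visual recognition benchmarks, with 85.9% ImageNet accuracy, 54.5 and 47.0 MS-COCO instance segmentation mAP, and 51.5 ADE20K semantic segmentation mIoU.
Paper reference translation (metadata) (2023-09-04T08:26:47Z)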
FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
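Linear attention offers a much more efficient alternative thanks to its linear complexity. Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead. In this work, we propose a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
Paper reference translation (metadata) (2023-08-01T10:37:12Z)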
Slide-Transformer: Hierarchical Vision Transformer with Local Self-Attention [34.26177289099421]
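The self-attention mechanism has been a key factor in the recent progress of Vision Transformers (ViT). In this paper, we propose a novel local attention module that leverages common convolution operations to achieve high efficiency, flexibility, and generalizability. Our module realizes the local attention paradigm in an efficient and flexible manner.
Paper reference translation (metadata) (2023-04-09T13:37:59Z)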
Rethinking Query-Key Pairwise Interactions in Vision Transformers [5.141895475956681]
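We propose key-only attention, which excludes query-key pairwise interactions and uses a compute-efficient saliency gate to obtain attention weights. We develop LinGlos, a new family of self-attention models that reaches state-of-the-art accuracy in the parameter-limited setting of the ImageNet classification benchmark.
Paper reference translation (metadata) (2022-07-01T03:36:49Z)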
Vicinity Vision Transformer [53.43198716947792]
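We propose Vicinity Attention, which introduces a locality bias to vision transformers with linear complexity. The proposed method achieves state-of-the-art image classification accuracy with 50% fewer parameters than previous methods.
Paper reference translation (metadata) (2022-06-21T17:33:53Z)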
Transformers Solve the Limited Receptive Field for Monocular Depth Prediction [82.90445525977904]
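We propose TransDepth, an architecture that benefits from both convolutional neural networks and transformers. This is the first paper to apply transformers to pixel-wise prediction problems involving continuous labels.
Paper reference translation (metadata) (2021-03-22T18:00:13Z)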
SparseBERT: Rethinking the Importance Analysis in Self-attention [107.68072039537311]
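Transformer-based models are popular for natural language processing (NLP) tasks because of their powerful capacity. Visualizing the attention maps of a pre-trained model is one direct way to understand the self-attention mechanism. In this work, we propose a Differentiable Attention Mask (DAM) algorithm, which can also be applied to guide the design of SparseBERT.
Paper reference translation (metadata) (2021-02-25T14:13:44Z)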

The related papers list is generated automatically from the titles and abstracts of papers on this site.

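This is the information for the specified paper.
The operator of this site does not guarantee the quality of this site (including all information and translations) and assumes no responsibility for any results arising from its use.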