Fugu-MT Paper Translation (Summary): PermuteFormer: Efficient Relative Position Encoding for Long Sequences

Paper summary: PermuteFormer: Efficient Relative Position Encoding for Long Sequences

arxiv url: http://arxiv.org/abs/2109.02377v2
Date: Wed, 8 Sep 2021 13:17:49 GMT
Status: Translation complete
Last updated in system: 2021-09-09 10:26:37.434888
Title: PermuteFormer: Efficient Relative Position Encoding for Long Sequences
Authors: Peng Chen
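Abstract summary: We propose PermuteFormer, a model that uses relative position encoding. PermuteFormer applies position-dependent transformations to queries and keys to encode positional information. Experiments show that PermuteFormer uniformly improves the performance of Performer with almost no computational overhead.
Reference score (attention level computed by this site): 2.92125254553717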
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A recent variation of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In this paper, we discuss possible ways to add relative position encoding to Performer. Based on the analysis, we propose PermuteFormer, a Performer-based model with relative position encoding that scales linearly on long sequences. PermuteFormer applies position-dependent transformation on queries and keys to encode positional information into the attention module. This transformation is carefully crafted so that the final output of self-attention is not affected by absolute positions of tokens. PermuteFormer introduces negligible computational overhead by design that it runs as fast as Performer. We evaluate PermuteFormer on Long-Range Arena, a dataset for long sequences, as well as WikiText-103, a language modeling dataset. The experiments show that PermuteFormer uniformly improves the performance of Performer with almost no computational overhead and outperforms vanilla Transformer on most of the tasks.
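The abstract's central claim is that a position-dependent transformation of queries and keys can leave attention scores dependent only on relative position. The following is a minimal sketch of that general idea, not the paper's exact construction: it assumes the transformation is a fixed feature permutation applied `i` times at position `i`, and the names `permute_features`, `score`, `perm`, `q`, and `k` are hypothetical. Any additional scaling or per-head detail the paper may use is omitted.

```python
import numpy as np

def permute_features(x, perm, power):
    """Apply the permutation `perm` to the last axis of `x`, `power` times."""
    out = x
    for _ in range(power):
        out = out[..., perm]
    return out

rng = np.random.default_rng(0)
d = 8                          # feature dimension (hypothetical)
perm = rng.permutation(d)      # one fixed permutation shared by queries and keys
q = rng.standard_normal(d)     # stands in for a query feature vector
k = rng.standard_normal(d)     # stands in for a key feature vector

def score(i, j):
    """Dot product of the query placed at position i and the key at position j."""
    return float(permute_features(q, perm, i) @ permute_features(k, perm, j))

# Shifting both positions by the same amount leaves the score unchanged,
# because a permutation preserves dot products: only the offset j - i matters.
print(np.isclose(score(2, 5), score(0, 3)))
print(np.isclose(score(10, 13), score(0, 3)))
```

Because the transformation acts on each query and each key separately, it does not require forming the full attention matrix, which is why an approach of this kind can stay compatible with a linear attention mechanism.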

Related papers

PaTH Attention: Position Encoding via Accumulating Householder Transformations [56.32365080761523]
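PaTH is a flexible, data-dependent position encoding scheme based on accumulated products of Householder transformations. We derive an efficient parallel training algorithm by exploiting a compact representation of products of Householder matrices.
Reference translation (metadata) (2025-05-22T08:36:09Z)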
Functional Interpolation for Relative Positions Improves Long Context Transformers [86.12843093589]
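We propose FIRE, a functional relative position encoding with progressive interpolation, to improve Transformer generalization to longer contexts. Theoretically, it can represent several popular relative position encodings, such as T5's RPE, Alibi, and Kerple. FIRE models show better generalization to longer contexts on both zero-shot language modeling and long-text benchmarks.
Reference translation (metadata) (2023-10-06T17:59:11Z)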
Improving Position Encoding of Transformers for Multivariate Time Series Classification [5.467400475482668]
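We propose an absolute position encoding method dedicated to time series data, called time Absolute Position Encoding (tAPE). We then propose a new multivariate time series classification (MTSC) model, named ConvTran, that combines tAPE/eRPE with a convolution-based input encoding to improve the position and data embedding of time series data.
Reference translation (metadata) (2023-05-26T05:30:04Z)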
Linearizing Transformer with Key-Value Memory Bank [54.83663647680612]
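We propose MemSizer, an approach that projects the source sequence into a lower-dimensional representation. MemSizer not only achieves the same linear time complexity but also enjoys efficient recurrent-style autoregressive generation. We demonstrate that MemSizer offers a better trade-off between efficiency and accuracy than the vanilla Transformer.
Reference translation (metadata) (2022-03-23T18:10:18Z)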
Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding [63.539333383965726]
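We propose a new method to accelerate attention computation for Transformers with relative positional encoding (RPE). Based on the observation that relative positional encoding forms a Toeplitz matrix, we show mathematically that kernelized attention with RPE can be computed efficiently using the Fast Fourier Transform (FFT).
Reference translation (metadata) (2021-06-23T17:51:26Z)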
Demystifying the Better Performance of Position Encoding Variants for Transformer [12.503079503907989]
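We show how to encode position and segment information into Transformer models. The proposed method performs on par with SOTA on the GLUE, XTREME, and WMT benchmarks while saving costs.
Reference translation (metadata) (2021-04-18T03:44:57Z)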
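Nyströmformer: A Nyström-Based Algorithm for Approximating Self-Attention [60.043273122786005]
We propose Nyströmformer, a model that exhibits favorable scalability as a function of sequence length. The scalability of Nyströmformer enables application to longer sequences with thousands of tokens. We evaluate on multiple downstream tasks on the GLUE benchmark and IMDB reviews with standard sequence length, and find that Nyströmformer performs comparably to, or in a few cases slightly better than, the standard Transformer.
Reference translation (metadata) (2021-02-07T20:06:59Z)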
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing [112.2208052057002]
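We propose Funnel-Transformer, which compresses the sequence of hidden states to a shorter one. With comparable or fewer FLOPs, Funnel-Transformer outperforms the standard Transformer on a wide variety of sequence-level prediction tasks.
Reference translation (metadata) (2020-06-05T05:16:23Z)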
Relative Positional Encoding for Speech Recognition and Direct Translation [72.64499573561922]
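We adapt the relative position encoding scheme to the Speech Transformer. The results show that the network can adapt to the variable distributions present in speech data.
Reference translation (metadata) (2020-05-20T09:53:06Z)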
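
The "Stable, Fast and Accurate" entry above rests on the fact that a matrix whose entries depend only on the offset i - j is Toeplitz, and that a Toeplitz matrix-vector product can be computed with the FFT. The sketch below illustrates only that building block, not the paper's full attention algorithm. It assumes the relative position terms are given as a first column `col` and first row `row`; the function names `toeplitz_matvec_fft` and `toeplitz_dense` are hypothetical.

```python
import numpy as np

def toeplitz_matvec_fft(col, row, x):
    """Compute T @ x, where T[i, j] = col[i - j] if i >= j else row[j - i]."""
    n = len(x)
    # Embed the n x n Toeplitz matrix in a (2n - 1) x (2n - 1) circulant matrix.
    circ = np.concatenate([col, row[:0:-1]])
    x_pad = np.concatenate([x, np.zeros(n - 1)])
    # A circulant product is a circular convolution: pointwise in Fourier space.
    y = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(x_pad))
    return y[:n].real

def toeplitz_dense(col, row):
    """Build the same Toeplitz matrix explicitly, for checking only."""
    n = len(col)
    i, j = np.indices((n, n))
    return np.where(i >= j, col[np.abs(i - j)], row[np.abs(i - j)])

rng = np.random.default_rng(0)
n = 6
col = rng.standard_normal(n)
row = rng.standard_normal(n)
row[0] = col[0]                # the diagonal entry is shared
x = rng.standard_normal(n)

fast = toeplitz_matvec_fft(col, row, x)
slow = toeplitz_dense(col, row) @ x
print(np.allclose(fast, slow))
```

The FFT route costs O(n log n) per product instead of the O(n^2) of the dense multiplication, which is the source of the speedup for long sequences.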

The related papers list is generated automatically from the titles and abstracts of the papers on this site.
