Fugu-MT Paper Translation (Abstract): cosFormer: Rethinking Softmax in Attention

Paper summary: cosFormer: Rethinking Softmax in Attention

arxiv url: http://arxiv.org/abs/2202.08791v1
Date: Thu, 17 Feb 2022 17:53:48 GMT
Status: Translation complete
Last updated in system: 2022-02-18 14:51:31.755226
Title: cosFormer: Rethinking Softmax in Attention
Title (reference translation): cosFormer: Rethinking Softmax in Attention
Authors: Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, Yiran Zhong
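Abstract summary: Kernel methods are often adopted to reduce complexity by approximating the softmax operator. Because of approximation errors, their performance varies across tasks and corpora, and they suffer significant performance drops. This paper proposes a linear transformer called cosFormer that can achieve accuracy comparable to the vanilla transformer.
Reference score (independently calculated attention level): 60.557869510885205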
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformer has shown great success in natural language processing, computer vision, and audio processing. As one of its core components, softmax attention helps to capture long-range dependencies, yet it prohibits scale-up because its space and time complexity is quadratic in the sequence length. Kernel methods are often adopted to reduce the complexity by approximating the softmax operator. Nevertheless, due to the approximation errors, their performance varies across tasks and corpora, and they suffer crucial performance drops when compared with vanilla softmax attention. In this paper, we propose a linear transformer called cosFormer that can achieve accuracy comparable to or better than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) non-negativeness of the attention matrix; (ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As its linear substitute, cosFormer fulfills these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of our method. We further examine our method on long sequences and achieve state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.
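Abstract (reference translation): Transformer has achieved great success in natural language processing, computer vision, and audio processing. As one of its core components, softmax attention helps capture long-range dependencies, but its quadratic space and time complexity in the sequence length prevents it from scaling up. Kernel methods are often used to reduce the complexity by approximating the softmax operator. However, because of approximation errors, their performance varies across tasks and corpora, and they suffer significant performance drops compared with vanilla softmax attention. This paper proposes a linear transformer called cosFormer that can achieve accuracy comparable to or better than the vanilla transformer in both causal and cross attention. cosFormer is based on two key properties of softmax attention: (i) non-negativity of the attention matrix; (ii) a non-linear re-weighting scheme that can concentrate the distribution of the attention matrix. As a linear substitute, cosFormer satisfies these properties with a linear operator and a cosine-based distance re-weighting mechanism. Extensive experiments on language modeling and text understanding tasks demonstrate the effectiveness of the method. The method is further examined on long sequences and achieves state-of-the-art performance on the Long-Range Arena benchmark. The source code is available at https://github.com/OpenNLPLab/cosFormer.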
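
The two properties named in the abstract can be illustrated with a small NumPy sketch. The abstract does not give the formulas, so the details here are assumptions: ReLU as the non-negative feature map, a weight of cos(pi/2 * (i - j) / N) between positions i and j, and the non-causal case only. The function names are hypothetical and this is not the authors' implementation (see the repository linked above for that). The sketch compares an explicit N x N computation with a linear-time one obtained from the identity cos(a - b) = cos(a)cos(b) + sin(a)sin(b).

```python
import numpy as np

def cos_attention_quadratic(q, k, v, eps=1e-6):
    """Reference version that builds the full N x N attention matrix."""
    n = q.shape[0]
    # Assumption: ReLU keeps every score non-negative (property i).
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)
    idx = np.arange(n)
    # Assumption: cosine re-weighting, close to 1 for nearby positions
    # and close to 0 for distant ones (property ii).
    w = np.cos(0.5 * np.pi * (idx[:, None] - idx[None, :]) / n)
    scores = (q @ k.T) * w
    return (scores @ v) / (scores.sum(axis=1, keepdims=True) + eps)

def cos_attention_linear(q, k, v, eps=1e-6):
    """Same result without the N x N matrix: cost is linear in N."""
    n = q.shape[0]
    q, k = np.maximum(q, 0.0), np.maximum(k, 0.0)
    angle = 0.5 * np.pi * np.arange(n)[:, None] / n
    q_cos, q_sin = q * np.cos(angle), q * np.sin(angle)
    k_cos, k_sin = k * np.cos(angle), k * np.sin(angle)
    # cos(a_i - a_j) splits into cos*cos + sin*sin, so K^T V can be
    # summarised once as a (d x d_v) matrix and reused for every query.
    num = q_cos @ (k_cos.T @ v) + q_sin @ (k_sin.T @ v)
    den = q_cos @ k_cos.sum(axis=0) + q_sin @ k_sin.sum(axis=0)
    return num / (den[:, None] + eps)

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((16, 8)) for _ in range(3))
out_quadratic = cos_attention_quadratic(q, k, v)
out_linear = cos_attention_linear(q, k, v)
print(np.allclose(out_quadratic, out_linear))
```

Because the re-weighting factorizes into per-position terms, the keys and values are reduced once and the cost grows linearly with the sequence length rather than quadratically. A causal variant would need running prefix sums instead of full sums, which this sketch does not cover.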

Related papers

Bridging the Divide: Reconsidering Softmax and Linear Attention [116.34723260730405]
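We present two key perspectives for understanding and alleviating the limitations of linear attention. First, we prove that linear attention is not injective and tends to assign identical attention weights to different query vectors. Second, we confirm that effective local modeling is essential to the success of softmax attention, and that linear attention falls short in this respect.
Paper reference translation (metadata) (2024-12-09T15:44:22Z)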
Cottention: Linear Transformers With Cosine Attention [2.762180345826837]
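We introduce Cottention, a novel attention mechanism that replaces the softmax operation with cosine similarity. Cottention achieves native linear memory complexity with respect to sequence length, making it inherently more memory-efficient than softmax attention.
Paper reference translation (metadata) (2024-09-27T13:38:36Z)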
Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer [36.75562615596186]
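We propose MASFormer, an easy-to-implement transformer variant with Mixed Attention Spans. MASFormer uses full attention to capture long-range dependencies, but only at a small number of layers. Experiments show that a decoder-only MASFormer model with 1.3B parameters achieves performance competitive with vanilla transformers.
Paper reference translation (metadata) (2023-10-19T03:32:05Z)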
FLatten Transformer: Vision Transformer using Focused Linear Attention [80.61335173752146]
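Linear attention offers a much more efficient alternative thanks to its linear complexity. Current linear attention approaches either suffer from significant performance degradation or introduce additional computation overhead. This work proposes a novel Focused Linear Attention module to achieve both high efficiency and expressiveness.
Paper reference translation (metadata) (2023-08-01T10:37:12Z)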
Softmax-free Linear Transformers [90.83157268265654]
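Vision transformers (ViTs) have pushed the state of the art for visual perception tasks. Existing methods are either theoretically flawed or empirically ineffective for visual recognition. We propose a family of Softmax-Free Transformers (SOFT).
Paper reference translation (metadata) (2022-07-05T03:08:27Z)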
SOFT: Softmax-free Transformer with Linear Complexity [112.9754491864247]
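Vision transformers (ViTs) have pushed the state of the art for various visual recognition tasks through patch-wise image tokenization followed by self-attention. Various attempts to approximate self-attention with linear complexity have been made in natural language processing. Their limitations are rooted in keeping the softmax self-attention during approximation. We propose a softmax-free transformer (SOFT) for the first time.
Paper reference translation (metadata) (2021-10-22T17:57:29Z)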
Combiner: Full Attention Transformer with Sparse Computation Cost [142.10203598824964]
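We propose Combiner, which provides full attention capability in each attention head while keeping computational complexity low. We show that most sparse attention patterns used in existing sparse transformers can inspire the design of such a factorization for full attention. Experimental evaluation on both autoregressive and bidirectional sequence tasks demonstrates the effectiveness of this approach.
Paper reference translation (metadata) (2021-07-12T22:43:11Z)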
Luna: Linear Unified Nested Attention [71.66026714473482]
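We propose Luna, a linear unified nested attention mechanism that approximates softmax attention with two nested linear attention functions. Specifically, with the first attention function, Luna packs the input sequence into a sequence of fixed length; the packed sequence is then unpacked using the second attention function. Compared with a traditional attention mechanism, Luna introduces an additional fixed-length sequence as input and a corresponding additional output, which allows Luna to perform the attention operation linearly.
Paper reference translation (metadata) (2021-06-03T01:47:26Z)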

The related papers list is generated automatically from the titles and abstracts of papers on this site.
