Fugu-MT 論文翻訳(概要): On the Expressive Power of Self-Attention Matrices

論文の概要: On the Expressive Power of Self-Attention Matrices

arxiv url: http://arxiv.org/abs/2106.03764v1
Date: Mon, 7 Jun 2021 16:30:28 GMT
ステータス: 翻訳完了
システム内更新日: 2021-06-08 17:46:52.743417
Title: On the Expressive Power of Self-Attention Matrices
Title（参考訳）: 自己愛行列の表現力について
Authors: Valerii Likhosherstov, Krzysztof Choromanski, Adrian Weller
Abstract要約: 固定自己アテンションモジュールは入力に応じて任意のスパースパターンを近似することができることを示す。行列を近似するために適応的な入力と固定された自己アテンションパラメータを求めるアルゴリズムを提案する。
参考スコア（独自算出の注目度）: 41.72598286888797
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Transformer networks are able to capture patterns in data coming from many domains (text, images, videos, proteins, etc.) with little or no change to architecture components. We perform a theoretical analysis of the core component responsible for signal propagation between elements, i.e. the self-attention matrix. In practice, this matrix typically exhibits two properties: (1) it is sparse, meaning that each token only attends to a small subset of other tokens; and (2) it changes dynamically depending on the input to the module. With these considerations in mind, we ask the following question: Can a fixed self-attention module approximate arbitrary sparse patterns depending on the input? How small is the hidden size $d$ required for such approximation? We make progress in answering this question and show that the self-attention matrix can provably approximate sparse matrices, where sparsity is in terms of a bounded number of nonzero elements in each row and column. While the parameters of self-attention are fixed, various sparse matrices can be approximated by only modifying the inputs. Our proof is based on the random projection technique and uses the seminal Johnson-Lindenstrauss lemma. Our proof is constructive, enabling us to propose an algorithm for finding adaptive inputs and fixed self-attention parameters in order to approximate a given matrix. In particular, we show that, in order to approximate any sparse matrix up to a given precision defined in terms of preserving matrix element ratios, $d$ grows only logarithmically with the sequence length $L$ (i.e. $d = O(\log L)$).
Abstract（参考訳）: トランスフォーマーネットワークは、多くのドメイン(テキスト、画像、ビデオ、タンパク質など)から来るデータのパターンをキャプチャすることができる。アーキテクチャコンポーネントの変更はほとんど、あるいはまったくありません。元素間の信号伝達に寄与するコア成分の理論的解析を行う。 self-attention 行列。実際には、この行列は一般に2つの性質を示す: (1) スパース(sparse)、つまり、各トークンは他のトークンの小さなサブセットにのみ対応し、(2) モジュールへの入力に応じて動的に変化する。これらの考察を念頭に置いて、我々は以下の質問をする: 固定された自己完結モジュールは、入力に応じて任意のスパースパターンを近似できるか? そのような近似のために隠されたサイズ$d$はどのくらい小さいか? 我々はこの問題への回答を進歩させ、自着行列が各列と列の非零要素の有界数でスパース行列を近似できることを示す。自己注意のパラメータは固定されているが、様々なスパース行列は入力を変更するだけで近似できる。我々の証明はランダム射影法に基づいており、半音節の Johnson-Lindenstrauss lemma を用いる。この証明は構成的であり、与えられた行列を近似するために適応入力と固定自己着パラメータを求めるアルゴリズムを提案することができる。特に、任意のスパース行列を行列要素比を保存するために定義された精度まで近似するために、$d$は列長$L$(すなわち)と対数的にしか成長しないことを示す。 $d = O(\log L)$)。

関連論文リスト

Explaining Deep Network Classification of Matrices: A Case Study on Monotonicity [0.0]
この研究は、ディープラーニングを用いて行列を分類するための単純で実践的な基準を発見する方法を示す。高性能ニューラルネットワークと説明可能なAI(XAI)技術を組み合わせることで、モデルの学習戦略を人間の解釈可能なルールに抽出することができる。
論文参考訳（メタデータ） (2025-07-30T10:55:44Z)
Streaming Private Continual Counting via Binning [11.72102598708538]
我々は、$textitbinning$を介して低空間における分解機構を近似する簡単な方法を提案する。空間利用が極端に少ない場合でも、最適分解機構の性能は密に一致し、時には上回ることができることを実証的に示す。
論文参考訳（メタデータ） (2024-12-10T01:21:56Z)
Optimal Quantization for Matrix Multiplication [35.007966885532724]
我々は、ネスト格子に基づく普遍量化器を、任意の(非ランダムな)行列対に対する近似誤差の明示的な保証付きで、フロベニウスノルム$|A|_F, |B|_F$, $|Atop B|_F$のみの観点から、$A$, $B$とする。
論文参考訳（メタデータ） (2024-10-17T17:19:48Z)
Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers [16.046186753149]
最近のLarge Language Models(LLM)におけるトランスフォーマーの成功の鍵は自己認識メカニズムである我々は、注目行列の畳み込み様構造を利用して、畳み込み行列を用いた注目の効率的な近似法を開発する。トランスフォーマーモデルにおけるアテンション計算を加速するための新しいパラダイムが、より長いコンテキストへのアプリケーションを支援することを願っています。
論文参考訳（メタデータ） (2024-05-08T17:11:38Z)
One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
行列の欠落値を$XTX$で計算する自然アルゴリズムを提案する。合成データの一方の回収と低被覆ゲノムシークエンシングについて,本アルゴリズムの評価を行った。
論文参考訳（メタデータ） (2023-06-06T22:35:16Z)
Sparse Factorization of Large Square Matrices [10.94053598642913]
本稿では,大面積の正方行列とスパースフルランク行列の積を近似する。近似では、我々の手法は$Ntimes N$ full matrix に対して$N(log N)2$ non-zero number しか必要としない。近似行列がスパースかつハイランクである場合,本手法により近似精度が向上することを示す。
論文参考訳（メタデータ） (2021-09-16T18:42:21Z)
Non-PSD Matrix Sketching with Applications to Regression and Optimization [56.730993511802865]
非PSDおよび2乗根行列の次元削減法を提案する。複数のダウンストリームタスクにこれらのテクニックをどのように使用できるかを示す。
論文参考訳（メタデータ） (2021-06-16T04:07:48Z)
Linear-Sample Learning of Low-Rank Distributions [56.59844655107251]
ktimes k$, rank-r$, matrices to normalized $L_1$ distance requires $Omega(frackrepsilon2)$ sample。我々は、$cal O(frackrepsilon2log2fracepsilon)$ sample, a number linear in the high dimension, and almost linear in the matrices, usually low, rank proofs.というアルゴリズムを提案する。
論文参考訳（メタデータ） (2020-09-30T19:10:32Z)
Compressed sensing of low-rank plus sparse matrices [3.8073142980733]
この写本は、ランクラパース行列と$sスパース行列の和として表現できる$mtimes n$が計算的に抽出可能な方法で復元可能であることを示す同様の保証を開発する。その結果, 合成問題, 動的地上/静電分離, マルチスペクトルイメージング, ロバストPCAが得られた。
論文参考訳（メタデータ） (2020-07-18T15:36:11Z)
Spectral Learning on Matrices and Tensors [74.88243719463053]
テンソル分解は行列法で欠落する潜伏効果を拾うことができることを示す。また,効率的なテンソル分解法を設計するための計算手法についても概説する。
論文参考訳（メタデータ） (2020-04-16T22:53:00Z)
Sketching Transformed Matrices with Applications to Natural Language Processing [76.6222695417524]
本稿では, 変換行列を用いて, 与えられた小さな行列の積を計算するための空間効率のよいスケッチアルゴリズムを提案する。提案手法は誤差が小さく,空間と時間の両方で効率がよいことを示す。
論文参考訳（メタデータ） (2020-02-23T03:07:31Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。