Fugu-MT 論文翻訳(概要): Solving Attention Kernel Regression Problem via Pre-conditioner

論文の概要: Solving Attention Kernel Regression Problem via Pre-conditioner

arxiv url: http://arxiv.org/abs/2308.14304v2
Date: Mon, 1 Apr 2024 22:30:22 GMT
ステータス: 翻訳完了
システム内更新日: 2024-04-04 13:12:17.179927
Title: Solving Attention Kernel Regression Problem via Pre-conditioner
Title（参考訳）: プレコンディショナーによる注意カーネル回帰問題の解法
Authors: Zhao Song, Junze Yin, Lichen Zhang,
Abstract要約: 我々は2種類の回帰問題に対するアルゴリズムを設計する:$min_xin mathbbRd|(Atop A)jx-b|$ for any positive integer $j$。 2番目のプロキシは、$exp(AAtop)$で表され、回帰$min_xin mathbbRn|exp(AAtop)xb |$で表されるグラム行列に指数的にエントリワイドを適用する。
参考スコア（独自算出の注目度）: 9.131385887605935
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: The attention mechanism is the key to large language models, and the attention matrix serves as an algorithmic and computational bottleneck for such a scheme. In this paper, we define two problems, motivated by designing fast algorithms for proxy of attention matrix and solving regressions against them. Given an input matrix $A\in \mathbb{R}^{n\times d}$ with $n\gg d$ and a response vector $b$, we first consider the matrix exponential of the matrix $A^\top A$ as a proxy, and we in turn design algorithms for two types of regression problems: $\min_{x\in \mathbb{R}^d}\|(A^\top A)^jx-b\|_2$ and $\min_{x\in \mathbb{R}^d}\|A(A^\top A)^jx-b\|_2$ for any positive integer $j$. Studying algorithms for these regressions is essential, as matrix exponential can be approximated term-by-term via these smaller problems. The second proxy is applying exponential entrywise to the Gram matrix, denoted by $\exp(AA^\top)$ and solving the regression $\min_{x\in \mathbb{R}^n}\|\exp(AA^\top)x-b \|_2$. We call this problem the attention kernel regression problem, as the matrix $\exp(AA^\top)$ could be viewed as a kernel function with respect to $A$. We design fast algorithms for these regression problems, based on sketching and preconditioning. We hope these efforts will provide an alternative perspective of studying efficient approximation of attention matrices.
Abstract（参考訳）: 注意機構は大規模言語モデルの鍵であり、注意行列はそのようなスキームのアルゴリズム的および計算的ボトルネックとして機能する。本稿では,注目行列のプロキシに高速なアルゴリズムを設計し,それに対する回帰を解くことによる2つの問題を定義する。入力行列 $A\in \mathbb{R}^{n\times d}$ と $n\gg d$ と応答ベクトル $b$ が与えられたとき、まず行列 $A^\top A$ の行列指数をプロキシとして考え、次に2種類の回帰問題に対するアルゴリズムを設計する: $\min_{x\in \mathbb{R}^d|(A^\top A)^jx-b\|_2$ と $\min_{x\in \mathbb{R}^d|A(A^\top A)^jx-b\|_2$ である。これらの回帰のアルゴリズムの研究は、行列指数がこれらのより小さな問題を通して長期的に近似できるため、不可欠である。第2のプロキシは、指数関数的にグラム行列に適用し、$\exp(AA^\top)$で表され、回帰$\min_{x\in \mathbb{R}^n}\|\exp(AA^\top)x-b \|_2$を解く。我々はこの問題を注目カーネル回帰問題と呼び、行列 $\exp(AA^\top)$ は$A$ に関してカーネル関数と見なすことができる。スケッチとプレコンディショニングに基づいて,これらの回帰問題に対する高速アルゴリズムを設計する。これらの取り組みが、注意行列の効率的な近似を研究するための代替的な視点を提供することを期待している。

関連論文リスト

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions [23.539428616884035]
非対称ガウス・ケルネル行列に対する行列ベクトル積の高速アルゴリズムについて研究する。我々のアルゴリズムは、$K$に関する以下のモデリング仮定に依存している: 最悪のケースの成長とは対照的に、$K$のエントリの合計は$n$で線形にスケールする。我々は、この仮定の下で動作し、制約のない計算を行う最初の準四進時間アルゴリズムを得る。
論文参考訳（メタデータ） (2025-07-31T13:29:43Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
線形スケッチを用いた行列とベクトルノルムの残差誤差推定問題について検討する。これは、前作とほぼ同じスケッチサイズと精度で、経験的にかなり有利であることを示す。また、スパースリカバリ問題に対して$Omega(k2/pn1-2/p)$低いバウンダリを示し、これは$mathrmpoly(log n)$ factorまで厳密である。
論文参考訳（メタデータ） (2024-08-16T02:33:07Z)
How to Inverting the Leverage Score Distribution? [16.744561210470632]
ツールとして広く利用されているレバレッジスコアにもかかわらず、本論文では、新しい問題、すなわち反転レバレッジスコアについて検討する。我々は、ニュートン法における大域収束率を確保するために反復縮小と帰納仮説を用いる。この統計レバレッジの反転に関する重要な研究は、解釈、データリカバリ、セキュリティにおける多くの新しい応用を開放する。
論文参考訳（メタデータ） (2024-04-21T21:36:42Z)
Provably learning a multi-head attention layer [55.2904547651831]
マルチヘッドアテンション層は、従来のフィードフォワードモデルとは分離したトランスフォーマーアーキテクチャの重要な構成要素の1つである。本研究では,ランダムな例から多面的注意層を実証的に学習する研究を開始する。最悪の場合、$m$に対する指数的依存は避けられないことを示す。
論文参考訳（メタデータ） (2024-02-06T15:39:09Z)
Structured Semidefinite Programming for Recovering Structured Preconditioners [41.28701750733703]
正定値$mathbfK を mathbbRd times d$ と $mathrmnnz(mathbfK)$ の 0 でないエントリで与えられるアルゴリズムは、時間内に$epsilon$-optimal diagonal preconditioner を計算する。我々は、行列辞書近似SDPと呼ばれる半定値プログラムのクラスに対して、新しいアルゴリズムを用いて結果を得る。
論文参考訳（メタデータ） (2023-10-27T16:54:29Z)
One-sided Matrix Completion from Two Observations Per Row [95.87811229292056]
行列の欠落値を$XTX$で計算する自然アルゴリズムを提案する。合成データの一方の回収と低被覆ゲノムシークエンシングについて,本アルゴリズムの評価を行った。
論文参考訳（メタデータ） (2023-06-06T22:35:16Z)
Fast $(1+\varepsilon)$-Approximation Algorithms for Binary Matrix Factorization [54.29685789885059]
本稿では, 2次行列分解(BMF)問題に対する効率的な$(1+varepsilon)$-approximationアルゴリズムを提案する。目標は、低ランク因子の積として$mathbfA$を近似することである。我々の手法はBMF問題の他の一般的な変種に一般化する。
論文参考訳（メタデータ） (2023-06-02T18:55:27Z)
Solving Regularized Exp, Cosh and Sinh Regression Problems [40.47799094316649]
注意計算はTransformer、GPT-4、ChatGPTといった大規模言語モデルの基本的なタスクである。素直な方法はニュートンの方法を使うことである。
論文参考訳（メタデータ） (2023-03-28T04:26:51Z)
A General Algorithm for Solving Rank-one Matrix Sensing [15.543065204102714]
マトリックスセンシングの目標は、一連の測定に基づいて、mathbbRn×n$の行列$A_starを復元することである。本稿では、このランク-$kの仮定を緩和し、より一般的な行列センシング問題を解く。
論文参考訳（メタデータ） (2023-03-22T04:07:26Z)
A Nearly-Optimal Bound for Fast Regression with $\ell_\infty$ Guarantee [16.409210914237086]
行列 $Ain mathbbRntimes d$ とテンソル $bin mathbbRn$ が与えられたとき、 $ell_infty$ の回帰問題を考える。このような$ell_infty$レグレッションの保証を得るためには、濃密なスケッチ行列を使わなければならない。我々はまた、OCE(Oblivious Coordinate-wise Embedding)特性を利用した $ell_infty$ guarantee regression のための新しい分析フレームワークを開発した。
論文参考訳（メタデータ） (2023-02-01T05:22:40Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
リッジ回帰問題に対する1+varepsilon$近似解を計算するスケッチベース反復アルゴリズムを提案する。また,このアルゴリズムがカーネルリッジ回帰の高速化に有効であることを示す。
論文参考訳（メタデータ） (2022-04-13T22:18:47Z)
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
我々は、$AinmathbbRdtimes n$へのアクセスを考えると、潜入$k$-vertex simplex $KsubsetmathbbRdtimes n$を学習する問題を考える。実行時間における$k$への依存は、トップ$k$特異値の質量が$a$であるという自然な仮定から不要であることを示す。
論文参考訳（メタデータ） (2021-05-17T16:40:48Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。