Fugu-MT paper translation (abstract): Solving Tensor Low Cycle Rank Approximation

Paper overview: Solving Tensor Low Cycle Rank Approximation

arxiv url: http://arxiv.org/abs/2304.06594v1
Date: Thu, 13 Apr 2023 15:00:50 GMT
Status: translation complete
Last updated in system: 2023-04-14 13:56:38.620077
Title: Solving Tensor Low Cycle Rank Approximation
Authors: Yichuan Deng, Yeqi Gao, Zhao Song
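Abstract summary: We formulate a particular tensor low-rank approximation problem, which we call tensor cycle rank. The tensor classical rank, Tucker rank, and train rank have been well studied in [Song, Woodruff, Zhong SODA 2019]. In this paper, we generalize the "rotation and sketch" technique from page 186 of [Song, Woodruff, Zhong SODA 2019] and show an input-sparsity-time algorithm for cycle rank.
Reference score (attention score computed by this site): 15.090593955414137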
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: Large language models have become ubiquitous in modern life, finding applications in various domains such as natural language processing, language translation, and speech recognition. Recently, a breakthrough work [Zhao, Panigrahi, Ge, and Arora Arxiv 2023] explains the attention model from probabilistic context-free grammar (PCFG). One of the central computational tasks for computing probability in PCFG can be formulated as a particular tensor low-rank approximation problem, which we call tensor cycle rank. Given an $n \times n \times n$ third-order tensor $A$, we say that $A$ has cycle rank-$k$ if there exist three $n \times k^2$ matrices $U$, $V$, and $W$ such that \begin{align*} A_{a,b,c} = \sum_{i=1}^k \sum_{j=1}^k \sum_{l=1}^k U_{a,i+k(j-1)} \otimes V_{b, j + k(l-1)} \otimes W_{c, l + k(i-1) } \end{align*} for all $a \in [n], b \in [n], c \in [n]$. The tensor classical rank, Tucker rank, and train rank have been well studied in [Song, Woodruff, Zhong SODA 2019]. In this paper, we generalize the previous ``rotation and sketch'' technique on page 186 of [Song, Woodruff, Zhong SODA 2019] and show an input-sparsity-time algorithm for cycle rank.
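The cycle rank definition in the abstract can be made concrete with a small NumPy sketch. It is only an illustration of the definition, not the paper's algorithm: the function name `cycle_rank_tensor` is hypothetical, and it assumes that the $\otimes$ between scalar entries means ordinary multiplication. The 1-based column index $i + k(j-1)$ becomes $i_0 + k j_0$ in 0-based form, so reshaping each $n \times k^2$ factor to $n \times k \times k$ in row-major order puts the slower index ($j$ for $U$, $l$ for $V$, $i$ for $W$) first.

```python
import numpy as np


def cycle_rank_tensor(U, V, W, k):
    """Build the n x n x n tensor A defined by the cycle rank-k formula.

    Hypothetical helper, assuming the product between entries is scalar
    multiplication:
        A[a,b,c] = sum_{i,j,l} U[a, i+k(j-1)] * V[b, j+k(l-1)] * W[c, l+k(i-1)]
    with 1-based i, j, l in the formula.
    """
    n = U.shape[0]
    for name, M in (("U", U), ("V", V), ("W", W)):
        if M.shape != (n, k * k):
            raise ValueError(f"{name} must have shape ({n}, {k * k}), got {M.shape}")

    # Row-major reshape: M3[x, p, q] == M[x, p * k + q], so the first new
    # axis is the slow index and the second is the fast index.
    U3 = U.reshape(n, k, k)  # axes: [a, j, i]
    V3 = V.reshape(n, k, k)  # axes: [b, l, j]
    W3 = W.reshape(n, k, k)  # axes: [c, i, l]

    # Contract the three shared indices i, j, l, which form a cycle
    # U -(j)- V -(l)- W -(i)- U.
    return np.einsum("aji,blj,cil->abc", U3, V3, W3)


# Usage with small random factors.
rng = np.random.default_rng(0)
n, k = 5, 2
U, V, W = (rng.standard_normal((n, k * k)) for _ in range(3))
A = cycle_rank_tensor(U, V, W, k)
print(A.shape)  # (5, 5, 5)
```

Each pair of factors shares exactly one summation index, which is what distinguishes this layout from the chain-shaped contraction of a tensor train.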

Related papers

Overcomplete Tensor Decomposition via Koszul-Young Flattenings [63.01248796170617]
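We give a new algorithm for decomposing an $n_1 \times n_2 \times n_3$ tensor as the sum of a minimal number of rank-1 terms. We also show that an even more general class of degree-$d$ flattenings cannot surpass rank $Cn$ for a constant $C = C(d)$.
Paper reference translation (metadata) (2024-11-21T17:41:09Z)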
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
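We study the problem of residual error estimation for matrix and vector norms using a linear sketch. The approach shows a considerable empirical advantage at roughly the same sketch size and accuracy as previous work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)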
Sample-Efficient Linear Regression with Self-Selection Bias [7.605563562103568]
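We consider the problem of linear regression with self-selection bias in the unknown-index setting. We propose a novel algorithm, with near-optimal sample efficiency in terms of $k$, that recovers $\mathbf{w}_1, \ldots, \mathbf{w}_k$. The algorithm succeeds under considerably relaxed noise assumptions, and therefore also succeeds in the related setting of max-linear regression.
Paper reference translation (metadata) (2024-02-22T02:20:24Z)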
Provably learning a multi-head attention layer [55.2904547651831]
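The multi-head attention layer is one of the key components of the transformer architecture that sets it apart from traditional feed-forward models. In this work, we initiate the study of provably learning a multi-head attention layer from random examples. We show that, in the worst case, an exponential dependence on $m$ is unavoidable.
Paper reference translation (metadata) (2024-02-06T15:39:09Z)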
Structured Semidefinite Programming for Recovering Structured Preconditioners [41.28701750733703]
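Given a positive definite $\mathbf{K} \in \mathbb{R}^{d \times d}$ with $\mathrm{nnz}(\mathbf{K})$ nonzero entries, our algorithm computes an $\epsilon$-optimal diagonal preconditioner. We obtain our results via new algorithms for a class of semidefinite programs that we call matrix-dictionary approximation SDPs.
Paper reference translation (metadata) (2023-10-27T16:54:29Z)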
Do you know what q-means? [50.045011844765185]
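Clustering is one of the most important tools for analyzing large datasets. We present an improved version of the "$q$-means" algorithm for clustering. We also present a "dequantized" algorithm, parameterized by $\varepsilon$, that runs in $O\big(\frac{k^{2}}{\varepsilon^{2}}(\sqrt{k}d + \log(Nd))\big)$ time.
Paper reference translation (metadata) (2023-08-18T17:52:12Z)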
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
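We consider the sparsification of the attention problem. When the feature dimension is very large, it can be reduced to a size nearly linear in the sentence length.
Paper reference translation (metadata) (2023-04-10T05:52:38Z)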
An Over-parameterized Exponential Regression [18.57735939471469]
Recent developments in the field of large language models (LLMs) have sparked interest in the use of exponential activation functions. The paper considers a neural function $F$ whose domain is $\mathbb{R}^{d \times m} \times \mathbb{R}^{d}$.
Paper reference translation (metadata) (2023-03-29T07:29:07Z)
Average-Case Complexity of Tensor Decomposition for Low-Degree Polynomials [93.59919600451487]
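"Statistical-computational gaps" arise in many statistical inference tasks. We consider a random order-3 tensor decomposition model in which one component is slightly larger than the others. Low-degree polynomials in the tensor entries can accurately estimate the largest component when $r \ll n^{3/2}$ but fail when $r \gg n^{3/2}$.
Paper reference translation (metadata) (2022-11-10T00:40:37Z)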
Near-Linear Time and Fixed-Parameter Tractable Algorithms for Tensor Decompositions [51.19236668224547]
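We study low-rank approximation of tensors, focusing on the tensor train and Tucker decompositions. For tensor train decomposition, we give a bicriteria $(1 + \epsilon)$-approximation algorithm with small bicriteria rank and $O(q \cdot \mathrm{nnz}(A))$ running time. In addition, we extend our algorithm to tensor networks with arbitrary graphs.
Paper reference translation (metadata) (2022-07-15T11:55:09Z)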
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
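We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations $\sigma$. The goal of the learner is to output, with high probability, a hypothesis vector $\mathbf{w}$ whose objective value $F(\mathbf{w})$ meets a target expressed in terms of a constant $C$ and $\epsilon$.
Paper reference translation (metadata) (2022-06-17T17:55:43Z)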
Learning a Latent Simplex in Input-Sparsity Time [58.30321592603066]
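We consider the problem of learning a latent $k$-vertex simplex $K \subset \mathbb{R}^{d}$, given access to $A \in \mathbb{R}^{d \times n}$. We show that the dependence on $k$ in the running time is unnecessary under a natural assumption about the mass of the top $k$ singular values of $A$.
Paper reference translation (metadata) (2021-05-17T16:40:48Z)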

The related-papers list is generated automatically from the titles and abstracts of the papers on this site.
