Fugu-MT 論文翻訳(概要): Learning a Latent Simplex in Input-Sparsity Time

論文の概要: Learning a Latent Simplex in Input-Sparsity Time

arxiv url: http://arxiv.org/abs/2105.08005v1
Date: Mon, 17 May 2021 16:40:48 GMT
ステータス: 翻訳完了
システム内更新日: 2021-05-18 18:07:40.194241
Title: Learning a Latent Simplex in Input-Sparsity Time
Title（参考訳）: 入力分離時間における潜在単純性学習
Authors: Ainesh Bakshi, Chiranjib Bhattacharyya, Ravi Kannan, David P. Woodruff and Samson Zhou
Abstract要約: 我々は、$AinmathbbRdtimes n$へのアクセスを考えると、潜入$k$-vertex simplex $KsubsetmathbbRdtimes n$を学習する問題を考える。実行時間における$k$への依存は、トップ$k$特異値の質量が$a$であるという自然な仮定から不要であることを示す。
参考スコア（独自算出の注目度）: 58.30321592603066
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We consider the problem of learning a latent $k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can be viewed as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$). A large class of latent variable models, such as adversarial clustering, mixed membership stochastic block models, and topic models can be cast as learning a latent simplex. Bhattacharyya and Kannan (SODA, 2020) give an algorithm for learning such a latent simplex in time roughly $O(k\cdot\textrm{nnz}(A))$, where $\textrm{nnz}(A)$ is the number of non-zeros in $A$. We show that the dependence on $k$ in the running time is unnecessary given a natural assumption about the mass of the top $k$ singular values of $A$, which holds in many of these applications. Further, we show this assumption is necessary, as otherwise an algorithm for learning a latent simplex would imply an algorithmic breakthrough for spectral low rank approximation. At a high level, Bhattacharyya and Kannan provide an adaptive algorithm that makes $k$ matrix-vector product queries to $A$ and each query is a function of all queries preceding it. Since each matrix-vector product requires $\textrm{nnz}(A)$ time, their overall running time appears unavoidable. Instead, we obtain a low-rank approximation to $A$ in input-sparsity time and show that the column space thus obtained has small $\sin\Theta$ (angular) distance to the right top-$k$ singular space of $A$. Our algorithm then selects $k$ points in the low-rank subspace with the largest inner product with $k$ carefully chosen random vectors. By working in the low-rank subspace, we avoid reading the entire matrix in each iteration and thus circumvent the $\Theta(k\cdot\textrm{nnz}(A))$ running time.
Abstract（参考訳）: k$-vertex simplex $K\subset\mathbb{R}^d$, given access to $A\in\mathbb{R}^{d\times n}$, which can seen as a data matrix with $n$ points that are obtained by randomly perturbing latent points in the simplex $K$ (potentially beyond $K$)。逆クラスタリング、混合メンバシップ確率ブロックモデル、トピックモデルなど、潜在変数モデルの大きなクラスは、潜在単純性の学習としてキャストできる。 Bhattacharyya and Kannan (SODA, 2020) は、およそ$O(k\cdot\textrm{nnz}(A))$、$\textrm{nnz}(A)$は$A$の非ゼロの数である。実行時間における$k$への依存は、これらの多くのアプリケーションで成り立つ$a$の最上位の$k$特異値の質量に関する自然な仮定を考えると不要であることを示している。さらに, 潜在単純性を学ぶアルゴリズムは, スペクトル低ランク近似のアルゴリズム的ブレークスルーを意味するため, この仮定は必要であることを示す。高いレベルでは、bhattacharyyaとkannanは、$k$ matrix-vector製品クエリを$a$とし、各クエリは、それ以前のすべてのクエリの関数である適応アルゴリズムを提供する。各行列ベクトル積は $\textrm{nnz}(A)$ time を必要とするので、全体の実行時間は避けられない。代わりに、入力スパース時間にa$に低いランクの近似を求め、それによって得られる列空間が、右トップ-k$特異空間にわずかに$\sin\theta$(三角形)距離を持つことを示す。アルゴリズムは次に、最も大きい内部積を持つ低ランク部分空間の$k$ポイントを、慎重に選択されたランダムベクトルで選択する。低ランクな部分空間で作業することで、各イテレーションで行列全体の読み込みを回避し、$\Theta(k\cdot\textrm{nnz}(A))$ run timeを回避します。

関連論文リスト

Improved Algorithms for Kernel Matrix-Vector Multiplication Under Sparsity Assumptions [23.539428616884035]
非対称ガウス・ケルネル行列に対する行列ベクトル積の高速アルゴリズムについて研究する。我々のアルゴリズムは、$K$に関する以下のモデリング仮定に依存している: 最悪のケースの成長とは対照的に、$K$のエントリの合計は$n$で線形にスケールする。我々は、この仮定の下で動作し、制約のない計算を行う最初の準四進時間アルゴリズムを得る。
論文参考訳（メタデータ） (2025-07-31T13:29:43Z)
LevAttention: Time, Space, and Streaming Efficient Algorithm for Heavy Attentions [54.54897832889028]
任意の$K$に対して、$n$とは独立に「普遍集合」$Uサブセット[n]$が存在し、任意の$Q$と任意の行$i$に対して、大きな注目スコアが$A_i,j$ in row $i$ of $A$は全て$jin U$を持つことを示す。我々は、視覚変換器のスキームの利点を実証的に示し、トレーニング中に我々の普遍的なセットを使用する新しいモデルのトレーニング方法を示した。
論文参考訳（メタデータ） (2024-10-07T19:47:13Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
線形スケッチを用いた行列とベクトルノルムの残差誤差推定問題について検討する。これは、前作とほぼ同じスケッチサイズと精度で、経験的にかなり有利であることを示す。また、スパースリカバリ問題に対して$Omega(k2/pn1-2/p)$低いバウンダリを示し、これは$mathrmpoly(log n)$ factorまで厳密である。
論文参考訳（メタデータ） (2024-08-16T02:33:07Z)
Oja's Algorithm for Streaming Sparse PCA [7.059472280274011]
Oja's Algorithm for Streaming principal Component Analysis (PCA) for $n$ data-points in a $d$ dimensional space achieves the same sin-squared error $O(r_mathsfeff/n)$ as the offline algorithm in $O(d)$ space and $O(nd)$ time。 Ojaのアルゴリズムの出力をしきい値にする単純なシングルパス手順は、$O(d)$ space と $O(nd)$ time の正則性条件下での最小誤差を達成できることを示す。
論文参考訳（メタデータ） (2024-02-11T16:36:48Z)
Randomized and Deterministic Attention Sparsification Algorithms for Over-parameterized Feature Dimension [18.57735939471469]
我々は注意問題のスパシフィケーションを考慮する。超大規模特徴量の場合、文の長さをほぼ線形に縮めることができる。
論文参考訳（メタデータ） (2023-04-10T05:52:38Z)
Fast Attention Requires Bounded Entries [19.17278873525312]
内部製品注意計算はTransformer, GPT-1, BERT, GPT-2, GPT-3, ChatGPTなどの大規模言語モデルを訓練するための基本的なタスクである。行列を暗黙的に$A$とすることで、より高速なアルゴリズムが可能かどうかを検討する。このことは、入力行列がより小さいエントリを持つ場合、注意計算の方がはるかに効率的である、実際に観察された現象の理論的な説明を与える。
論文参考訳（メタデータ） (2023-02-26T02:42:39Z)
Sketching Algorithms and Lower Bounds for Ridge Regression [65.0720777731368]
リッジ回帰問題に対する1+varepsilon$近似解を計算するスケッチベース反復アルゴリズムを提案する。また,このアルゴリズムがカーネルリッジ回帰の高速化に有効であることを示す。
論文参考訳（メタデータ） (2022-04-13T22:18:47Z)
Active Local Learning [22.823171570933397]
クエリポイント$x$とラベルなしのトレーニングセット$S$へのアクティブアクセスを与えられた場合、H$でほぼ最適な$hの予測$h(x)$を出力します。特にラベルクエリの数は$H$の複雑さとは無関係でなければならないし、関数$h$は$x$とは無関係に明確に定義されるべきである。これはまた、すぐに距離推定のアルゴリズムを示唆している: ほぼ最適の$hを実際に学習するのに必要となる多くのラベルから$opt(H)$を推定する。
論文参考訳（メタデータ） (2020-08-31T05:39:35Z)
The Average-Case Time Complexity of Certifying the Restricted Isometry Property [66.65353643599899]
圧縮センシングにおいて、100万倍のN$センシング行列上の制限等尺性(RIP)はスパースベクトルの効率的な再構成を保証する。 Mtimes N$ matrices with i.d.$mathcalN(0,1/M)$ entry。
論文参考訳（メタデータ） (2020-05-22T16:55:01Z)
Average Case Column Subset Selection for Entrywise $\ell_1$-Norm Loss [76.02734481158458]
最悪の場合、行列に対する良いランク-$k$近似を得るには、任意に大きい$nOmega(1)$列数が必要であることが知られている。最小かつ現実的な分布設定では、ほぼ線形な実行時間を持つ$(k/epsilon)$-approximationとpoly$(k/epsilon)+O(klog n)$ columnsが得られる。これは、エントリワイズで$(k/epsilon)$-approximationを達成するための任意の種類の最初のアルゴリズムである
論文参考訳（メタデータ） (2020-04-16T22:57:06Z)

関連論文リストは本サイト内にある論文のタイトル・アブストラクトから自動的に作成しています。

指定された論文の情報です。
本サイトの運営者は本サイト（すべての情報・翻訳含む）の品質を保証せず、本サイト（すべての情報・翻訳含む）を使用して発生したあらゆる結果について一切の責任を負いません。