Fugu-MT Paper Translation (Abstract): Efficient Parallelization of an Ubiquitous Sequential Computation

Paper abstract: Efficient Parallelization of an Ubiquitous Sequential Computation

arxiv url: http://arxiv.org/abs/2311.06281v2
Date: Wed, 15 Nov 2023 14:53:19 GMT
Status: translation complete
Last updated in system: 2023-11-16 19:11:28.756971
Title: Efficient Parallelization of an Ubiquitous Sequential Computation
Title (reference translation): Efficient Parallelization of Ubiquitous Sequential Computation
Authors: Franz A. Heinsen
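Abstract summary: We find a succinct expression for computing $x_t = a_t x_{t-1} + b_t$ in parallel. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. We implement the expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.
Reference score (independently computed attention score): 0.0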
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, making efficient parallelization useful for a vast number of applications. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.
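Abstract (reference translation): We find a succinct expression for computing $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, where $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and the initial value is $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, so efficient parallelization is useful for a large number of applications. We implement the expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.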
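
The recurrence in the abstract can be unrolled by dividing through by the running product $A_t = \prod_{i \le t} a_i$, which gives $x_t = A_t (x_0 + \sum_{i \le t} b_i / A_i)$: one prefix product followed by one prefix sum. The sketch below checks that identity against the plain loop. It is a minimal illustration, not the author's implementation. The function names are hypothetical, it assumes every $a_t$ is nonzero, it works directly with products (which can overflow or underflow on long sequences), and `itertools.accumulate` runs sequentially here, standing in for a parallel scan primitive.

```python
from itertools import accumulate
import operator


def recurrence_sequential(a, b, x0):
    """Reference loop: x_t = a_t * x_{t-1} + b_t for t = 1..n."""
    xs, x = [], x0
    for a_t, b_t in zip(a, b):
        x = a_t * x + b_t
        xs.append(x)
    return xs


def recurrence_two_scans(a, b, x0):
    """Same sequence via two prefix scans (assumes all a_t are nonzero)."""
    # Scan 1: prefix products A_t = a_1 * a_2 * ... * a_t.
    A = list(accumulate(a, operator.mul))
    # Scan 2: prefix sums S_t = sum over i <= t of b_i / A_i.
    S = list(accumulate(b_t / A_t for b_t, A_t in zip(b, A)))
    # Elementwise combine: x_t = A_t * (x_0 + S_t).
    return [A_t * (x0 + S_t) for A_t, S_t in zip(A, S)]


if __name__ == "__main__":
    a = [0.5, 2.0, 4.0]
    b = [1.0, 3.0, 2.0]
    print(recurrence_sequential(a, b, 2.0))
    print(recurrence_two_scans(a, b, 2.0))
```

Both scans are associative, so on parallel hardware each can be replaced by a scan of logarithmic depth. The final combine step is elementwise.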

Related papers

A Fast Multiplication Algorithm and RLWE-PLWE Equivalence for the Maximal Real Subfield of the $2^r p^s$-th Cyclotomic Field [0.0]
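We prove RLWE-PLWE equivalence for the maximal real subfields of the cyclotomic fields with conductor $n = 2^r p^s$. We also describe a fast multiplication algorithm in the rings of integers of these real subfields.
Paper reference translation (metadata) (2025-04-07T15:01:48Z)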
Efficient Algorithm for Sparse Fourier Transform of Generalized $q$-ary Functions [0.3004066195320147]
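GFast is a coding-theoretic algorithm that computes the Fourier transform of $f$ with sample complexity $O(Sn)$. GFast can explain real-world heart disease diagnosis and protein fitness models with up to 13 hours' worth of samples.
Paper reference translation (metadata) (2025-01-21T18:45:09Z)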
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
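This problem has randomized communication complexity $\Omega(\frac{1}{k} \cdot n^2 \log |\mathbb{F}|)$. As an application, we obtain an $\Omega(\frac{1}{k} \cdot n^2 \log |\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
Paper reference translation (metadata) (2024-10-26T06:21:42Z)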
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
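We study the problem of residual error estimation for matrix and vector norms using linear sketches. We show that this is considerably advantageous empirically, at nearly the same sketch size and accuracy as prior work. We also show an $\Omega(k^{2/p} n^{1-2/p})$ lower bound for the sparse recovery problem, which is tight up to a $\mathrm{poly}(\log n)$ factor.
Paper reference translation (metadata) (2024-08-16T02:33:07Z)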
Simple and Nearly-Optimal Sampling for Rank-1 Tensor Completion via Gauss-Jordan [49.1574468325115]
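We revisit the sample and computational complexity of completing a rank-1 tensor in $\otimes_{i=1}^{N} \mathbb{R}^{d}$. We present a characterization of the problem that admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems.
Paper reference translation (metadata) (2024-08-10T04:26:19Z)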
Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm is $\tilde O(n^2)$ if $k = O(n^{0.729})$. The main algorithm can be viewed as a randomized block coordinate descent method.
Paper reference translation (metadata) (2023-12-14T12:53:34Z)
Fast Attention Requires Bounded Entries [19.17278873525312]
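Inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3, and ChatGPT. We investigate whether faster algorithms are possible by using the matrix $A$ implicitly. This gives a theoretical explanation for a phenomenon observed in practice: attention computation is much more efficient when the input matrices have smaller entries.
Paper reference translation (metadata) (2023-02-26T02:42:39Z)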
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
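We study functions of the form $\mathbf{x} \mapsto \sigma(\mathbf{w} \cdot \mathbf{x})$ for monotone activations. The goal of the learner is to output, with high probability, a hypothesis vector $\mathbf{w}$ that $F(\mathbb{w}) = C, \epsilon$.
Paper reference translation (metadata) (2022-06-17T17:55:43Z)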
Quantum machine learning with subspace states [8.22379888383833]
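We introduce a new approach to quantum linear algebra based on quantum subspace states and propose three new quantum machine learning algorithms. The first concerns the distribution $\Pr[S] = \det(X_S X_S^T)$ for $|S| = d$, using $O(nd)$ gates and circuit depth $O(d \log n)$. The second is a quantum singular value estimation algorithm for complex matrices $\mathcal{A}^k$; the speedup of this algorithm is exponential.
Paper reference translation (metadata) (2022-01-31T19:34:47Z)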
Computational Complexity of Normalizing Constants for the Product of Determinantal Point Processes [12.640283469603357]
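We study the computational complexity of computing normalizing constants. For example, we show that $\sum_S \det(\mathbf{A}_{S,S})^p$ is UP-hard and Mod$_3$P-hard for every (fixed) positive even integer $p$.
Paper reference translation (metadata) (2021-11-28T14:08:25Z)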
An Optimal Separation of Randomized and Quantum Query Complexity [67.19751155411075]
For every decision tree, the quantities of a given order $\ell$ sum to at least $c^{\ell}\sqrt{\binom{d}{\ell}(1+\log n)^{\ell-1}}$, where $n$ is the number of variables, $d$ is the tree depth, and $c > 0$ is an absolute constant.
Paper reference translation (metadata) (2020-08-24T06:50:57Z)
Continuous Submodular Maximization: Beyond DR-Submodularity [48.04323002262095]
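We first present a simple variant of vanilla coordinate ascent, called Coordinate-Ascent+. We then propose Coordinate-Ascent++, which guarantees a $(1-1/e-\varepsilon)$-approximation while performing the same number of iterations. The computation in each round of Coordinate-Ascent++ is easily parallelized, with a computational cost per machine of $O(n/\sqrt{\varepsilon} + n \log n)$.
Paper reference translation (metadata) (2020-06-21T06:57:59Z)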
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
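Solving optimal transport with entropic regularization requires computing an $n \times n$ kernel matrix that is repeatedly applied to a vector. Instead, the cost is $c(x,y) = -\log \langle \varphi(x), \varphi(y) \rangle$, where $\varphi$ is a map from the ground space onto the positive orthant $\mathbb{R}^r_+$, with $r \ll n$.
Paper reference translation (metadata) (2020-06-12T10:21:40Z)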

The related-papers list is generated automatically from the titles and abstracts of papers on this site.