Related papers: Efficient Parallelization of a Ubiquitous Sequential Computation

Efficient Parallelization of a Ubiquitous Sequential Computation

URL: http://arxiv.org/abs/2311.06281v4
Date: Wed, 27 Dec 2023 12:50:55 GMT
Title: Efficient Parallelization of a Ubiquitous Sequential Computation
Authors: Franz A. Heinsen
Abstract summary: We find a succinct expression for computing the sequence $x_t = a_t x_t-1 + b_t$ in parallel. On $n$ parallel processors, the computation of $n$ elements incurs $mathcalO(log n)$ time and $mathcalO(n)$ space. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $fracnlog n$.
Score: 0.0
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We find a succinct expression for computing the sequence $x_t = a_t x_{t-1} + b_t$ in parallel with two prefix sums, given $t = (1, 2, \dots, n)$, $a_t \in \mathbb{R}^n$, $b_t \in \mathbb{R}^n$, and initial value $x_0 \in \mathbb{R}$. On $n$ parallel processors, the computation of $n$ elements incurs $\mathcal{O}(\log n)$ time and $\mathcal{O}(n)$ space. Sequences of this form are ubiquitous in science and engineering, making efficient parallelization useful for a vast number of applications. We implement our expression in software, test it on parallel hardware, and verify that it executes faster than sequential computation by a factor of $\frac{n}{\log n}$.

Related papers

A Fast Multiplication Algorithm and RLWE-PLWE Equivalence for the Maximal Real Subfield of the $2^r p^s$-th Cyclotomic Field [0.0]
We prove the RLWE-PLWE equivalence for the maximal real subfields of the cyclotomic fields with conductor $n = 2r ps$. We also describe a fast multiplication algorithm in the ring of integers of these real subfields.
arXiv Detail & Related papers (2025-04-07T15:01:48Z)
The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $Omega(frac1kcdot n2log|mathbbF|)$. As an application, we obtain an $Omega(frac1kcdot n2log|mathbbF|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z)
Optimal Sketching for Residual Error Estimation for Matrix and Vector Norms [50.15964512954274]
We study the problem of residual error estimation for matrix and vector norms using a linear sketch. We demonstrate that this gives a substantial advantage empirically, for roughly the same sketch size and accuracy as in previous work. We also show an $Omega(k2/pn1-2/p)$ lower bound for the sparse recovery problem, which is tight up to a $mathrmpoly(log n)$ factor.
arXiv Detail & Related papers (2024-08-16T02:33:07Z)
Simple and Nearly-Optimal Sampling for Rank-1 Tensor Completion via Gauss-Jordan [49.1574468325115]
We revisit the sample and computational complexity of completing a rank-1 tensor in $otimes_i=1N mathbbRd$. We present a characterization of the problem which admits an algorithm amounting to Gauss-Jordan on a pair of random linear systems.
arXiv Detail & Related papers (2024-08-10T04:26:19Z)
Solving Dense Linear Systems Faster Than via Preconditioning [1.8854491183340518]
We show that our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$. In particular, our algorithm has an $tilde O(n2)$ when $k=O(n0.729)$. Our main algorithm can be viewed as a randomized block coordinate descent method.
arXiv Detail & Related papers (2023-12-14T12:53:34Z)
Fast Attention Requires Bounded Entries [19.17278873525312]
inner product attention computation is a fundamental task for training large language models such as Transformer, GPT-1, BERT, GPT-2, GPT-3 and ChatGPT. We investigate whether faster algorithms are possible by implicitly making use of the matrix $A$. This gives a theoretical explanation for the phenomenon observed in practice that attention computation is much more efficient when the input matrices have smaller entries.
arXiv Detail & Related papers (2023-02-26T02:42:39Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
Quantum machine learning with subspace states [8.22379888383833]
We introduce a new approach for quantum linear algebra based on quantum subspace states and present three new quantum machine learning algorithms. The first is a quantum sampling algorithm that samples from the distribution $Pr[S]= det(X_SX_ST)$ for $|S|=d$ using $O(nd)$ gates and with circuit depth $O(dlog n)$. The second is a quantum singular value estimation algorithm for compound matrices $mathcalAk$, the speedup for this algorithm is potentially exponential.
arXiv Detail & Related papers (2022-01-31T19:34:47Z)
Computational Complexity of Normalizing Constants for the Product of Determinantal Point Processes [12.640283469603357]
We study the computational complexity of computing the normalizing constant. We show that $sum_Sdet(bf A_S,S)p$ exactly for every (fixed) positive even integer $p$ is UP-hard and Mod$_3$P-hard.
arXiv Detail & Related papers (2021-11-28T14:08:25Z)
Continuous Submodular Maximization: Beyond DR-Submodularity [48.04323002262095]
We first prove a simple variant of the vanilla coordinate ascent, called Coordinate-Ascent+. We then propose Coordinate-Ascent++, that achieves tight $(1-1/e-varepsilon)$-approximation guarantee while performing the same number of iterations. The computation of each round of Coordinate-Ascent++ can be easily parallelized so that the computational cost per machine scales as $O(n/sqrtvarepsilon+nlog n)$.
arXiv Detail & Related papers (2020-06-21T06:57:59Z)
Linear Time Sinkhorn Divergences using Positive Features [51.50788603386766]
Solving optimal transport with an entropic regularization requires computing a $ntimes n$ kernel matrix that is repeatedly applied to a vector. We propose to use instead ground costs of the form $c(x,y)=-logdotpvarphi(x)varphi(y)$ where $varphi$ is a map from the ground space onto the positive orthant $RRr_+$, with $rll n$.
arXiv Detail & Related papers (2020-06-12T10:21:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.