Parallel Algorithms Align with Neural Execution
- URL: http://arxiv.org/abs/2307.04049v2
- Date: Wed, 3 Jan 2024 12:34:37 GMT
- Title: Parallel Algorithms Align with Neural Execution
- Authors: Valerie Engelmayer, Dobrik Georgiev, Petar Veličković
- Abstract summary: Parallel algorithms, however, may exploit their full computational power and therefore require fewer layers to be executed.
This drastically reduces training times, as we observe when comparing parallel implementations of searching, sorting and finding strongly connected components to their sequential counterparts on the CLRS framework.
- Score: 7.535219325248997
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural algorithmic reasoners are parallel processors. Teaching them
sequential algorithms contradicts this nature, rendering a significant share of
their computations redundant. Parallel algorithms, however, may exploit their
full computational power and therefore require fewer layers to be executed. This
drastically reduces training times, as we observe when comparing parallel
implementations of searching, sorting and finding strongly connected components
to their sequential counterparts on the CLRS framework. Additionally, parallel
versions achieve (often strongly) superior predictive performance.
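To make the depth argument concrete, here is a minimal sketch (an illustration, not code from the paper) contrasting the number of rounds a sequential scan needs to find a minimum with the rounds a parallel tournament reduction needs; each round loosely corresponds to one processor layer of a neural algorithmic reasoner.

```python
# Minimal sketch (not from the paper): sequential vs. parallel minimum search.
# Each "round" stands in for one executed layer of a neural reasoner.

def sequential_min_rounds(values):
    """Sequential minimum: one comparison per round, O(n) rounds."""
    best = values[0]
    rounds = 0
    for v in values[1:]:
        best = min(best, v)
        rounds += 1
    return best, rounds

def parallel_min_rounds(values):
    """Parallel (tournament) minimum: candidates halve each round, O(log n) rounds."""
    rounds = 0
    while len(values) > 1:
        # all pairwise comparisons within one round could run simultaneously
        values = [min(values[i], values[i + 1]) if i + 1 < len(values) else values[i]
                  for i in range(0, len(values), 2)]
        rounds += 1
    return values[0], rounds

if __name__ == "__main__":
    data = list(range(1024, 0, -1))
    print(sequential_min_rounds(data)[1])  # 1023 rounds
    print(parallel_min_rounds(data)[1])    # 10 rounds
```

With 1024 elements the sequential scan needs 1023 rounds while the tournament needs 10, which is the kind of depth reduction that lets parallel targets be learned with far fewer executed layers.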
Related papers
- Optimal Parallelization of Boosting [10.985323882432086]
Recent works on the parallel complexity of Boosting have established strong lower bounds on the tradeoff between the number of training rounds $p$ and the total parallel work per round $t$.
Despite these advancements, a significant gap persists between the theoretical lower bounds and the performance of these algorithms across much of the tradeoff space.
In this work, we essentially close this gap by providing both improved lower bounds on the parallel complexity of weak-to-strong learners, and a parallel Boosting algorithm whose performance matches these bounds across the entire $p$ vs. $t$ tradeoff spectrum, up to logarithmic factors.
arXiv Detail & Related papers (2024-08-29T15:56:22Z)
- Parallel Training of GRU Networks with a Multi-Grid Solver for Long Sequences [1.9798034349981162]
We present a novel parallel-in-time training scheme for Gated Recurrent Unit (GRU) networks, based on the multigrid-reduction-in-time (MGRIT) solver.
MGRIT partitions a sequence into multiple shorter sub-sequences and trains the sub-sequences on different processors in parallel.
Experimental results on the HMDB51 dataset, where each video is an image sequence, demonstrate that the new parallel training scheme achieves up to 6.5$\times$ speedup over a serial approach.
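A heavily simplified, Parareal/MGRIT-flavoured sketch of the parallel-in-time idea (an illustrative assumption, not the paper's solver): the sequence is cut into sub-sequences that are propagated in parallel from guessed initial hidden states, and the chunk-boundary states are corrected iteratively.

```python
import numpy as np

rng = np.random.default_rng(0)
T, H = 64, 8                                   # sequence length, hidden size
x = rng.normal(size=(T, H))
W = rng.normal(size=(H, H)) * 0.1
U = rng.normal(size=(H, H)) * 0.1

def step(h, x_t):
    """One toy recurrent update (stands in for a GRU cell)."""
    return np.tanh(h @ W + x_t @ U)

def run_chunk(h0, chunk):
    h = h0
    for x_t in chunk:
        h = step(h, x_t)
    return h

chunks = np.split(x, 8)                        # 8 sub-sequences
boundary = [np.zeros(H) for _ in range(len(chunks) + 1)]

for iteration in range(10):                    # fixed-point iteration on boundary states
    # in parallel: propagate every chunk from its current boundary guess
    new_ends = [run_chunk(boundary[i], chunks[i]) for i in range(len(chunks))]
    # update boundary states (the true initial state boundary[0] is kept fixed)
    boundary = [boundary[0]] + new_ends

h_serial = run_chunk(np.zeros(H), x)
print(np.max(np.abs(boundary[-1] - h_serial)))  # converges to the serial result
```

With the trivial zero initial guesses used here, each iteration fixes one more boundary state; MGRIT's coarse-grid sweeps are what make the iteration converge in far fewer rounds than there are chunks.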
arXiv Detail & Related papers (2022-03-07T11:32:44Z)
- Parallelizing Contextual Linear Bandits [82.65675585004448]
We present a family of (parallel) contextual linear bandit algorithms, whose regret is nearly identical to their perfectly sequential counterparts.
We also present an empirical evaluation of these parallel algorithms in several domains, including materials discovery and biological sequence design problems.
arXiv Detail & Related papers (2021-05-21T22:22:02Z)
- Fast Parallel Algorithms for Euclidean Minimum Spanning Tree and Hierarchical Spatial Clustering [6.4805900740861]
We introduce a new notion of well-separation to reduce the work and space of our algorithm for HDBSCAN$*$.
We show that our algorithms are theoretically efficient: they have work (number of operations) matching their sequential counterparts, and polylogarithmic depth (parallel time).
Our experiments on large real-world and synthetic data sets using a 48-core machine show that our fastest algorithms outperform the best serial algorithms for the problems by 11.13--55.89x, and existing parallel algorithms by at least an order of magnitude.
arXiv Detail & Related papers (2021-04-02T16:05:00Z)
- Temporal Parallelization of Inference in Hidden Markov Models [0.0]
This paper presents algorithms for parallelization of inference in hidden Markov models (HMMs).
We propose a parallel backward-forward type of filtering and smoothing algorithm, as well as a parallel Viterbi-type maximum-a-posteriori (MAP) algorithm.
We empirically compare the performance of the proposed methods to classical methods on a highly parallel graphics processing unit (GPU).
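The construction that typically enables this kind of parallelization (assumed here for illustration; the sketch is not the paper's code) casts filtering as an associative scan: with transition matrix $A$, emission matrix $B$, initial distribution $\pi$ and observations $y_t$ (notation assumed), the per-step matrices $M_t[i,j] = A[i,j]\,B[j, y_t]$ give unnormalised forward messages $\alpha_t = \pi M_1 \cdots M_t$, and because matrix products are associative, all prefixes can be computed by a parallel scan in $O(\log T)$ rounds instead of $T$ sequential steps.

```python
import numpy as np

rng = np.random.default_rng(1)
S, T = 3, 16                                   # number of states, sequence length
A = rng.dirichlet(np.ones(S), size=S)          # transition matrix
B = rng.dirichlet(np.ones(4), size=S)          # emission matrix (4 symbols)
pi = np.full(S, 1.0 / S)                       # initial distribution
y = rng.integers(0, 4, size=T)                 # observations

M = np.stack([A * B[:, y_t] for y_t in y])     # per-step matrices M_t

# serial forward pass: T sequential steps
alpha_serial = [pi @ M[0]]
for t in range(1, T):
    alpha_serial.append(alpha_serial[-1] @ M[t])

# inclusive prefix products by recursive doubling: O(log T) rounds
prefix = M.copy()
shift = 1
while shift < T:
    new = prefix.copy()
    # every position t >= shift could be updated simultaneously in one round
    new[shift:] = np.einsum('tij,tjk->tik', prefix[:-shift], prefix[shift:])
    prefix, shift = new, shift * 2

alpha_parallel = pi @ prefix                   # shape (T, S)
print(np.allclose(alpha_serial, alpha_parallel))  # True
```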
arXiv Detail & Related papers (2021-02-10T21:26:09Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
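A minimal sketch of the truncated layer-wise idea (an illustrative assumption, not the paper's code): each block receives a detached input and is trained against its own auxiliary loss, so no gradient crosses block boundaries and the per-block backward passes are independent of one another.

```python
import torch
import torch.nn as nn

class LocalBlock(nn.Module):
    def __init__(self, dim, n_classes):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(dim, dim), nn.ReLU())
        self.aux_head = nn.Linear(dim, n_classes)   # local objective only

    def forward(self, x):
        h = self.body(x)
        return h, self.aux_head(h)

blocks = nn.ModuleList([LocalBlock(32, 10) for _ in range(4)])
opts = [torch.optim.SGD(b.parameters(), lr=0.1) for b in blocks]
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(64, 32)
y = torch.randint(0, 10, (64,))

h = x
for block, opt in zip(blocks, opts):
    h_in = h.detach()                 # truncate backprop at the block boundary
    h, logits = block(h_in)
    loss = loss_fn(logits, y)
    opt.zero_grad()
    loss.backward()                   # reaches only this block's parameters
    opt.step()
```

Because the only inter-block dependency is the forward activations, the per-block updates can be pipelined or placed on different devices, which is what makes the scheme attractive in the high-compute regime.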
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Coded Distributed Computing with Partial Recovery [56.08535873173518]
We introduce a novel coded matrix-vector multiplication scheme, called coded computation with partial recovery (CCPR).
CCPR reduces both the computation time and the decoding complexity by allowing a trade-off between the accuracy and the speed of computation.
We then extend this approach to distributed implementation of more general computation tasks by proposing a coded communication scheme with partial recovery.
arXiv Detail & Related papers (2020-07-04T21:34:49Z)
- On Effective Parallelization of Monte Carlo Tree Search [51.15940034629022]
Monte Carlo Tree Search (MCTS) is computationally expensive as it requires a substantial number of rollouts to construct the search tree.
How to design effective parallel MCTS algorithms has not been systematically studied and remains poorly understood.
We demonstrate how the proposed necessary conditions can be adopted to design more effective parallel MCTS algorithms.
arXiv Detail & Related papers (2020-06-15T21:36:00Z)
- Minimal Filtering Algorithms for Convolutional Neural Networks [82.24592140096622]
We develop fully parallel hardware-oriented algorithms for implementing the basic filtering operation for M=3,5,7,9, and 11.
A fully parallel hardware implementation of the proposed algorithms in each case gives approximately 30 percent savings in the number of embedded multipliers.
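For context on what "minimal filtering" means, here is the classical Winograd $F(2,3)$ algorithm, which computes two outputs of a 3-tap filter with 4 multiplications instead of 6; this is a standard textbook instance and not necessarily the construction the paper proposes for $M=3,5,7,9$, and $11$.

```python
def f23_minimal(d, g):
    """d: 4 inputs, g: 3 filter taps -> 2 outputs, using 4 multiplications."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * (g0 + g1 + g2) / 2
    m3 = (d2 - d1) * (g0 - g1 + g2) / 2
    m4 = (d1 - d3) * g2
    return m1 + m2 + m3, m2 - m3 - m4

def f23_naive(d, g):
    """Reference: direct filtering with 6 multiplications."""
    return (d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
            d[1]*g[0] + d[2]*g[1] + d[3]*g[2])

print(f23_minimal([1., 2., 3., 4.], [0.5, -1., 2.]))  # (4.5, 6.0)
print(f23_naive([1., 2., 3., 4.], [0.5, -1., 2.]))    # same values
```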
arXiv Detail & Related papers (2020-04-12T13:18:25Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods combining the two.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
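A minimal sketch of the Jacobi view (an illustration under simplifying assumptions, with notation $L$, $f_i$ introduced here; not the paper's implementation): the feedforward values $h_i = f_i(h_{i-1})$ are the fixed point of the system, and a Jacobi sweep re-evaluates every layer simultaneously from the previous iterate, so each sweep is fully parallel and at most $L$ sweeps reproduce the serial result exactly.

```python
import numpy as np

rng = np.random.default_rng(2)
L, D = 12, 16                                   # number of layers, width
Ws = [rng.normal(size=(D, D)) / np.sqrt(D) for _ in range(L)]
x = rng.normal(size=D)

def f(i, h):
    return np.tanh(Ws[i] @ h)

# serial feedforward: L sequential layer evaluations
h_serial = [x]
for i in range(L):
    h_serial.append(f(i, h_serial[-1]))

# Jacobi iteration: every layer re-evaluated in parallel against the last iterate
h = [x] + [np.zeros(D) for _ in range(L)]
for sweep in range(L):                          # at most L sweeps are needed
    h = [x] + [f(i, h[i]) for i in range(L)]    # all L evaluations are independent
    if all(np.allclose(a, b) for a, b in zip(h, h_serial)):
        print("matched serial output after", sweep + 1, "sweeps")
        break
```

In this toy chain information moves one layer per sweep, so all $L$ sweeps are needed; the gains described in the abstract arise when the iteration converges in fewer parallelizable sweeps than the number of sequential steps.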
arXiv Detail & Related papers (2020-02-10T10:11:31Z)