Relaxed Scheduling for Scalable Belief Propagation
- URL: http://arxiv.org/abs/2002.11505v2
- Date: Mon, 18 Jan 2021 15:54:24 GMT
- Title: Relaxed Scheduling for Scalable Belief Propagation
- Authors: Vitaly Aksenov and Dan Alistarh and Janne H. Korhonen
- Abstract summary: We focus on efficient parallel algorithms for the key machine learning task of inference on graphical models.
We show how to leverage relaxed scalable schedulers in this context.
Our approach outperforms previous parallel belief propagation implementations both in terms of scalability and in terms of wall-clock convergence time.
- Score: 27.226933626287305
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The ability to leverage large-scale hardware parallelism has been one of the
key enablers of the accelerated recent progress in machine learning.
Consequently, there has been considerable effort invested into developing
efficient parallel variants of classic machine learning algorithms. However,
despite the wealth of knowledge on parallelization, some classic machine
learning algorithms often prove hard to parallelize efficiently while
maintaining convergence.
In this paper, we focus on efficient parallel algorithms for the key machine
learning task of inference on graphical models, in particular on the
fundamental belief propagation algorithm. We address the challenge of
efficiently parallelizing this classic paradigm by showing how to leverage
scalable relaxed schedulers in this context. We present an extensive empirical
study, showing that our approach outperforms previous parallel belief
propagation implementations both in terms of scalability and in terms of
wall-clock convergence time, on a range of practical applications.
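Below is a minimal, single-threaded Python sketch of the scheduling pattern the abstract describes: residual belief propagation driven by a relaxed priority scheduler in the spirit of a MultiQueue, which pops an approximately highest-residual message rather than the exact maximum and so avoids the sequential bottleneck of a strict priority queue. The pairwise model, helper names, and lazy-deletion bookkeeping are illustrative assumptions, not the authors' implementation.

```python
import heapq
import random
import numpy as np

def new_message(i, j, phi, psi, msgs, neighbors):
    """Sum-product update for the directed message i -> j on a pairwise MRF."""
    pre = phi[i].copy()
    for k in neighbors[i]:
        if k != j:
            pre *= msgs[(k, i)]
    P = psi[(i, j)] if (i, j) in psi else psi[(j, i)].T  # P[x_i, x_j]
    m = P.T @ pre                                        # marginalize out x_i
    return m / m.sum()                                   # normalize for stability

def relaxed_residual_bp(phi, psi, neighbors, n_heaps=4,
                        max_updates=50_000, tol=1e-6, seed=0):
    rng = random.Random(seed)
    edges = [(i, j) for i in neighbors for j in neighbors[i]]
    n_states = len(next(iter(phi.values())))
    msgs = {e: np.full(n_states, 1.0 / n_states) for e in edges}
    residual, heaps = {}, [[] for _ in range(n_heaps)]

    def push(edge):
        # Priority = lookahead residual of the edge's pending update.
        r = float(np.abs(new_message(*edge, phi, psi, msgs, neighbors)
                         - msgs[edge]).max())
        residual[edge] = r
        heapq.heappush(heaps[rng.randrange(n_heaps)], (-r, edge))

    for e in edges:
        push(e)

    for _ in range(max_updates):
        # Relaxed pop: inspect two random heaps and take the better top, so the
        # chosen edge is only *approximately* the highest-residual one.
        cands = [h for h in rng.sample(range(n_heaps), 2) if heaps[h]]
        if not cands:
            cands = [h for h in range(n_heaps) if heaps[h]]
            if not cands:
                break
        h = min(cands, key=lambda idx: heaps[idx][0][0])  # smallest -r = largest r
        neg_r, edge = heapq.heappop(heaps[h])
        if residual.get(edge, 0.0) != -neg_r:             # stale entry: skip it
            continue
        if -neg_r < tol:                                  # approximately converged
            break
        i, j = edge
        msgs[edge] = new_message(i, j, phi, psi, msgs, neighbors)
        residual[edge] = 0.0
        for k in neighbors[j]:                            # messages out of j changed
            if k != i:
                push((j, k))
    return msgs
```

Here phi maps each variable to a length-n_states unary potential, psi maps each edge (stored in one orientation) to an n_states x n_states pairwise potential, and neighbors is a symmetric adjacency dict over comparable node labels. In an actual parallel implementation, many threads would run the pop-update-push loop concurrently over a shared relaxed queue; the sketch only imitates the relaxed pop to show where priority inversions enter.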
Related papers
- Enhancing Parallelism in Decentralized Stochastic Convex Optimization [10.632248569865236]
We propose Decentralized Anytime SGD, a novel decentralized learning algorithm that significantly extends the critical parallelism threshold.
Within the stochastic convex optimization (SCO) framework, we establish a theoretical upper bound on parallelism that surpasses the current state-of-the-art.
arXiv Detail & Related papers (2025-06-01T11:17:32Z)
- Automatically Planning Optimal Parallel Strategy for Large Language Models [9.804975588324035]
We propose an automatic parallel algorithm that automatically plans the parallel strategy with maximum throughput.
By decoupling the training time into computation, communication, and overlap, we establish a simulation model of the training duration.
The multi-node experiment results show that the algorithm can estimate the parallel training duration in real time with an average accuracy of 96%.
arXiv Detail & Related papers (2024-12-31T03:51:14Z)
- Acceleration for Deep Reinforcement Learning using Parallel and Distributed Computing: A Survey [29.275928499337734]
Deep reinforcement learning has led to dramatic breakthroughs in the field of artificial intelligence.
As the amount of rollout experience data and the size of neural networks for deep reinforcement learning have grown, handling the training process and reducing its time consumption with parallel and distributed computing has become both urgent and essential.
We perform a broad and thorough investigation on training acceleration methodologies for deep reinforcement learning based on parallel and distributed computing.
arXiv Detail & Related papers (2024-11-08T14:55:32Z)
- Offline reinforcement learning for job-shop scheduling problems [1.3927943269211593]
This paper introduces a novel offline RL method designed for optimization problems with complex constraints.
Our approach encodes actions in edge attributes and balances expected rewards with the imitation of expert solutions.
We demonstrate the effectiveness of this method on job-shop scheduling and flexible job-shop scheduling benchmarks.
arXiv Detail & Related papers (2024-10-21T07:33:42Z)
- Efficient Training of Multi-task Neural Solver for Combinatorial Optimization [23.694457372640912]
We propose a general and efficient training paradigm to deliver a unified multi-task neural solver.
Our method significantly enhances overall performance, whether or not training budgets are constrained.
It also achieves the best results compared to both single-task and multi-task learning approaches.
arXiv Detail & Related papers (2023-05-10T14:20:34Z)
- Learning to Parallelize with OpenMP by Augmented Heterogeneous AST Representation [7.750212995537728]
We propose a novel graph-based learning approach called Graph2Par that utilizes a heterogeneous augmented abstract syntax tree (Augmented-AST) representation for code.
We create an OMP_Serial dataset with 18598 parallelizable and 13972 non-parallelizable loops to train the machine learning models.
Our results show that the proposed approach detects parallelizable code regions with 85% accuracy, outperforming the state-of-the-art token-based machine learning approach.
arXiv Detail & Related papers (2023-05-09T21:57:15Z)
- Stabilizing Q-learning with Linear Architectures for Provably Efficient Learning [53.17258888552998]
This work proposes an exploration variant of the basic $Q$-learning protocol with linear function approximation.
We show that the performance of the algorithm degrades very gracefully under a novel and more permissive notion of approximation error.
arXiv Detail & Related papers (2022-06-01T23:26:51Z)
- Accelerated Policy Learning with Parallel Differentiable Simulation [59.665651562534755]
We present a differentiable simulator and a new policy learning algorithm (SHAC).
Our algorithm alleviates problems with local minima through a smooth critic function.
We show substantial improvements in sample efficiency and wall-clock time over state-of-the-art RL and differentiable simulation-based algorithms.
arXiv Detail & Related papers (2022-04-14T17:46:26Z)
- Parallel Training of Deep Networks with Local Updates [84.30918922367442]
Local parallelism is a framework which parallelizes training of individual layers in deep networks by replacing global backpropagation with truncated layer-wise backpropagation.
We show results in both vision and language domains across a diverse set of architectures, and find that local parallelism is particularly effective in the high-compute regime.
arXiv Detail & Related papers (2020-12-07T16:38:45Z)
- Geometric Deep Reinforcement Learning for Dynamic DAG Scheduling [8.14784681248878]
In this paper, we propose a reinforcement learning approach to solve a realistic scheduling problem.
We apply it to an algorithm commonly executed in the high performance computing community, the Cholesky factorization.
Our algorithm uses graph neural networks in combination with an actor-critic algorithm (A2C) to build an adaptive representation of the problem on the fly.
arXiv Detail & Related papers (2020-11-09T10:57:21Z)
- Randomized Block-Diagonal Preconditioning for Parallel Learning [0.0]
We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form.
Our main contribution is to demonstrate that the convergence of these methods can significantly be improved by a randomization technique.
arXiv Detail & Related papers (2020-06-24T10:12:36Z)
- Parallelization Techniques for Verifying Neural Networks [52.917845265248744]
We introduce an algorithm that iteratively partitions the verification problem and explore two partitioning strategies.
We also introduce a highly parallelizable pre-processing algorithm that uses the neuron activation phases to simplify the neural network verification problems.
arXiv Detail & Related papers (2020-04-17T20:21:47Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrid methods of both.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power.
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
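The last entry above recasts sequential feedforward evaluation x_t = f_t(x_{t-1}) as a system of nonlinear equations solved by parallelizable fixed-point sweeps. The following NumPy sketch shows the Jacobi variant under that framing; the toy tanh network and function names are illustrative assumptions, and the Gauss-Seidel and hybrid schemes mentioned in that abstract are not shown.

```python
import numpy as np

def sequential_forward(x0, layers):
    """Ordinary sequential evaluation: x_t = f_t(x_{t-1})."""
    xs = [x0]
    for f in layers:
        xs.append(f(xs[-1]))
    return xs

def jacobi_forward(x0, layers, max_iters=None, tol=0.0):
    """Treat x_t = f_t(x_{t-1}) for all t as one system and run Jacobi sweeps."""
    T = len(layers)
    xs = [x0] + [np.zeros_like(x0) for _ in range(T)]  # arbitrary initial guess
    max_iters = max_iters or T                         # T sweeps always suffice
    for it in range(max_iters):
        # Every update reads only the previous sweep, so all T of them could
        # run in parallel (e.g. batched on a GPU).
        new_xs = [x0] + [layers[t](xs[t]) for t in range(T)]
        delta = max(np.max(np.abs(a - b)) for a, b in zip(new_xs, xs))
        xs = new_xs
        if delta <= tol:                               # fixed point reached early
            break
    return xs, it + 1

if __name__ == "__main__":
    # Tiny check against ordinary sequential evaluation on a toy tanh network.
    rng = np.random.default_rng(0)
    Ws = [rng.standard_normal((8, 8)) * 0.5 for _ in range(6)]
    layers = [lambda x, W=W: np.tanh(W @ x) for W in Ws]
    x0 = rng.standard_normal(8)
    reference = sequential_forward(x0, layers)
    parallel, sweeps = jacobi_forward(x0, layers)
    assert all(np.allclose(a, b) for a, b in zip(reference, parallel))
    print(f"matched sequential output after {sweeps} Jacobi sweeps (T = {len(layers)})")
```

Each sweep updates every x_t from the previous sweep's values only, so all T updates can run in parallel, and T sweeps always suffice to reproduce the sequential result exactly, consistent with the guarantee stated in that abstract.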
This list is automatically generated from the titles and abstracts of the papers on this site.