Embarrassingly Parallel Independent Training of Multi-Layer Perceptrons
with Heterogeneous Architectures
- URL: http://arxiv.org/abs/2206.08369v1
- Date: Tue, 14 Jun 2022 02:00:31 GMT
- Title: Embarrassingly Parallel Independent Training of Multi-Layer Perceptrons
with Heterogeneous Architectures
- Authors: Felipe Costa Farias, Teresa Bernarda Ludermir, Carmelo Jose Albanez
Bastos-Filho
- Abstract summary: ParallelMLPs is a procedure that enables training several independent Multi-Layer Perceptron Neural Networks, with different numbers of neurons and activation functions, in parallel.
We assessed our algorithm on simulated datasets, varying the number of samples, features, and batches, using 10,000 different models.
We achieved training speedups of 1 to 4 orders of magnitude compared to the sequential approach.
- Score: 2.094821665776961
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The definition of a Neural Network architecture is one of the most critical
and challenging tasks to perform. In this paper, we propose ParallelMLPs, a procedure
that enables training several independent Multilayer Perceptron Neural Networks, with
different numbers of neurons and activation functions, in parallel by exploiting the
principle of locality and the parallelization capabilities of modern CPUs and GPUs.
The core idea of the technique is a Modified Matrix Multiplication that replaces an
ordinary matrix multiplication with two simple matrix operations, which provide separate
and independent paths for gradient flow and can be reused in other scenarios. We assessed
our algorithm on simulated datasets, varying the number of samples, features, and batches,
using 10,000 different models. We achieved training speedups of 1 to 4 orders of magnitude
compared to the sequential approach.
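As a rough illustration (not the authors' reference implementation), the sketch below assumes the Modified Matrix Multiplication can be realised as an element-wise product between the shared hidden activations and the output weights, followed by a masked summation that keeps each candidate MLP's hidden units, and therefore its gradients, separate. The PyTorch framing, shapes, and names are illustrative assumptions.

```python
# Hedged sketch: many independent MLPs trained at once. The usual hidden-to-output
# matrix multiplication is replaced by two simpler operations (an element-wise
# product and a masked sum), so each sub-model only sees, and back-propagates
# through, its own hidden units. Shapes and names are illustrative assumptions.
import torch
import torch.nn.functional as F

batch, n_features, n_classes = 32, 10, 3
hidden_sizes = [4, 8, 16]                  # one independent MLP per entry
n_models, total_hidden = len(hidden_sizes), sum(hidden_sizes)

# Binary mask assigning each hidden unit to exactly one sub-model.
mask = torch.zeros(total_hidden, n_models)
start = 0
for m, h in enumerate(hidden_sizes):
    mask[start:start + h, m] = 1.0
    start += h

W1 = torch.randn(n_features, total_hidden, requires_grad=True)   # shared input layer
b1 = torch.zeros(total_hidden, requires_grad=True)
W2 = torch.randn(total_hidden, n_classes, requires_grad=True)    # per-unit output weights

x = torch.randn(batch, n_features)
y = torch.randint(0, n_classes, (batch,))

hidden = torch.tanh(x @ W1 + b1)                                  # (batch, total_hidden)

# "Modified matrix multiplication": element-wise product, then a masked sum over
# each model's own hidden units instead of one global matmul.
prod = hidden.unsqueeze(2) * W2.unsqueeze(0)                      # (batch, total_hidden, n_classes)
logits = torch.einsum('bhc,hm->bmc', prod, mask)                  # (batch, n_models, n_classes)

# One loss per candidate model; gradient paths never cross model boundaries.
loss = sum(F.cross_entropy(logits[:, m, :], y) for m in range(n_models))
loss.backward()
print(loss.item(), W1.grad.shape, W2.grad.shape)
```

Because every hidden unit is wired to exactly one candidate model, a single fused forward/backward pass trains all candidates simultaneously, which is presumably where the reported speedups come from.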
Related papers
- An Efficient Algorithm for Clustered Multi-Task Compressive Sensing [60.70532293880842]
Clustered multi-task compressive sensing is a hierarchical model that solves multiple compressive sensing tasks.
The existing inference algorithm for this model is computationally expensive and does not scale well in high dimensions.
We propose a new algorithm that substantially accelerates model inference by avoiding the need to explicitly compute the model's covariance matrices.
arXiv Detail & Related papers (2023-09-30T15:57:14Z)
- Parallel Neural Networks in Golang [0.0]
This paper describes the design and implementation of parallel neural networks (PNNs) with the novel programming language Golang.
Golang and its inherent parallelization support proved well suited for parallel neural network simulation, considerably decreasing processing times compared to sequential variants.
arXiv Detail & Related papers (2023-04-19T11:56:36Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Fast Differentiable Matrix Square Root and Inverse Square Root [65.67315418971688]
We propose two more efficient variants to compute the differentiable matrix square root and the inverse square root.
For the forward propagation, one method uses a Matrix Taylor Polynomial (MTP) and the other uses Matrix Padé Approximants (MPA).
A series of numerical tests shows that both methods yield considerable speed-ups compared with the SVD or the NS iteration (a brief sketch of the truncated-series idea follows this entry).
arXiv Detail & Related papers (2022-01-29T10:00:35Z)
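Purely to illustrate the truncated-series idea named in the entry above (this is not the paper's MTP/MPA implementation; the scaling, term count, and API are assumptions), a matrix square root can be approximated with a truncated binomial (Taylor) series of (I + X)^{1/2}:

```python
# Hedged sketch: matrix square root via a truncated binomial (Taylor) series of
# (I + X)^{1/2} with X = A/s - I, valid for symmetric positive-definite A once
# A is scaled so the spectral radius of X stays below 1. Not the paper's code.
import torch

def taylor_matrix_sqrt(A, num_terms=30):
    n = A.shape[-1]
    I = torch.eye(n, dtype=A.dtype)
    s = torch.linalg.matrix_norm(A, ord=2)        # spectral norm used for scaling
    X = A / s - I                                 # eigenvalues of X lie in (-1, 0]
    term, acc, coeff = I.clone(), I.clone(), 1.0
    for k in range(1, num_terms):
        coeff *= (1.5 - k) / k                    # recurrence for binom(1/2, k)
        term = term @ X                           # X^k
        acc = acc + coeff * term
    return torch.sqrt(s) * acc                    # sqrt(A) = sqrt(s) * (I + X)^{1/2}

A = torch.randn(5, 5, dtype=torch.float64)
A = A @ A.T + 5.0 * torch.eye(5, dtype=torch.float64)   # well-conditioned SPD matrix
S = taylor_matrix_sqrt(A)
print((S @ S - A).abs().max())                    # reconstruction error should be small
```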
- Parallel Simulation of Quantum Networks with Distributed Quantum State Management [56.24769206561207]
We identify requirements for parallel simulation of quantum networks and develop the first parallel discrete event quantum network simulator.
Our contributions include the design and development of a quantum state manager that maintains shared quantum information distributed across multiple processes.
We release the parallel SeQUeNCe simulator as an open-source tool alongside the existing sequential version.
arXiv Detail & Related papers (2021-11-06T16:51:17Z)
- Parareal Neural Networks Emulating a Parallel-in-time Algorithm [1.988145627448243]
As deep neural networks (DNNs) become deeper, the training time increases.
In this paper, we introduce a novel methodology to construct a parallel neural network.
arXiv Detail & Related papers (2021-03-16T02:03:39Z)
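For background on the parallel-in-time algorithm named in the title above (this is only the classical parareal iteration, not the paper's network construction; the toy ODE and solver choices are assumptions):

```python
# Hedged sketch of the classical parareal iteration, applied to the toy ODE
# du/dt = -u. The fine solves over each time slice are independent and could
# run in parallel; the cheap coarse correction stays sequential.
import numpy as np

def fine(u, t0, t1, steps=100):            # accurate propagator (many Euler steps)
    dt = (t1 - t0) / steps
    for _ in range(steps):
        u = u + dt * (-u)
    return u

def coarse(u, t0, t1):                     # cheap propagator (one Euler step)
    return u + (t1 - t0) * (-u)

T, N, u0 = 2.0, 8, 1.0
ts = np.linspace(0.0, T, N + 1)

U = np.zeros(N + 1); U[0] = u0             # initial guess from a coarse sweep
for n in range(N):
    U[n + 1] = coarse(U[n], ts[n], ts[n + 1])

for k in range(3):                          # parareal correction iterations
    F = [fine(U[n], ts[n], ts[n + 1]) for n in range(N)]    # parallelizable
    G = [coarse(U[n], ts[n], ts[n + 1]) for n in range(N)]
    U_new = U.copy()
    for n in range(N):                      # sequential coarse correction
        U_new[n + 1] = coarse(U_new[n], ts[n], ts[n + 1]) + F[n] - G[n]
    U = U_new

print(abs(U[-1] - np.exp(-T)))              # error w.r.t. the exact solution e^{-T}
```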
- Restructuring, Pruning, and Adjustment of Deep Models for Parallel Distributed Inference [15.720414948573753]
We consider the parallel implementation of an already-trained deep model on multiple processing nodes (a.k.a. workers).
We propose RePurpose, a layer-wise model restructuring and pruning technique that guarantees the performance of the overall parallelized model.
We show that, compared to the existing methods, RePurpose significantly improves the efficiency of the distributed inference via parallel implementation.
arXiv Detail & Related papers (2020-08-19T06:44:41Z)
- A Linear Algebraic Approach to Model Parallelism in Deep Learning [0.0]
Training deep neural networks (DNNs) in large-cluster computing environments is increasingly necessary, as networks grow in size and complexity.
We propose a linear-algebraic approach to model parallelism in deep learning, which allows parallel distribution of any tensor in the DNN.
We build distributed DNN layers using these parallel primitives, composed with sequential layer implementations, and demonstrate their application by building and training a distributed DNN using DistDL, a PyTorch and MPI-based distributed deep learning toolkit.
arXiv Detail & Related papers (2020-06-04T19:38:05Z)
- Accelerating Feedforward Computation via Parallel Nonlinear Equation Solving [106.63673243937492]
Feedforward computation, such as evaluating a neural network or sampling from an autoregressive model, is ubiquitous in machine learning.
We frame the task of feedforward computation as solving a system of nonlinear equations. We then propose to find the solution using a Jacobi or Gauss-Seidel fixed-point method, as well as hybrids of the two.
Our method is guaranteed to give exactly the same values as the original feedforward computation with a reduced (or equal) number of parallelizable iterations, and hence reduced time given sufficient parallel computing power (a brief Jacobi sketch follows this entry).
arXiv Detail & Related papers (2020-02-10T10:11:31Z)
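A minimal sketch of the Jacobi-style fixed-point view described in the entry above, assuming a plain layer-by-layer network; the architecture and sizes are illustrative:

```python
# Hedged sketch: feedforward evaluation recast as the fixed-point problem
# h[l] = f_l(h[l-1]) and solved with Jacobi sweeps. Every layer update in a
# sweep uses only the previous sweep's values, so all layers can run in parallel.
import torch

torch.manual_seed(0)
L, dim = 6, 16
layers = [torch.nn.Linear(dim, dim) for _ in range(L)]
f = [lambda h, lin=lin: torch.tanh(lin(h)) for lin in layers]

x = torch.randn(1, dim)

h_seq = x                                    # sequential reference pass
for l in range(L):
    h_seq = f[l](h_seq)

h = [x] + [torch.zeros(1, dim) for _ in range(L)]    # initial guess for all activations
for sweep in range(L):                       # at most L sweeps are ever needed
    h = [x] + [f[l](h[l]) for l in range(L)]          # Jacobi: uses only old values

print(torch.allclose(h[-1], h_seq))          # True: identical to the sequential result
```

Each sweep updates every layer from the previous sweep's values, so the work within a sweep parallelizes across layers, and at most as many sweeps as there are layers are needed to reproduce the sequential result exactly.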
- Unpaired Multi-modal Segmentation via Knowledge Distillation [77.39798870702174]
We propose a novel learning scheme for unpaired cross-modality image segmentation.
In our method, we heavily reuse network parameters by sharing all convolutional kernels across CT and MRI.
We have extensively validated our approach on two multi-class segmentation problems.
arXiv Detail & Related papers (2020-01-06T20:03:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.