Accelerating GAN training using highly parallel hardware on public cloud
- URL: http://arxiv.org/abs/2111.04628v1
- Date: Mon, 8 Nov 2021 16:59:15 GMT
- Title: Accelerating GAN training using highly parallel hardware on public cloud
- Authors: Renato Cardoso, Dejan Golubovic, Ignacio Peluaga Lozada, Ricardo Rocha, João Fernandes and Sofia Vallecorsa
- Abstract summary: This work explores different types of cloud services to train a Generative Adversarial Network (GAN) in a parallel environment.
We parallelize the training process on multiple GPUs and Google Tensor Processing Units (TPUs).
Linear speed-up of the training process is obtained, while retaining most of the performance in terms of physics results.
- Score: 0.3694429692322631
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the increasing number of Machine and Deep Learning applications in High
Energy Physics, easy access to dedicated infrastructure represents a
requirement for fast and efficient R&D. This work explores different types of
cloud services to train a Generative Adversarial Network (GAN) in a parallel
environment, using the TensorFlow data-parallel strategy. More specifically, we
parallelize the training process on multiple GPUs and Google Tensor Processing
Units (TPU) and we compare two algorithms: the TensorFlow built-in logic and a
custom loop, optimised to have higher control of the elements assigned to each
GPU worker or TPU core. The quality of the generated data is compared to Monte
Carlo simulation. Linear speed-up of the training process is obtained, while
retaining most of the performance in terms of physics results. Additionally, we
benchmark the aforementioned approaches, at scale, over multiple GPU nodes,
deploying the training process on different public cloud providers, seeking
overall efficiency and cost-effectiveness. The combination of data science,
cloud deployment options and associated economics allows one to burst out
heterogeneously, exploring the full potential of cloud-based services.
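The data-parallel scheme described in the abstract can be sketched in a framework-agnostic way. This is a minimal illustration under stated assumptions, not the authors' code: a toy one-parameter linear model stands in for the GAN, replicas are simulated sequentially in plain Python rather than run on GPUs or TPU cores, and all helper names (`shard_batch`, `local_gradient`, `all_reduce_mean`, `data_parallel_step`, `scaling`) are hypothetical. Each replica receives an equal slice of the global batch and computes a local gradient, and an all-reduce averages those gradients so that every replica applies the same update. In TensorFlow, the built-in path would typically go through a `tf.distribute` strategy (for example `MirroredStrategy` on GPUs or `TPUStrategy` on TPUs; the abstract does not name the exact class), while a custom loop makes the per-replica split explicit, as done here.

```python
# Minimal, framework-agnostic sketch of synchronous data-parallel training.
# Assumptions (not from the paper): a toy model y ~ w * x stands in for the GAN,
# and "replicas" are simulated sequentially instead of running on GPUs/TPU cores.
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y)


def shard_batch(batch: List[Sample], num_replicas: int) -> List[List[Sample]]:
    """Split the global batch into equal, contiguous per-replica shards."""
    if num_replicas <= 0 or len(batch) % num_replicas != 0:
        raise ValueError("global batch size must be a positive multiple of num_replicas")
    size = len(batch) // num_replicas
    return [batch[i * size:(i + 1) * size] for i in range(num_replicas)]


def local_gradient(w: float, shard: List[Sample]) -> float:
    """Mean gradient of the loss 0.5 * (w * x - y) ** 2 over one shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)


def all_reduce_mean(values: List[float]) -> float:
    """Stand-in for an all-reduce: every replica receives the mean value."""
    return sum(values) / len(values)


def data_parallel_step(w: float, batch: List[Sample], num_replicas: int, lr: float) -> float:
    """One synchronous step: shard, compute local gradients, average, update."""
    shards = shard_batch(batch, num_replicas)
    grads = [local_gradient(w, s) for s in shards]  # would run concurrently on devices
    return w - lr * all_reduce_mean(grads)


def scaling(t_single: float, t_parallel: float, num_replicas: int) -> Tuple[float, float]:
    """Return (speed-up, efficiency); linear speed-up means efficiency close to 1."""
    speedup = t_single / t_parallel
    return speedup, speedup / num_replicas


if __name__ == "__main__":
    batch = [(float(x), 3.0 * x) for x in range(1, 9)]  # 8 samples from y = 3x
    w = 0.0
    for _ in range(200):
        w = data_parallel_step(w, batch, num_replicas=4, lr=0.01)
    print(f"learned w = {w:.4f}")  # approaches 3.0
```

Because the shards have equal size, the mean of the per-replica gradients equals the gradient over the full global batch, so adding replicas splits the work without changing the update. Linear speed-up corresponds to `scaling` reporting an efficiency close to 1.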
Related papers
- Faster Multi-GPU Training with PPLL: A Pipeline Parallelism Framework Leveraging Local Learning [8.628231789161577]
We present PPLL (Pipeline Parallelism based on Local Learning), a novel framework that leverages local learning algorithms to enable effective parallel training across multiple GPUs.
By utilizing queues to manage data transfers between GPUs, PPLL ensures seamless cross-GPU communication, allowing multiple blocks to execute forward and backward passes in a pipelined manner.
Our results demonstrate that PPLL significantly enhances the training speed of the local learning method while achieving comparable or even superior training speed to traditional pipeline parallelism.
(2024-11-19T08:09:18Z)
- Benchmarking Edge AI Platforms for High-Performance ML Inference [0.0]
Edge computing's growing prominence, due to its ability to reduce communication latency and enable real-time processing, is promoting the rise of high-performance, heterogeneous System-on-Chip solutions.
While current approaches often involve scaling down modern hardware, the performance characteristics of neural network workloads can vary significantly.
We compare the latency and throughput of various linear algebra and neural network inference tasks across CPU-only, CPU/GPU, and CPU/NPU integrated solutions.
(2024-09-23T08:27:27Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting increasing attention as a way to collaboratively train a machine learning model without transferring raw data.
FL generally relies on a parameter server and a large number of edge devices throughout model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
(2023-12-23T07:47:07Z)
- In Situ Framework for Coupling Simulation and Machine Learning with Application to CFD [51.04126395480625]
Recent years have seen many successful applications of machine learning (ML) to facilitate fluid dynamic computations.
As simulations grow, generating new training datasets for traditional offline learning creates I/O and storage bottlenecks.
This work offers a solution by simplifying this coupling and enabling in situ training and inference on heterogeneous clusters.
(2023-06-22T14:07:54Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
(2022-11-19T15:44:08Z)
- PARTIME: Scalable and Parallel Processing Over Time with Deep Neural Networks [68.96484488899901]
We present PARTIME, a library designed to speed up neural networks whenever data is continuously streamed over time.
PARTIME starts processing each data sample as soon as it becomes available from the stream.
Experiments empirically compare PARTIME with classic non-parallel neural computations in online learning.
(2022-10-17T14:49:14Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach, however, does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production-grade systems.
(2021-12-22T14:45:37Z)
- Accelerating Training and Inference of Graph Neural Networks with Fast Sampling and Pipelining [58.10436813430554]
Mini-batch training of graph neural networks (GNNs) requires a lot of computation and data movement.
We argue in favor of performing mini-batch training with neighborhood sampling in a distributed multi-GPU environment.
We present a sequence of improvements to mitigate these bottlenecks, including a performance-engineered neighborhood sampler.
We also conduct an empirical analysis that supports the use of sampling for inference, showing that test accuracies are not materially compromised.
(2021-10-16T02:41:35Z)
- Scheduling Optimization Techniques for Neural Network Training [3.1617796705744547]
This paper proposes out-of-order (ooo) backprop, an effective scheduling technique for neural network training.
We show that GPU utilization in single-GPU, data-parallel, and pipeline-parallel training can generally be improved by applying ooo backprop.
(2021-10-03T05:45:06Z)
- SparsePipe: Parallel Deep Learning for 3D Point Clouds [7.181267620981419]
SparsePipe is built to support 3D sparse data such as point clouds.
It exploits intra-batch parallelism that partitions input data across multiple processors.
We show that SparsePipe can parallelize effectively and obtain better performance on current point cloud benchmarks.
(2020-12-27T01:47:09Z)
- Benchmarking network fabrics for data distributed training of deep neural networks [10.067102343753643]
Large computational requirements for training deep models have necessitated the development of new methods for faster training.
One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes.
In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning.
(2020-08-18T17:38:30Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.