Benchmarking network fabrics for data distributed training of deep
neural networks
- URL: http://arxiv.org/abs/2008.08057v1
- Date: Tue, 18 Aug 2020 17:38:30 GMT
- Authors: Siddharth Samsi, Andrew Prout, Michael Jones, Andrew Kirby, Bill
Arcand, Bill Bergeron, David Bestor, Chansup Byun, Vijay Gadepally, Michael
Houle, Matthew Hubbell, Anna Klein, Peter Michaleas, Lauren Milechin, Julie
Mullen, Antonio Rosa, Charles Yee, Albert Reuther, Jeremy Kepner
- Abstract summary: Large computational requirements for training deep models have necessitated the development of new methods for faster training.
One such approach is the data parallel approach, where the training data is distributed across multiple compute nodes.
In this paper, we examine the effects of using different physical hardware interconnects and network-related software primitives for enabling data distributed deep learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Artificial Intelligence/Machine Learning applications require the training of
complex models on large amounts of labelled data. The large computational
requirements for training deep models have necessitated the development of new
methods for faster training. One such approach is the data parallel approach,
where the training data is distributed across multiple compute nodes. This
approach is simple to implement and supported by most of the commonly used
machine learning frameworks. The data parallel approach leverages MPI for
communicating gradients across all nodes. In this paper, we examine the effects
of using different physical hardware interconnects and network-related software
primitives for enabling data distributed deep learning. We compare the effect
of using GPUDirect and NCCL on Ethernet and OmniPath fabrics. Our results show
that using Ethernet-based networking in shared HPC systems does not have a
significant effect on the training times for commonly used deep neural network
architectures or traditional HPC applications such as Computational Fluid
Dynamics.
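The data parallel approach described above reduces, at its core, to each worker computing gradients on its own shard of the training data and then averaging those gradients with an allreduce before every worker applies the same update. A minimal pure-Python sketch of that averaging step is below; it is illustrative only, since real systems delegate this collective to MPI, NCCL, or GPUDirect-capable fabrics rather than computing it in Python.

```python
def allreduce_mean(worker_grads):
    """Average per-worker gradient vectors, as an MPI/NCCL allreduce would.

    worker_grads: a list with one gradient vector (list of floats) per
    worker, all of equal length.
    Returns the single averaged gradient that every worker ends up with.
    """
    n_workers = len(worker_grads)
    dim = len(worker_grads[0])
    # Sum element-wise across workers, then divide by the worker count.
    summed = [sum(g[i] for g in worker_grads) for i in range(dim)]
    return [s / n_workers for s in summed]

# Two workers, each with gradients from its own data shard (toy values).
grads = [
    [1.0, 2.0],
    [3.0, 6.0],
]
avg = allreduce_mean(grads)  # every worker applies this same update
```

The interconnect benchmarks in the paper measure how fast this collective completes at scale, which is why the choice of fabric (Ethernet vs. OmniPath) and software primitive (GPUDirect, NCCL) matters for end-to-end training time.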
Related papers
- Partitioned Neural Network Training via Synthetic Intermediate Labels [0.0]
GPU memory constraints have become a notable bottleneck in training such large models.
This study advocates partitioning the model across GPUs and generating synthetic intermediate labels to train individual segments.
This approach results in a more efficient training process that minimizes data communication while maintaining model accuracy.
arXiv Detail & Related papers (2024-03-17T13:06:29Z)
- Training Deep Surrogate Models with Large Scale Online Learning [48.7576911714538]
Deep learning algorithms have emerged as a viable alternative for obtaining fast solutions for PDEs.
Models are usually trained on synthetic data generated by solvers, stored on disk and read back for training.
The paper proposes an open source online training framework for deep surrogate models.
arXiv Detail & Related papers (2023-06-28T12:02:27Z)
- Performance and Energy Consumption of Parallel Machine Learning Algorithms [0.0]
Machine learning models have achieved remarkable success in various real-world applications.
Model training in machine learning requires large-scale datasets and multiple iterations before the model performs well.
Parallelization of training algorithms is a common strategy to speed up the process of training.
arXiv Detail & Related papers (2023-05-01T13:04:39Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Parallel Machine Learning for Forecasting the Dynamics of Complex Networks [0.0]
We present a machine learning scheme for forecasting the dynamics of large complex networks.
We use a parallel architecture that mimics the topology of the network of interest.
arXiv Detail & Related papers (2021-08-27T06:06:41Z)
- ItNet: iterative neural networks with small graphs for accurate and efficient anytime prediction [1.52292571922932]
In this study, we introduce a class of network models that have a small memory footprint in terms of their computational graphs.
We show state-of-the-art results for semantic segmentation on the CamVid and Cityscapes datasets.
arXiv Detail & Related papers (2021-01-21T15:56:29Z)
- Dynamic Graph: Learning Instance-aware Connectivity for Neural Networks [78.65792427542672]
Dynamic Graph Network (DG-Net) is a complete directed acyclic graph, where the nodes represent convolutional blocks and the edges represent connection paths.
Instead of using a fixed path through the network, DG-Net aggregates features dynamically at each node, which gives the network greater representational ability.
arXiv Detail & Related papers (2020-10-02T16:50:26Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.