Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model
Training
- URL: http://arxiv.org/abs/2302.05045v3
- Date: Sun, 14 May 2023 04:14:41 GMT
- Title: Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model
Training
- Authors: Siddharth Singh, Abhinav Bhatele
- Abstract summary: We propose a novel approach that exploits sparse subnetworks to optimize the memory utilization and communication in two popular algorithms for parallel deep learning.
We integrate our approach into AxoNN, a highly scalable framework for parallel deep learning, and demonstrate the reduction in communication time and memory utilization.
- Score: 1.5301777464637454
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parallel training of neural networks at scale is challenging due to
significant overheads arising from communication. Recently, deep learning
researchers have developed a variety of pruning algorithms that are capable of
pruning (i.e. setting to zero) 80-90% of the parameters in a neural network to
yield sparse subnetworks that equal the accuracy of the unpruned parent
network. In this work, we propose a novel approach that exploits these sparse
subnetworks to optimize the memory utilization and communication in two popular
algorithms for parallel deep learning, namely data and inter-layer
parallelism. We integrate our approach into AxoNN, a highly scalable framework
for parallel deep learning that relies on data and inter-layer parallelism, and
demonstrate the reduction in communication time and memory utilization. On 512
NVIDIA V100 GPUs, our optimizations reduce the memory consumption of a 2.7
billion parameter model by 74%, and the total communication time by 40%, thus
providing an overall speedup of 34% over AxoNN, 32% over DeepSpeed-3D and 46%
over Sputnik, a sparse matrix computation baseline.
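To make the data-parallel side of this concrete, here is a minimal sketch (not the AxoNN implementation) of how a fixed pruning mask lets workers exchange only the surviving gradient entries instead of the full dense tensor; the worker loop stands in for an all-reduce, and the shapes, sparsity level, and helper names are illustrative assumptions.
```python
import torch

def compress(grad: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Keep only gradient entries of unpruned parameters (mask == True)."""
    return grad[mask]

def decompress(values: torch.Tensor, mask: torch.Tensor) -> torch.Tensor:
    """Scatter reduced values back into a dense tensor; pruned entries stay zero."""
    dense = torch.zeros(mask.shape, dtype=values.dtype)
    dense[mask] = values
    return dense

# Illustrative setup: 4 simulated data-parallel workers, ~90% of weights pruned.
num_workers, num_params = 4, 1_000_000
mask = torch.rand(num_params) > 0.9                      # True = parameter kept
grads = [torch.randn(num_params) * mask.float() for _ in range(num_workers)]

# A dense all-reduce moves num_params values per worker; the sparse variant moves
# only the surviving entries. The stack+sum below stands in for all_reduce(SUM).
packed = [compress(g, mask) for g in grads]
avg_grad = decompress(torch.stack(packed).sum(dim=0) / num_workers, mask)

print(f"dense volume: {num_params}, sparse volume: {int(mask.sum())}")
```
In an actual data-parallel run, the compressed buffer would be handed to a collective such as torch.distributed.all_reduce so only the packed values cross the network.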
Related papers
- Hyperdimensional Computing Empowered Federated Foundation Model over Wireless Networks for Metaverse [56.384390765357004]
We propose an integrated federated split learning and hyperdimensional computing framework for emerging foundation models.
This novel approach reduces communication costs, computation load, and privacy risks, making it suitable for resource-constrained edge devices in the Metaverse.
arXiv Detail & Related papers (2024-08-26T17:03:14Z) - YFlows: Systematic Dataflow Exploration and Code Generation for
Efficient Neural Network Inference using SIMD Architectures on CPUs [3.1445034800095413]
We address the challenges associated with deploying neural networks on CPUs.
Our novel approach is to use the dataflow of a neural network to explore data reuse opportunities.
Our results show that the dataflow that keeps outputs in SIMD registers consistently yields the best performance.
arXiv Detail & Related papers (2023-10-01T05:11:54Z) - A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs [1.7481226034111275]
This paper introduces a four-dimensional (4D) approach to optimize communication in parallel training.
AxoNN surpasses Megatron-LM, a state-of-the-art framework, by a significant 26%.
It achieves a significantly high 57% of the theoretical peak FLOP/s or 182 PFLOP/s in total.
arXiv Detail & Related papers (2023-05-22T22:41:49Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of
Convolutional Neural Networks [0.3653697742557465]
We analyze the compute, communication, and memory requirements of Convolutional Neural Networks (CNNs).
Our model-driven analysis forms the basis of an oracle utility that helps detect the limitations and bottlenecks of different parallelism approaches at scale; a minimal cost-model sketch in this spirit appears after this list.
arXiv Detail & Related papers (2021-04-19T06:45:51Z) - Accelerating Neural Network Training with Distributed Asynchronous and
Selective Optimization (DASO) [0.0]
We introduce the Distributed Asynchronous and Selective Optimization (DASO) method to accelerate network training.
DASO uses a hierarchical and asynchronous communication scheme comprised of node-local and global networks.
We show that DASO yields a reduction in training time of up to 34% on classical and state-of-the-art networks; a toy sketch of the node-local/global averaging idea appears after this list.
arXiv Detail & Related papers (2021-04-12T16:02:20Z) - Optimizing Memory Placement using Evolutionary Graph Reinforcement
Learning [56.83172249278467]
We introduce Evolutionary Graph Reinforcement Learning (EGRL), a method designed for large search spaces.
We train and validate our approach directly on the Intel NNP-I chip for inference.
We additionally achieve 28-78% speed-up compared to the native NNP-I compiler on all three workloads.
arXiv Detail & Related papers (2020-07-14T18:50:12Z) - Communication-Efficient Distributed Stochastic AUC Maximization with
Deep Neural Networks [50.42141893913188]
We study distributed algorithms for large-scale AUC maximization with a deep neural network as the predictive model.
Our algorithm requires far fewer communication rounds while retaining its theoretical guarantees.
Experiments on several datasets demonstrate its effectiveness and corroborate the theory.
arXiv Detail & Related papers (2020-05-05T18:08:23Z) - Understanding the Effects of Data Parallelism and Sparsity on Neural
Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z) - Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z) - PairNets: Novel Fast Shallow Artificial Neural Networks on Partitioned
Subspaces [0.0]
We create a novel shallow 4-layer ANN called "Pairwise Neural Network" ("PairNet").
A value of each input is partitioned into multiple intervals, and then an n-dimensional space is partitioned into M n-dimensional subspaces.
M local PairNets are built in M partitioned local n-dimensional subspaces.
arXiv Detail & Related papers (2020-01-24T05:23:47Z)
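The PairNet entry above describes partitioning each input feature into intervals and building one local model per resulting subspace; the sketch below is a heavily simplified, hypothetical version of that routing step (the interval edges, least-squares local models, and data are illustrative stand-ins, not the PairNet architecture).
```python
import numpy as np

def subspace_index(x, edges):
    """Bucket each feature into intervals and flatten the bucket tuple into one id."""
    idx = 0
    for d in range(len(edges)):
        idx = idx * (len(edges[d]) + 1) + int(np.digitize(x[d], edges[d]))
    return idx

# 2 features, each split at 0.5 into 2 intervals -> M = 4 local subspaces (illustrative).
edges = [np.array([0.5]), np.array([0.5])]
rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = X @ np.array([2.0, -1.0])

# Group samples by subspace and fit one tiny local model (least squares) per group.
groups = {}
for xi, yi in zip(X, y):
    groups.setdefault(subspace_index(xi, edges), []).append((xi, yi))
local_models = {
    k: np.linalg.lstsq(np.array([x for x, _ in v]), np.array([t for _, t in v]), rcond=None)[0]
    for k, v in groups.items()
}
print({k: w.round(2).tolist() for k, w in local_models.items()})  # each recovers [2, -1]
```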
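As promised in the oracle entry above, here is a minimal sketch of the kind of model-driven analysis such a utility could build on: closed-form FLOP, activation-memory, and gradient-communication estimates for a convolutional layer. These are textbook approximations under assumed layer dimensions, not the paper's actual model.
```python
from dataclasses import dataclass

@dataclass
class ConvLayer:
    in_ch: int
    out_ch: int
    kernel: int
    out_h: int
    out_w: int

def layer_costs(layer: ConvLayer, batch: int, bytes_per_elem: int = 4) -> dict:
    """Rough per-layer estimates one could feed into a parallelism oracle."""
    weights = layer.in_ch * layer.out_ch * layer.kernel ** 2
    flops = 2 * batch * weights * layer.out_h * layer.out_w           # multiply-accumulates x 2
    act_bytes = batch * layer.out_ch * layer.out_h * layer.out_w * bytes_per_elem
    grad_comm_bytes = weights * bytes_per_elem                        # data-parallel all-reduce volume
    return {"flops": flops, "activation_bytes": act_bytes, "gradient_comm_bytes": grad_comm_bytes}

# Example: a 3x3, 256 -> 256 channel layer on a 56x56 output map, batch size 32.
print(layer_costs(ConvLayer(in_ch=256, out_ch=256, kernel=3, out_h=56, out_w=56), batch=32))
```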
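As referenced in the DASO entry above, here is a toy simulation of two-level gradient averaging: GPUs inside each node average first, then one leader per node averages across nodes, so only one tensor per node crosses the slower inter-node network. A real implementation would use node-local and global process groups (e.g., NCCL/MPI); the worker counts are illustrative, and DASO's asynchronous and selective scheduling is not modeled.
```python
import numpy as np

def hierarchical_average(node_grads):
    """node_grads: one list of per-GPU gradient vectors per node.
    Stage 1: average inside each node (fast intra-node links).
    Stage 2: average the node results among node leaders (slow inter-node links)."""
    local_means = [np.mean(gpu_grads, axis=0) for gpu_grads in node_grads]
    return np.mean(local_means, axis=0)

# 4 nodes x 4 GPUs each, 8-dimensional toy gradients.
rng = np.random.default_rng(0)
grads = [[rng.normal(size=8) for _ in range(4)] for _ in range(4)]
result = hierarchical_average(grads)

# With equal GPU counts per node this matches the flat global average exactly,
# while moving only one vector per node between nodes.
flat = np.mean([g for node in grads for g in node], axis=0)
assert np.allclose(result, flat)
```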