TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic
Parallelisation
- URL: http://arxiv.org/abs/2302.00247v1
- Date: Wed, 1 Feb 2023 05:22:28 GMT
- Title: TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic
Parallelisation
- Authors: Ziji Shi, Le Jiang, Ang Wang, Jie Zhang, Xianyan Jia, Yong Li, Chencan
Wu, Jialin Li, Wei Lin
- Abstract summary: We present a model parallelism framework TAP that automatically searches for the best data and tensor parallel schedules.
Experiments show that TAP is $20\times$-$160\times$ faster than the state-of-the-art automatic parallelism framework.
- Score: 19.009600866053923
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model parallelism has become necessary to train large neural networks.
However, finding a suitable model parallel schedule for an arbitrary neural
network is a non-trivial task due to the exploding search space. In this work,
we present a model parallelism framework TAP that automatically searches for
the best data and tensor parallel schedules. Leveraging the key insight that a
neural network can be represented as a directed acyclic graph, within which may
only exist a limited set of frequent subgraphs, we design a graph pruning
algorithm to fold the search space efficiently. TAP runs at sub-linear
complexity concerning the neural network size. Experiments show that TAP is
$20\times$-$160\times$ faster than the state-of-the-art automatic parallelism
framework, and the performance of its discovered schedules is competitive with
the expert-engineered ones.
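The search-space folding described in the abstract can be illustrated with a short, hypothetical sketch: the network is modelled as a directed acyclic graph of operators, nodes with identical local structure are grouped, and a schedule search would then only need to visit each unique group once. The graph construction, signature, and function names below are illustrative assumptions, not TAP's actual implementation.

```python
# Hypothetical sketch (not TAP's actual API): fold repeated subgraphs of a
# layer DAG so that a parallel-schedule search only visits unique structures.
from collections import defaultdict

import networkx as nx  # assumption: networkx is used for the DAG representation

def build_transformer_like_dag(num_blocks: int) -> nx.DiGraph:
    """Toy DAG: an embedding followed by identical attention+MLP blocks."""
    g = nx.DiGraph()
    g.add_node("embed", op="embedding")
    prev = "embed"
    for i in range(num_blocks):
        attn, mlp = f"attn_{i}", f"mlp_{i}"
        g.add_node(attn, op="attention")
        g.add_node(mlp, op="mlp")
        g.add_edge(prev, attn)
        g.add_edge(attn, mlp)
        prev = mlp
    return g

def structural_signature(g: nx.DiGraph, node: str) -> tuple:
    """Signature used to detect frequent subgraphs: the node's op type plus
    the op types of its direct successors."""
    succ_ops = tuple(sorted(g.nodes[s]["op"] for s in g.successors(node)))
    return (g.nodes[node]["op"], succ_ops)

def fold_search_space(g: nx.DiGraph) -> dict:
    """Group nodes with identical structure; a schedule search would then
    enumerate candidate shardings once per group instead of once per node."""
    groups = defaultdict(list)
    for node in g.nodes:
        groups[structural_signature(g, node)].append(node)
    return groups

dag = build_transformer_like_dag(num_blocks=48)
groups = fold_search_space(dag)
print(f"{dag.number_of_nodes()} nodes folded into {len(groups)} unique structures")
```

For a transformer-like model the number of unique structures stays essentially constant as depth grows, which is the intuition behind the sub-linear search complexity with respect to network size claimed in the abstract.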
Related papers
- Testing RadiX-Nets: Advances in Viable Sparse Topologies [0.9555447998395205]
Sparsification of hyper-parametrized deep neural networks (DNNs) creates simpler representations of complex data.
RadiX-Nets, a subgroup of DNNs, maintain runtime performance that counteracts their reduced number of neural connections.
This paper presents a testing suite for RadiX-Nets in scalable models.
arXiv Detail & Related papers (2023-11-06T23:27:28Z) - How neural networks learn to classify chaotic time series [77.34726150561087]
We study the inner workings of neural networks trained to classify regular-versus-chaotic time series.
We find that the relation between input periodicity and activation periodicity is key for the performance of LKCNN models.
arXiv Detail & Related papers (2023-06-04T08:53:27Z) - HARL: Hierarchical Adaptive Reinforcement Learning Based Auto Scheduler
for Neural Networks [51.71682428015139]
We propose HARL, a reinforcement learning-based auto-scheduler for efficient tensor program exploration.
HARL improves the tensor operator performance by 22% and the search speed by 4.3x compared to the state-of-the-art auto-scheduler.
Inference performance and search speed are also significantly improved on end-to-end neural networks.
arXiv Detail & Related papers (2022-11-21T04:15:27Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z) - DistIR: An Intermediate Representation and Simulator for Efficient
Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
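As a rough illustration of the kind of grid search the DistIR summary refers to, the sketch below enumerates (data, tensor, pipeline) parallel degrees for a fixed device count and ranks them with a made-up analytical cost model; DistIR's actual simulator and cost modelling are far more detailed, and all constants here are fictional.

```python
# Toy illustration of a grid search over (data, tensor, pipeline) parallel
# degrees scored by a made-up analytical cost model; only the shape of the
# search is meant to resemble what a distribution simulator enables.
from itertools import product

WORLD_SIZE = 64          # assumed number of accelerators
MODEL_FLOPS = 4.2e15     # assumed per-step compute (fictional)
COMM_PENALTY = {1: 0.0, 2: 0.05, 4: 0.12, 8: 0.25, 16: 0.45}  # fictional

def step_time(dp: int, tp: int, pp: int) -> float:
    """Fictional cost model: compute shrinks with total parallelism, while
    tensor and pipeline parallelism add communication and bubble overhead."""
    compute = MODEL_FLOPS / (dp * tp * pp) / 1e14
    comm = COMM_PENALTY.get(tp, 0.6) + 0.02 * (pp - 1)
    return compute * (1.0 + comm)

candidates = [
    (dp, tp, pp)
    for dp, tp, pp in product([1, 2, 4, 8, 16], repeat=3)
    if dp * tp * pp == WORLD_SIZE
]
best = min(candidates, key=lambda c: step_time(*c))
print(f"searched {len(candidates)} configurations, best (dp, tp, pp) = {best}")
```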
arXiv Detail & Related papers (2021-11-09T21:32:51Z) - A quantum algorithm for training wide and deep classical neural networks [72.2614468437919]
We show that conditions amenable to classical trainability via gradient descent coincide with those necessary for efficiently solving quantum linear systems.
We numerically demonstrate that the MNIST image dataset satisfies such conditions.
We provide empirical evidence for $O(\log n)$ training of a convolutional neural network with pooling.
arXiv Detail & Related papers (2021-07-19T23:41:03Z) - Mitigating Performance Saturation in Neural Marked Point Processes:
Architectures and Loss Functions [50.674773358075015]
We propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers.
We show that GCHP can significantly reduce training time, and that the likelihood ratio loss with interarrival time probability assumptions can greatly improve model performance.
arXiv Detail & Related papers (2021-07-07T16:59:14Z) - Parareal Neural Networks Emulating a Parallel-in-time Algorithm [1.988145627448243]
As deep neural networks (DNNs) become deeper, the training time increases.
In this paper, we introduce a novel methodology to construct a parallel neural network.
arXiv Detail & Related papers (2021-03-16T02:03:39Z) - Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for
DNN Workloads [11.646744408920764]
Auto-MAP is a framework for exploring distributed execution plans for DNN workloads.
It can automatically discover fast parallelization strategies through reinforcement learning at the IR level of deep learning models.
Our evaluation shows that Auto-MAP can find the optimal solution in two hours, while achieving better throughput on several NLP and convolution models.
arXiv Detail & Related papers (2020-07-08T12:38:03Z) - A Linear Algebraic Approach to Model Parallelism in Deep Learning [0.0]
Training deep neural networks (DNNs) in large-cluster computing environments is increasingly necessary, as networks grow in size and complexity.
We propose a linear-algebraic approach to model parallelism in deep learning, which allows parallel distribution of any tensor in the DNN.
We build distributed DNN layers using these parallel primitives, composed with sequential layer implementations, and demonstrate their application by building and training a distributed DNN using DistDL, a PyTorch and MPI-based distributed deep learning toolkit.
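A minimal NumPy sketch of the underlying idea, assuming column-wise sharding of a linear layer's weight matrix; DistDL's real primitives operate over MPI workers rather than an in-process loop, so the names and structure here are purely illustrative.

```python
# Simplified sketch of tensor (model) parallelism for a linear layer: the
# weight matrix is split column-wise across workers, each worker computes its
# slice, and the slices are concatenated. Real toolkits such as DistDL do
# this with communication primitives (e.g. MPI) rather than a local loop.
import numpy as np

def shard_columns(weight: np.ndarray, num_workers: int) -> list:
    """Split the weight matrix along the output (column) dimension."""
    return np.array_split(weight, num_workers, axis=1)

def distributed_linear(x: np.ndarray, weight: np.ndarray, num_workers: int) -> np.ndarray:
    """Each 'worker' applies its shard; an all-gather (here: concatenate)
    reassembles the full output activation."""
    partial_outputs = [x @ shard for shard in shard_columns(weight, num_workers)]
    return np.concatenate(partial_outputs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 512))        # batch of activations
w = rng.standard_normal((512, 2048))     # full weight matrix
assert np.allclose(x @ w, distributed_linear(x, w, num_workers=4))
```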
arXiv Detail & Related papers (2020-06-04T19:38:05Z) - Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.