Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for
DNN Workloads
- URL: http://arxiv.org/abs/2007.04069v1
- Date: Wed, 8 Jul 2020 12:38:03 GMT
- Title: Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for
DNN Workloads
- Authors: Siyu Wang, Yi Rong, Shiqing Fan, Zhen Zheng, LanSong Diao, Guoping
Long, Jun Yang, Xiaoyong Liu, Wei Lin
- Abstract summary: Auto-MAP is a framework for exploring distributed execution plans for DNN workloads.
It automatically discovers fast parallelization strategies through reinforcement learning at the IR level of deep learning models.
Our evaluation shows that Auto-MAP can find the optimal solution within two hours while achieving better throughput on several NLP and convolution models.
- Score: 11.646744408920764
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The last decade has witnessed growth in the computational requirements for
training deep neural networks. Current approaches (e.g., data/model
parallelism, pipeline parallelism) parallelize training tasks onto multiple
devices. However, these approaches always rely on specific deep learning
frameworks and require elaborate manual design, which makes it difficult to
maintain and share strategies between different types of models. In this paper,
we propose Auto-MAP, a framework for exploring distributed execution plans for
DNN workloads, which can automatically discover fast parallelization strategies
through reinforcement learning at the IR level of deep learning models.
Efficient exploration remains a major challenge for reinforcement learning. We
leverage DQN with task-specific pruning strategies to efficiently explore a
search space that includes optimized strategies. Our evaluation shows that
Auto-MAP can find the optimal solution within two hours while achieving better
throughput on several NLP and convolution models.
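As a rough illustration of the idea described in the abstract, the sketch below trains a small DQN to assign each layer of a model to one of a few partitioning choices, with task-specific pruning expressed as a mask over actions that would exceed a per-device memory budget and the reward given by the negative cost of the completed plan. The layer sizes, memory budget, cost model, state encoding, and network are illustrative assumptions, not details taken from the paper.

# Minimal sketch (not the paper's implementation): a DQN-style agent assigns each
# layer to one of several partitioning/placement choices; pruning is an action mask.
import random
import torch
import torch.nn as nn

NUM_LAYERS, NUM_ACTIONS = 8, 4                           # e.g. 4 candidate placements per layer
LAYER_MEM = [2.0, 4.0, 1.0, 3.0, 2.5, 1.5, 4.5, 2.0]     # toy per-layer memory (GB)
DEVICE_BUDGET = 10.0                                     # toy per-device memory budget (GB)

qnet = nn.Sequential(nn.Linear(NUM_LAYERS * NUM_ACTIONS, 64), nn.ReLU(),
                     nn.Linear(64, NUM_ACTIONS))
opt = torch.optim.Adam(qnet.parameters(), lr=1e-3)

def encode(assignment):
    # One-hot encode the partial plan: one slot per (layer, action) pair.
    x = torch.zeros(NUM_LAYERS * NUM_ACTIONS)
    for layer, act in enumerate(assignment):
        x[layer * NUM_ACTIONS + act] = 1.0
    return x

def prune_mask(assignment, layer):
    # Task-specific pruning: forbid placements that exceed a device's memory budget.
    mask = torch.zeros(NUM_ACTIONS, dtype=torch.bool)
    for act in range(NUM_ACTIONS):
        used = sum(LAYER_MEM[l] for l, a in enumerate(assignment) if a == act)
        mask[act] = used + LAYER_MEM[layer] <= DEVICE_BUDGET
    return mask

def plan_cost(assignment):
    # Toy cost: memory imbalance across devices plus a penalty per cross-device boundary.
    load = [sum(LAYER_MEM[l] for l, a in enumerate(assignment) if a == d) for d in range(NUM_ACTIONS)]
    comm = sum(1 for l in range(1, NUM_LAYERS) if assignment[l] != assignment[l - 1])
    return max(load) - min(load) + 0.5 * comm

eps = 1.0
for _ in range(500):
    assignment = []
    for layer in range(NUM_LAYERS):
        state = encode(assignment)
        mask = prune_mask(assignment, layer)
        if not mask.any():                # dead end after pruning: fall back to any action
            mask[:] = True
        q = qnet(state)
        q_masked = q.masked_fill(~mask, float("-inf"))
        if random.random() < eps:         # epsilon-greedy over the pruned action set
            act = random.choice(mask.nonzero().flatten().tolist())
        else:
            act = int(q_masked.argmax())
        assignment.append(act)

        # Reward only at episode end: negative cost of the full plan.
        done = layer == NUM_LAYERS - 1
        reward = -plan_cost(assignment) if done else 0.0
        with torch.no_grad():
            next_q = 0.0 if done else qnet(encode(assignment)).max()
        target = reward + 0.99 * next_q
        loss = (q[act] - target) ** 2
        opt.zero_grad()
        loss.backward()
        opt.step()
    eps = max(0.05, eps * 0.99)

print("example plan:", assignment, "cost:", plan_cost(assignment))

A real system would replace the toy cost model with throughput measured or estimated from the model's IR, and would use a replay buffer and target network for stable training; the sketch only shows how pruning narrows the action space the DQN has to explore.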
Related papers
- Auto-Train-Once: Controller Network Guided Automatic Network Pruning from Scratch [72.26822499434446]
Auto-Train-Once (ATO) is an innovative network pruning algorithm designed to automatically reduce the computational and storage costs of DNNs.
We provide a comprehensive convergence analysis as well as extensive experiments, and the results show that our approach achieves state-of-the-art performance across various model architectures.
arXiv Detail & Related papers (2024-03-21T02:33:37Z) - Improving Automatic Parallel Training via Balanced Memory Workload
Optimization [36.87527680184956]
Transformer models have emerged as the leading approach for achieving state-of-the-art performance across various application domains.
We present Galvatron-BMW, a novel system framework that integrates multiple prevalent parallelism dimensions and automatically identifies the most efficient hybrid parallelism strategy.
Our evaluations on different Transformer models demonstrate the capabilities of Galvatron-BMW in automating distributed training under varying GPU memory constraints.
arXiv Detail & Related papers (2023-07-05T05:28:38Z) - DeepSeq: Deep Sequential Circuit Learning [10.402436619244911]
Circuit representation learning is a promising research direction in the electronic design automation (EDA) field.
Existing solutions only target combinational circuits, significantly limiting their applications.
We propose DeepSeq, a novel representation learning framework for sequential netlists.
arXiv Detail & Related papers (2023-02-27T09:17:35Z) - TAP: Accelerating Large-Scale DNN Training Through Tensor Automatic
Parallelisation [19.009600866053923]
We present a model parallelism framework TAP that automatically searches for the best data and tensor parallel schedules.
Experiments show that TAP is $20\times$ to $160\times$ faster than the state-of-the-art automatic parallelism framework.
arXiv Detail & Related papers (2023-02-01T05:22:28Z) - Partitioning Distributed Compute Jobs with Reinforcement Learning and
Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z) - DistIR: An Intermediate Representation and Simulator for Efficient
Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
arXiv Detail & Related papers (2021-11-09T21:32:51Z) - Elastic Architecture Search for Diverse Tasks with Different Resources [87.23061200971912]
We study a new challenging problem of efficient deployment for diverse tasks with different resources, where the resource constraint and task of interest corresponding to a group of classes are dynamically specified at testing time.
Previous NAS approaches seek to design architectures for all classes simultaneously, which may not be optimal for some individual tasks.
We present a novel and general framework, called Elastic Architecture Search (EAS), permitting instant specializations at runtime for diverse tasks with various resource constraints.
arXiv Detail & Related papers (2021-08-03T00:54:27Z) - AutoBERT-Zero: Evolving BERT Backbone from Scratch [94.89102524181986]
We propose an Operation-Priority Neural Architecture Search (OP-NAS) algorithm to automatically search for promising hybrid backbone architectures.
We optimize both the search algorithm and evaluation of candidate models to boost the efficiency of our proposed OP-NAS.
Experiments show that the searched architecture (named AutoBERT-Zero) significantly outperforms BERT and its variants of different model capacities in various downstream tasks.
arXiv Detail & Related papers (2021-07-15T16:46:01Z) - GAKP: GRU Association and Kalman Prediction for Multiple Object Tracking [8.559199703957393]
Multiple Object Tracking (MOT) has been a useful yet challenging task in many real-world applications such as video surveillance, intelligent retail, and smart city.
We propose a novel tracking method that integrates an auto-tuning Kalman method for prediction with a Gated Recurrent Unit (GRU) for association, and achieves near-optimal performance with a small amount of training data.
arXiv Detail & Related papers (2020-12-28T15:52:24Z) - Scaling Distributed Deep Learning Workloads beyond the Memory Capacity
with KARMA [58.040931661693925]
We propose a strategy that combines redundant recomputing and out-of-core methods.
We achieve an average of 1.52x speedup in six different models over the state-of-the-art out-of-core methods.
Our data parallel out-of-core solution can outperform complex hybrid model parallelism in training large models, e.g. Megatron-LM and Turing-NLG.
arXiv Detail & Related papers (2020-08-26T07:24:34Z) - Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and
Robust AutoDL [53.40030379661183]
Auto-PyTorch is a framework to enable fully automated deep learning (AutoDL).
It combines multi-fidelity optimization with portfolio construction for warmstarting and ensembling of deep neural networks (DNNs).
We show that Auto-PyTorch performs better than several state-of-the-art competitors on average.
arXiv Detail & Related papers (2020-06-24T15:15:17Z)