TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with
Auto-Parallelism
- URL: http://arxiv.org/abs/2004.10856v1
- Date: Thu, 16 Apr 2020 02:57:35 GMT
- Title: TensorOpt: Exploring the Tradeoffs in Distributed DNN Training with
Auto-Parallelism
- Authors: Zhenkun Cai, Kaihao Ma, Xiao Yan, Yidi Wu, Yuzhen Huang, James Cheng,
Teng Su, Fan Yu
- Abstract summary: A good parallelization strategy can significantly improve the efficiency or reduce the cost for the distributed training of deep neural networks (DNNs).
We propose FT, an efficient algorithm that searches for an optimal set of parallelization strategies to allow the trade-off among different objectives.
We also develop a user-friendly system, called TensorOpt, which allows users to run their distributed DNN training jobs without caring about the details of parallelization strategies.
- Score: 21.980316675614787
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A good parallelization strategy can significantly improve the efficiency or
reduce the cost for the distributed training of deep neural networks (DNNs).
Recently, several methods have been proposed to find efficient parallelization
strategies but they all optimize a single objective (e.g., execution time,
memory consumption) and produce only one strategy. We propose FT, an efficient
algorithm that searches for an optimal set of parallelization strategies to
allow the trade-off among different objectives. FT can adapt to different
scenarios by minimizing the memory consumption when the number of devices is
limited and by fully utilizing additional resources to reduce the execution time.
For popular DNN models (e.g., vision, language), an in-depth analysis is
conducted to understand the trade-offs among different objectives and their
influence on the parallelization strategies. We also develop a user-friendly
system, called TensorOpt, which allows users to run their distributed DNN
training jobs without caring about the details of parallelization strategies.
Experimental results show that FT runs efficiently and provides accurate
estimation of runtime costs, and TensorOpt is more flexible in adapting to
resource availability compared with existing frameworks.
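As a concrete illustration of the time/memory trade-off, here is a minimal Python sketch of Pareto-frontier selection over candidate parallelization strategies. This is a hypothetical example, not FT's actual search algorithm (which operates over the much larger space of per-operator strategy choices); the Strategy type, the strategy names, and the cost numbers are assumptions made up for illustration.

```python
# Hypothetical sketch: reduce candidate parallelization strategies, each with an
# estimated execution time and per-device memory cost, to a Pareto frontier and
# then pick the fastest strategy that fits a given memory budget.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Strategy:
    name: str
    exec_time: float  # estimated step time (seconds, illustrative)
    memory: float     # estimated peak memory per device (GB, illustrative)


def pareto_frontier(candidates: List[Strategy]) -> List[Strategy]:
    """Keep strategies that are not dominated in both execution time and memory."""
    frontier = []
    for s in candidates:
        dominated = any(
            o.exec_time <= s.exec_time and o.memory <= s.memory
            and (o.exec_time < s.exec_time or o.memory < s.memory)
            for o in candidates
        )
        if not dominated:
            frontier.append(s)
    return sorted(frontier, key=lambda s: s.exec_time)


def pick_strategy(frontier: List[Strategy], memory_budget: float) -> Optional[Strategy]:
    """Fastest Pareto-optimal strategy that fits the per-device memory budget."""
    feasible = [s for s in frontier if s.memory <= memory_budget]
    return min(feasible, key=lambda s: s.exec_time) if feasible else None


if __name__ == "__main__":
    candidates = [
        Strategy("data-parallel", exec_time=1.0, memory=14.0),
        Strategy("tensor-parallel", exec_time=1.3, memory=8.0),
        Strategy("hybrid", exec_time=1.1, memory=10.0),
    ]
    frontier = pareto_frontier(candidates)
    # With a 12 GB budget the fastest feasible choice is the hybrid strategy.
    print(pick_strategy(frontier, memory_budget=12.0))
```

In this reading, adapting to resource availability amounts to re-running the budget filter (and re-selecting a point on the frontier) whenever the number of devices, and hence the memory budget per device, changes.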
Related papers
- PaSE: Parallelization Strategies for Efficient DNN Training [0.09889128046943638]
Training a deep neural network (DNN) requires substantial computational and memory resources.
Standard practice is to use data parallelism because of its simplicity.
Expert-designed strategies have been proposed on a case-by-case basis using domain-specific knowledge.
arXiv Detail & Related papers (2024-07-04T15:21:20Z)
- Switchable Decision: Dynamic Neural Generation Networks [98.61113699324429]
We propose a switchable decision to accelerate inference by dynamically assigning resources for each data instance.
Our method benefits from less cost during inference while keeping the same accuracy.
arXiv Detail & Related papers (2024-05-07T17:44:54Z)
- FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data [54.81695390763957]
Federated learning is an emerging distributed machine learning method.
We propose a heterogeneous local variant of AMSGrad, named FedLALR, in which each client adjusts its learning rate.
We show that our client-specified auto-tuned learning rate scheduling can converge and achieve linear speedup with respect to the number of clients.
arXiv Detail & Related papers (2023-09-18T12:35:05Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Online Network Source Optimization with Graph-Kernel MAB [62.6067511147939]
We propose Grab-UCB, a graph-kernel multi-armed bandit algorithm to learn online the optimal source placement in large-scale networks.
We describe the network processes with an adaptive graph dictionary model, which typically leads to sparse spectral representations.
We derive the performance guarantees that depend on network parameters, which further influence the learning curve of the sequential decision strategy.
arXiv Detail & Related papers (2023-07-07T15:03:42Z)
- Combining Multi-Objective Bayesian Optimization with Reinforcement Learning for TinyML [4.2019872499238256]
We propose a novel strategy for deploying Deep Neural Networks on microcontrollers (TinyML) based on Multi-Objective Bayesian optimization (MOBOpt).
Our methodology aims at efficiently finding tradeoffs between a DNN's predictive accuracy, memory consumption on a given target system, and computational complexity.
arXiv Detail & Related papers (2023-05-23T14:31:52Z)
- Energy-efficient Task Adaptation for NLP Edge Inference Leveraging Heterogeneous Memory Architectures [68.91874045918112]
adapter-ALBERT is an efficient model optimization for maximal data reuse across different tasks.
We demonstrate the advantage of mapping the model to a heterogeneous on-chip memory architecture by performing simulations on a validated NLP edge accelerator.
arXiv Detail & Related papers (2023-03-25T14:40:59Z)
- DistIR: An Intermediate Representation and Simulator for Efficient Neural Network Distribution [15.086401550425125]
DistIR is a representation for distributed computation that is tailored for efficient analyses.
We show how DistIR and its simulator enable fast grid searches over complex distribution spaces spanning up to 1000+ configurations.
arXiv Detail & Related papers (2021-11-09T21:32:51Z)
- Deep Learning-based Resource Allocation For Device-to-Device Communication [66.74874646973593]
We propose a framework for the optimization of the resource allocation in multi-channel cellular systems with device-to-device (D2D) communication.
A deep learning (DL) framework is proposed, where the optimal resource allocation strategy for arbitrary channel conditions is approximated by deep neural network (DNN) models.
Our simulation results confirm that near-optimal performance can be attained with low computation time, which underlines the real-time capability of the proposed scheme.
arXiv Detail & Related papers (2020-11-25T14:19:23Z)
- DBS: Dynamic Batch Size For Distributed Deep Neural Network Training [19.766163856388694]
We propose the Dynamic Batch Size (DBS) strategy for the distributed training of Deep Neural Networks (DNNs).
Specifically, each worker's performance is first evaluated based on its measured performance in the previous epoch, and then the batch size and dataset partition are dynamically adjusted accordingly.
The experimental results indicate that the proposed strategy can fully utilize the performance of the cluster, reduce the training time, and remain robust to disturbance from irrelevant tasks (a toy sketch of this rebalancing idea follows the list).
arXiv Detail & Related papers (2020-07-23T07:31:55Z)
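As referenced in the DBS summary above, here is a toy sketch of the dynamic-batch-size idea: after each epoch, a fixed global batch is re-split across workers in proportion to the throughput each worker achieved in that epoch, so faster workers receive more samples next epoch. This is an assumption about the mechanism made for illustration only, not the DBS paper's exact algorithm; the function name and the throughput numbers are made up.

```python
# Toy sketch (assumption, not the DBS algorithm itself): rebalance per-worker
# batch sizes in proportion to the throughput each worker achieved last epoch,
# keeping the global batch size fixed.
from typing import List


def rebalance_batch_sizes(throughputs: List[float], global_batch: int) -> List[int]:
    """Assign per-worker batch sizes proportional to measured throughput (samples/s)."""
    total = sum(throughputs)
    sizes = [int(global_batch * t / total) for t in throughputs]
    # Hand any samples lost to integer rounding to the fastest worker.
    fastest = max(range(len(sizes)), key=lambda i: throughputs[i])
    sizes[fastest] += global_batch - sum(sizes)
    return sizes


if __name__ == "__main__":
    # Three workers; the second was slowed down by an unrelated background task.
    print(rebalance_batch_sizes([220.0, 110.0, 200.0], global_batch=512))  # [213, 106, 193]
```

The summary above also mentions adjusting the dataset partition alongside the batch size; the sketch rebalances only the per-step batch.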
This list is automatically generated from the titles and abstracts of the papers on this site.