Distributed Optimization using Heterogeneous Compute Systems
- URL: http://arxiv.org/abs/2110.08941v1
- Date: Sun, 3 Oct 2021 11:21:49 GMT
- Title: Distributed Optimization using Heterogeneous Compute Systems
- Authors: Vineeth S
- Abstract summary: We consider the training of deep neural networks on a distributed system of workers with varying compute power.
A naive implementation of synchronous distributed training will result in the faster workers waiting for the slowest worker to complete processing.
We propose to dynamically adjust the data assigned to each worker during training.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Hardware compute power has been growing at an unprecedented rate in recent
years. The utilization of such advancements plays a key role in producing
better results in less time -- both in academia and industry. However, merging
existing hardware with the latest hardware within the same ecosystem is a
challenging task. One of the key challenges in this case is varying compute
power. In this paper, we consider the training of deep neural networks on a
distributed system of workers with varying compute power. A naive
implementation of synchronous distributed training will result in the faster
workers waiting for the slowest worker to complete processing. To mitigate this
issue, we propose to dynamically adjust the data assigned to each worker
during training. We assign each worker a partition of the total data
proportional to its computing power. Our experiments show that dynamically
adjusting the data partition helps to improve the utilization of the system and
significantly reduces the time taken for training. Code is available at the
repository: \url{https://github.com/vineeths96/Heterogeneous-Systems}.
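As a minimal sketch of the proportional-partitioning idea (an illustration assuming per-worker throughput estimates are available; the function and variable names are ours, and the authors' actual implementation lives in the linked repository):

```python
import numpy as np

def partition_sizes(throughputs, total_samples):
    """Split a dataset across workers in proportion to their measured
    throughput (samples/sec), so all workers finish a step at roughly
    the same time. Rounding remainders go to the fastest workers."""
    throughputs = np.asarray(throughputs, dtype=float)
    shares = throughputs / throughputs.sum()
    sizes = np.floor(shares * total_samples).astype(int)
    # Hand any leftover samples to the fastest workers.
    leftover = total_samples - sizes.sum()
    for idx in np.argsort(-throughputs)[:leftover]:
        sizes[idx] += 1
    return sizes

# Example: three workers, the first twice as fast as the others.
# Re-running this after each epoch with fresh throughput estimates gives
# the dynamic adjustment described in the abstract.
print(partition_sizes([200.0, 100.0, 100.0], total_samples=10_000))
# -> [5000 2500 2500]
```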
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve a 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- TensorSocket: Shared Data Loading for Deep Learning Training [0.0]
Deep learning training is a repetitive and resource-intensive process.
TensorSocket enables simultaneous training processes to share the same data loader.
Our evaluation shows that TensorSocket enables scenarios that are infeasible without data sharing and increases training throughput by up to 100%.
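As a rough sketch of the shared-loader idea (using Python's standard multiprocessing queues, not TensorSocket's actual API): one producer prepares each batch once and fans it out to several training processes instead of each process running its own data pipeline.

```python
import multiprocessing as mp

def producer(queues, num_batches):
    # Prepare each batch once and fan it out to every training process.
    for step in range(num_batches):
        batch = list(range(step, step + 32))   # stand-in for a real minibatch
        for q in queues:
            q.put(batch)
    for q in queues:
        q.put(None)                            # end-of-data marker

def trainer(queue, name):
    seen = 0
    while (batch := queue.get()) is not None:
        seen += 1                              # forward/backward pass would go here
    print(f"{name} consumed {seen} batches")

if __name__ == "__main__":
    queues = [mp.Queue(maxsize=8) for _ in range(2)]
    trainers = [mp.Process(target=trainer, args=(q, f"trainer-{i}"))
                for i, q in enumerate(queues)]
    feeder = mp.Process(target=producer, args=(queues, 100))
    for p in trainers + [feeder]:
        p.start()
    for p in trainers + [feeder]:
        p.join()
```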
arXiv Detail & Related papers (2024-09-27T13:39:47Z)
- Efficient Asynchronous Federated Learning with Sparsification and Quantization [55.6801207905772]
Federated Learning (FL) is attracting more and more attention to collaboratively train a machine learning model without transferring raw data.
FL generally exploits a parameter server and a large number of edge devices during the whole process of the model training.
We propose TEASQ-Fed to exploit edge devices to asynchronously participate in the training process by actively applying for tasks.
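The "sparsification and quantization" in the title refer to standard tricks for shrinking the updates that edge devices upload; a generic illustration (our own, not TEASQ-Fed's specific scheme) of top-k sparsification followed by 8-bit quantization:

```python
import numpy as np

def compress_update(grad, k_ratio=0.01, num_bits=8):
    """Top-k sparsification followed by uniform quantization.
    Returns indices, quantized values, and the scale needed to decode."""
    flat = grad.ravel()
    k = max(1, int(k_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]      # top-k by magnitude
    values = flat[idx]
    scale = np.abs(values).max() / (2 ** (num_bits - 1) - 1)
    q = np.round(values / scale).astype(np.int8)      # 8-bit representation
    return idx, q, scale

def decompress_update(shape, idx, q, scale):
    out = np.zeros(int(np.prod(shape)), dtype=np.float32)
    out[idx] = q.astype(np.float32) * scale
    return out.reshape(shape)

grad = np.random.randn(1024, 1024).astype(np.float32)
idx, q, scale = compress_update(grad, k_ratio=0.01)
approx = decompress_update(grad.shape, idx, q, scale)
```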
arXiv Detail & Related papers (2023-12-23T07:47:07Z)
- DropCompute: simple and more robust distributed synchronous training via compute variance reduction [30.46681332866494]
We study a typical scenario in which workers are straggling due to variability in compute time.
We propose a simple yet effective decentralized method to reduce the variation among workers and thus improve the robustness of synchronous training.
arXiv Detail & Related papers (2023-06-18T16:55:31Z)
- Solving Large-scale Spatial Problems with Convolutional Neural Networks [88.31876586547848]
We employ transfer learning to improve training efficiency for large-scale spatial problems.
We propose that a convolutional neural network (CNN) can be trained on small windows of signals, but evaluated on arbitrarily large signals with little to no performance degradation.
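This works because a purely convolutional network has no fixed input size; a toy PyTorch sketch (our illustration, not the paper's model) shows weights learned on 32x32 windows applying directly to a 1024x1024 input:

```python
import torch
import torch.nn as nn

# A purely convolutional model has no fixed input size, so weights learned
# on small windows can be applied directly to much larger signals.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 1, kernel_size=1),
)

small = torch.randn(8, 1, 32, 32)      # training windows
large = torch.randn(1, 1, 1024, 1024)  # full-size signal at evaluation time

print(model(small).shape)   # torch.Size([8, 1, 32, 32])
print(model(large).shape)   # torch.Size([1, 1, 1024, 1024])
```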
arXiv Detail & Related papers (2023-06-14T01:24:42Z)
- Fast and Straggler-Tolerant Distributed SGD with Reduced Computation Load [11.069252535469644]
Optimization procedures like stochastic gradient descent (SGD) can be adapted to mitigate the effect of unresponsive or slow workers called stragglers.
This can be done by only waiting for a subset of the workers to finish their computation at each iteration of the algorithm.
We construct a novel scheme that adapts both the number of workers and the computation load throughout the run-time of the algorithm.
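The baseline idea of waiting only for a subset of workers can be sketched as follows (a generic k-sync simulation of our own, not the adaptive algorithm proposed in that paper):

```python
import numpy as np

def k_sync_step(worker_grads, finish_times, k):
    """Aggregate only the k fastest workers' gradients this iteration;
    the stragglers' results are simply ignored."""
    fastest = np.argsort(finish_times)[:k]
    step_time = finish_times[fastest].max()   # wait only for the k-th arrival
    avg_grad = np.mean([worker_grads[i] for i in fastest], axis=0)
    return avg_grad, step_time

rng = np.random.default_rng(0)
n, k, dim = 8, 5, 10
grads = rng.normal(size=(n, dim))
times = rng.exponential(scale=1.0, size=n)    # simulated per-worker compute times
avg_grad, t = k_sync_step(grads, times, k)
print(f"waited {t:.2f}s instead of {times.max():.2f}s")
```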
arXiv Detail & Related papers (2023-04-17T20:12:18Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Collaborative Learning over Wireless Networks: An Introductory Overview [84.09366153693361]
We will mainly focus on collaborative training across wireless devices.
Many distributed optimization algorithms have been developed over the last decades.
They provide data locality; that is, a joint model can be trained collaboratively while the data available at each participating device remains local.
arXiv Detail & Related papers (2021-12-07T20:15:39Z)
- Straggler-Resilient Distributed Machine Learning with Dynamic Backup Workers [9.919012793724628]
We propose a fully distributed algorithm to determine the number of backup workers for each worker.
Our algorithm achieves a linear speedup for convergence (i.e., convergence performance increases linearly with respect to the number of workers).
arXiv Detail & Related papers (2021-02-11T21:39:53Z)
- DaSGD: Squeezing SGD Parallelization Performance in Distributed Training Using Delayed Averaging [4.652668321425679]
The minibatch stochastic gradient descent (SGD) algorithm requires workers to halt forward/back propagation while gradients are synchronized.
DaSGD parallelizes SGD and forward/back propagations to hide 100% of the communication overhead.
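The core trick is to apply the local gradient immediately and fold in the globally averaged gradient one step later, so the all-reduce can overlap with the next forward/backward pass. A toy single-machine simulation of this delayed-averaging pattern (a schematic of our own, not DaSGD itself):

```python
import numpy as np

def delayed_averaging_sgd(grads_per_worker, lr=0.1, delay=1):
    """Toy single-parameter simulation: each worker applies its own gradient
    immediately and swaps it for the globally averaged gradient `delay`
    steps later, so the averaging can happen off the critical path."""
    num_workers, steps = grads_per_worker.shape
    x = np.zeros(num_workers)        # each worker's model replica
    in_flight = []                   # averages still "on the wire"
    for t in range(steps):
        local = grads_per_worker[:, t]
        x -= lr * local                        # update without waiting
        in_flight.append(local.mean())         # launch the (simulated) all-reduce
        if t >= delay:                         # its result arrives `delay` steps later
            avg = in_flight.pop(0)
            x -= lr * (avg - grads_per_worker[:, t - delay])  # replace local with global
    return x

grads = np.random.default_rng(1).normal(size=(4, 100))
print(delayed_averaging_sgd(grads, delay=1))   # replicas end up (nearly) in sync
```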
arXiv Detail & Related papers (2020-05-31T05:43:50Z)
- Fitting the Search Space of Weight-sharing NAS with Graph Convolutional Networks [100.14670789581811]
We train a graph convolutional network to fit the performance of sampled sub-networks.
With this strategy, we achieve a higher rank correlation coefficient in the selected set of candidates.
arXiv Detail & Related papers (2020-04-17T19:12:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the content (including all information) and is not responsible for any consequences of its use.