Doing More by Doing Less: How Structured Partial Backpropagation
Improves Deep Learning Clusters
- URL: http://arxiv.org/abs/2111.10672v1
- Date: Sat, 20 Nov 2021 20:34:26 GMT
- Title: Doing More by Doing Less: How Structured Partial Backpropagation
Improves Deep Learning Clusters
- Authors: Adarsh Kumar, Kausik Subramanian, Shivaram Venkataraman, Aditya Akella
- Abstract summary: Training deep learning models is resource-intensive, consuming significant compute, memory, and network resources.
We propose Structured Partial Backpropagation (SPB), a technique that controls the amount of backpropagation at individual workers in distributed training.
We find that JigSaw, our SPB-aware scheduler, can improve large-scale cluster efficiency by up to 28%.
- Score: 9.17259958324486
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many organizations employ compute clusters equipped with accelerators such as
GPUs and TPUs for training deep learning models in a distributed fashion.
Training is resource-intensive, consuming significant compute, memory, and
network resources. Many prior works explore how to reduce training resource
footprint without impacting quality, but their focus on a subset of the
bottlenecks (typically only the network) limits their ability to improve
overall cluster utilization. In this work, we exploit the unique
characteristics of deep learning workloads to propose Structured Partial
Backpropagation (SPB), a technique that systematically controls the amount of
backpropagation at individual workers in distributed training. This
simultaneously reduces network bandwidth, compute utilization, and memory
footprint while preserving model quality. To efficiently leverage the benefits
of SPB at the cluster level, we introduce JigSaw, an SPB-aware scheduler that
schedules Deep Learning Training (DLT) jobs at the iteration level. We find
that JigSaw can improve large-scale cluster efficiency by up to 28%.
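To make the idea concrete, below is a minimal sketch of partial backpropagation on a single worker, written against PyTorch. The model, the split point, and the helper `partial_backprop_step` are illustrative assumptions rather than the paper's actual SPB mechanism or the JigSaw scheduler: detaching the activation at a chosen depth cuts the autograd graph there, so earlier layers incur no backward compute, store no gradients, and add no gradient traffic for that iteration.

```python
# Minimal sketch of partial backpropagation on one worker (PyTorch).
# The model, split point, and helper below are illustrative assumptions,
# not the paper's SPB implementation or the JigSaw scheduler.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 512), nn.ReLU(),   # layers 0-1: excluded from backprop below
    nn.Linear(512, 512), nn.ReLU(),   # layers 2-3: trained this iteration
    nn.Linear(512, 10),               # layer 4: trained this iteration
)
loss_fn = nn.CrossEntropyLoss()

def partial_backprop_step(x, y, split=2):
    """Backpropagate only through layers with index >= `split`.

    Detaching the activation at the split point cuts the autograd graph,
    so earlier layers incur no backward compute, store no gradients, and
    contribute no gradient traffic in a data-parallel all-reduce.
    """
    h = x
    for i, layer in enumerate(model):
        h = layer(h)
        if i == split - 1:
            h = h.detach()            # stop gradients from flowing further back
    loss = loss_fn(h, y)
    loss.backward()
    return loss

x = torch.randn(32, 512)
y = torch.randint(0, 10, (32,))
partial_backprop_step(x, y)
# Only parameters of layers >= split now hold gradients:
print([p.grad is not None for p in model.parameters()])
# -> [False, False, True, True, True, True]
```

The paper's contribution lies in doing this truncation in a structured way that preserves model quality, and in JigSaw, which schedules DLT jobs at the iteration level so that the freed compute, memory, and bandwidth translate into the reported cluster-level efficiency gains.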
Related papers
- Reconfigurable Distributed FPGA Cluster Design for Deep Learning
Accelerators [59.11160990637615]
We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications.
The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph.
arXiv Detail & Related papers (2023-05-24T16:08:55Z)
- Efficient Parallel Split Learning over Resource-constrained Wireless
Edge Networks [44.37047471448793]
In this paper, we advocate the integration of the edge computing paradigm and parallel split learning (PSL).
We propose an innovative PSL framework, namely efficient parallel split learning (EPSL), to accelerate model training.
We show that the proposed EPSL framework significantly decreases the training latency needed to achieve a target accuracy.
arXiv Detail & Related papers (2023-03-26T16:09:48Z)
- Partitioning Distributed Compute Jobs with Reinforcement Learning and
Graph Neural Networks [58.720142291102135]
Large-scale machine learning models are bringing advances to a broad range of fields.
Many of these models are too large to be trained on a single machine, and must be distributed across multiple devices.
We show that maximum parallelisation is sub-optimal in relation to user-critical metrics such as throughput and blocking rate.
arXiv Detail & Related papers (2023-01-31T17:41:07Z)
- Unifying Synergies between Self-supervised Learning and Dynamic
Computation [53.66628188936682]
We present a novel perspective on the interplay between self-supervised learning (SSL) and dynamic computation (DC) paradigms.
We show that it is feasible to simultaneously learn a dense and a gated sub-network from scratch in an SSL setting.
The co-evolution of the dense and gated encoders during pre-training offers a good accuracy-efficiency trade-off.
arXiv Detail & Related papers (2023-01-22T17:12:58Z)
- Privacy-Preserving Serverless Edge Learning with Decentralized Small
Data [13.254530176359182]
Distributed training strategies have recently become a promising approach to ensure data privacy when training deep models.
This paper extends conventional serverless platforms with serverless edge learning architectures and provides an efficient distributed training framework from the networking perspective.
arXiv Detail & Related papers (2021-11-29T21:04:49Z)
- HeterPS: Distributed Deep Learning With Reinforcement Learning Based
Scheduling in Heterogeneous Environments [37.55572042288321]
The training process of deep neural networks (DNNs) generally handles large-scale input data with many sparse features.
Paddle-HeterPS is composed of a distributed architecture and a Reinforcement Learning (RL)-based scheduling method.
We show that Paddle-HeterPS significantly outperforms state-of-the-art approaches in terms of throughput (14.5 times higher) and monetary cost (312.3% smaller).
arXiv Detail & Related papers (2021-11-20T17:09:15Z)
- Dynamic Sparse Training for Deep Reinforcement Learning [36.66889208433228]
We propose for the first time to dynamically train deep reinforcement learning agents with sparse neural networks from scratch.
Our approach is easy to integrate into existing deep reinforcement learning algorithms.
We evaluate our approach on OpenAI Gym continuous control tasks.
arXiv Detail & Related papers (2021-06-08T09:57:20Z)
- Distributed Training of Deep Learning Models: A Taxonomic Perspective [11.924058430461216]
Distributed deep learning systems (DDLS) train deep neural network models by utilizing the distributed resources of a cluster.
We aim to shine some light on the fundamental principles that are at work when training deep neural networks in a cluster of independent machines.
arXiv Detail & Related papers (2020-07-08T08:56:58Z)
- Rapid Structural Pruning of Neural Networks with Set-based Task-Adaptive
Meta-Pruning [83.59005356327103]
A common limitation of most existing pruning techniques is that they require pre-training of the network at least once before pruning.
We propose STAMP, which task-adaptively prunes a network pretrained on a large reference dataset by generating a pruning mask on it as a function of the target dataset.
We validate STAMP against recent advanced pruning methods on benchmark datasets.
arXiv Detail & Related papers (2020-06-22T10:57:43Z)
- Joint Parameter-and-Bandwidth Allocation for Improving the Efficiency of
Partitioned Edge Learning [73.82875010696849]
Machine learning algorithms are deployed at the network edge for training artificial intelligence (AI) models.
This paper focuses on the novel joint design of parameter (computation load) allocation and bandwidth allocation.
arXiv Detail & Related papers (2020-03-10T05:52:15Z)
- Large-Scale Gradient-Free Deep Learning with Recursive Local
Representation Alignment [84.57874289554839]
Training deep neural networks on large-scale datasets requires significant hardware resources.
Backpropagation, the workhorse for training these networks, is an inherently sequential process that is difficult to parallelize.
We propose a neuro-biologically-plausible alternative to backprop that can be used to train deep networks.
arXiv Detail & Related papers (2020-02-10T16:20:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.