HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of
DNN Training Over Heterogeneous Systems
- URL: http://arxiv.org/abs/2007.08077v1
- Date: Thu, 16 Jul 2020 02:12:44 GMT
- Title: HyperTune: Dynamic Hyperparameter Tuning For Efficient Distribution of
DNN Training Over Heterogeneous Systems
- Authors: Ali HeydariGorji, Siavash Rezaei, Mahdi Torabzadehkashi, Hossein
Bobarshad, Vladimir Alves, Pai H. Chou
- Abstract summary: This paper describes distributed training of Deep Neural Networks (DNN) on computational storage devices (CSD).
A CSD-based distributed architecture incorporates the advantages of federated learning in terms of performance scalability, resiliency, and data privacy.
The paper also describes Stannis, a DNN training framework that improves on the shortcomings of existing distributed training frameworks.
- Score: 1.4680035572775532
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Distributed training is a novel approach to accelerate Deep Neural Networks
(DNN) training, but common training libraries fall short of addressing the
distributed cases with heterogeneous processors or cases where the processing
nodes are interrupted by other workloads. This paper describes
distributed training of DNNs on computational storage devices (CSD), which are
NAND flash-based, high-capacity data storage devices with internal processing engines.
A CSD-based distributed architecture incorporates the advantages of federated
learning in terms of performance scalability, resiliency, and data privacy by
eliminating the unnecessary data movement between the storage device and the
host processor. The paper also describes Stannis, a DNN training framework that
improves on the shortcomings of existing distributed training frameworks by
dynamically tuning the training hyperparameters in heterogeneous systems to
maintain the maximum overall processing speed in terms of processed images per
second and energy efficiency. Experimental results on image classification
training benchmarks show up to 3.1x improvement in performance and 2.45x
reduction in energy consumption when using Stannis plus CSD compared to
generic systems.
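The abstract does not spell out which hyperparameters Stannis adjusts or how, so the following is only a minimal sketch of the general idea of throughput-proportional tuning in a heterogeneous cluster: each node's recent throughput (images per second) is measured and a fixed global batch is re-split so that the fast host and the slower in-storage engines finish a step at roughly the same time. The function name, the rebalancing rule, and the example numbers are assumptions for illustration, not Stannis's actual algorithm.

```python
from typing import Dict

def retune_batch_sizes(throughput_imgs_per_s: Dict[str, float],
                       global_batch: int,
                       min_batch: int = 8) -> Dict[str, int]:
    """Split a fixed global batch across heterogeneous workers,
    proportionally to each worker's measured throughput, so that
    per-step wall-clock times stay roughly balanced."""
    total = sum(throughput_imgs_per_s.values())
    shares = {node: max(min_batch, round(global_batch * tput / total))
              for node, tput in throughput_imgs_per_s.items()}
    # Absorb rounding drift on the fastest node so shares sum to global_batch.
    fastest = max(throughput_imgs_per_s, key=throughput_imgs_per_s.get)
    shares[fastest] += global_batch - sum(shares.values())
    return shares

# Example: one host processor plus two slower in-storage (CSD) engines.
print(retune_batch_sizes({"host": 900.0, "csd0": 150.0, "csd1": 120.0},
                         global_batch=512))
```

Re-running such a routine periodically during training is one plausible way to react when a node is slowed down by interfering workloads, which is the scenario the abstract highlights.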
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
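The entry only says that MEMTL pairs a shared backbone with multiple prediction heads whose outputs are ensembled, so the sketch below illustrates that generic architectural pattern rather than the authors' model; the layer sizes, head count, and averaging as the ensembling rule are placeholder assumptions.

```python
import torch
import torch.nn as nn

class SharedBackboneMultiHead(nn.Module):
    """Generic shared-backbone, multi-head model: one feature extractor
    feeds several prediction heads, and their outputs are ensembled
    (here by simple averaging)."""

    def __init__(self, in_dim=32, hidden=64, out_dim=4, num_heads=3):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, out_dim) for _ in range(num_heads)]
        )

    def forward(self, x):
        feats = self.backbone(x)
        preds = torch.stack([head(feats) for head in self.heads], dim=0)
        return preds.mean(dim=0)  # ensemble of the head predictions

model = SharedBackboneMultiHead()
print(model(torch.randn(2, 32)).shape)  # -> torch.Size([2, 4])
```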
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Simplifying Distributed Neural Network Training on Massive Graphs: Randomized Partitions Improve Model Aggregation [23.018715954992352]
We present a simplified framework for distributed GNN training that does not rely on the aforementioned costly operations.
Specifically, our framework assembles independent trainers, each of which asynchronously learns a local model on locally-available parts of the training graph.
In experiments on social and e-commerce networks with up to 1.3 billion edges, our proposed RandomTMA and SuperTMA approaches achieve state-of-the-art performance and 2.31x speedup compared to the fastest baseline.
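The snippet states the scheme only at a high level: independent trainers asynchronously fit local models on their own graph partitions, and the results are then aggregated. The sketch below shows just the aggregation step as plain element-wise parameter averaging over the local models' state dicts; it illustrates the general idea under that assumption and is not the RandomTMA/SuperTMA implementation.

```python
from typing import Dict, List
import torch

def aggregate_local_models(
        state_dicts: List[Dict[str, torch.Tensor]]) -> Dict[str, torch.Tensor]:
    """Merge independently trained local models by averaging each
    parameter tensor element-wise across all trainers."""
    merged = {}
    for name in state_dicts[0]:
        stacked = torch.stack([sd[name].float() for sd in state_dicts])
        merged[name] = stacked.mean(dim=0)
    return merged

# Example with two toy "local models" that share the same parameter names.
local_a = {"w": torch.ones(2, 2), "b": torch.zeros(2)}
local_b = {"w": 3 * torch.ones(2, 2), "b": torch.ones(2)}
print(aggregate_local_models([local_a, local_b])["w"])  # all entries 2.0
```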
arXiv Detail & Related papers (2023-05-17T01:49:44Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- A Low-Complexity Approach to Rate-Distortion Optimized Variable Bit-Rate Compression for Split DNN Computing [5.3221129103999125]
Split computing has emerged as a recent paradigm for implementation of DNN-based AI workloads.
We present an approach that addresses the challenge of optimizing the rate-accuracy-complexity trade-off.
Our approach is remarkably lightweight during both training and inference, highly effective, and achieves excellent rate-distortion performance.
arXiv Detail & Related papers (2022-08-24T15:02:11Z)
- dPRO: A Generic Profiling and Optimization System for Expediting Distributed DNN Training [12.413533491501548]
This paper proposes dPRO, a tool to identify performance bottlenecks in distributed training systems.
We implement dPRO on multiple deep learning frameworks (PyTorch, MXNet) and representative communication schemes (AllReduce and Parameter Server architecture).
Extensive experiments show that dPRO predicts the performance of distributed training in various settings with <5% errors in most cases and finds optimization strategies with up to 87.1% speed-up over the baselines.
arXiv Detail & Related papers (2022-05-05T07:15:25Z)
- FTPipeHD: A Fault-Tolerant Pipeline-Parallel Distributed Training Framework for Heterogeneous Edge Devices [21.513786638743234]
FTPipeHD is a novel framework that trains deep learning models across heterogeneous devices.
It is shown that FTPipeHD trains 6.8x faster than the state-of-the-art method when the computing capacity of the best device is 10x greater than that of the worst one.
arXiv Detail & Related papers (2021-10-06T14:00:22Z)
- Distribution-sensitive Information Retention for Accurate Binary Neural Network [49.971345958676196]
We present a novel Distribution-sensitive Information Retention Network (DIR-Net) to retain the information of the forward activations and backward gradients.
Our DIR-Net consistently outperforms the SOTA binarization approaches under mainstream and compact architectures.
We deploy our DIR-Net on real-world resource-limited devices, achieving 11.1 times storage saving and 5.4 times speedup.
arXiv Detail & Related papers (2021-09-25T10:59:39Z)
- SmartDeal: Re-Modeling Deep Network Weights for Efficient Inference and Training [82.35376405568975]
Deep neural networks (DNNs) come with heavy parameterization, leading to reliance on external dynamic random-access memory (DRAM) for storage.
We present SmartDeal (SD), an algorithm framework to trade higher-cost memory storage/access for lower-cost computation.
We show that SD leads to 10.56x and 4.48x reduction in the storage and training energy, with negligible accuracy loss compared to state-of-the-art training baselines.
arXiv Detail & Related papers (2021-01-04T18:54:07Z)
- Dynamic Hard Pruning of Neural Networks at the Edge of the Internet [11.605253906375424]
The Dynamic Hard Pruning (DynHP) technique incrementally prunes the network during training.
DynHP enables a tunable size reduction of the final neural network and reduces the NN memory occupancy during training.
Freed memory is reused by a dynamic batch sizing approach to counterbalance the accuracy degradation caused by the hard pruning strategy.
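The summary couples two mechanisms: hard pruning applied incrementally while training proceeds, and a batch size that grows as pruning frees memory. The snippet below is a simplified sketch of that coupling; the magnitude-based pruning rule, the schedule, and the proportional batch-growth heuristic are assumptions, not DynHP's actual algorithm.

```python
import torch

def hard_prune(weights: torch.Tensor, fraction: float) -> torch.Tensor:
    """Zero out the smallest-magnitude `fraction` of the weights."""
    k = int(weights.numel() * fraction)
    if k == 0:
        return weights
    threshold = weights.abs().flatten().kthvalue(k).values
    return torch.where(weights.abs() > threshold, weights,
                       torch.zeros_like(weights))

def next_batch_size(base_batch: int, pruned_fraction: float) -> int:
    """Grow the batch roughly in proportion to the memory freed by pruning."""
    return int(base_batch * (1.0 + pruned_fraction))

w = torch.randn(1024, 1024)
for step, frac in enumerate([0.1, 0.3, 0.5]):   # incremental pruning schedule
    w = hard_prune(w, frac)
    print(f"step {step}: batch size -> {next_batch_size(64, frac)}")
```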
arXiv Detail & Related papers (2020-11-17T10:23:28Z)
- Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training [126.49572353148262]
We study two factors in neural network training: data parallelism and sparsity.
Despite their promising benefits, understanding of their effects on neural network training remains elusive.
arXiv Detail & Related papers (2020-03-25T10:49:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.