BFTrainer: Low-Cost Training of Neural Networks on Unfillable
Supercomputer Nodes
- URL: http://arxiv.org/abs/2106.12091v1
- Date: Tue, 22 Jun 2021 22:53:19 GMT
- Title: BFTrainer: Low-Cost Training of Neural Networks on Unfillable
Supercomputer Nodes
- Authors: Zhengchun Liu, Rajkumar Kettimuthu, Michael E. Papka, Ian Foster
- Abstract summary: FCFS-based scheduling policies result in many transient idle nodes.
We show how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training.
- Score: 0.8201100713224002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supercomputer FCFS-based scheduling policies result in many transient idle
nodes, a phenomenon that is only partially alleviated by backfill scheduling
methods that promote small jobs to run before large jobs. Here we describe how
to realize a novel use for these otherwise wasted resources, namely, deep
neural network (DNN) training. This important workload is easily organized as
many small fragments that can be configured dynamically to fit essentially any
node*time hole in a supercomputer's schedule. We describe how the task of
rescaling suitable DNN training tasks to fit dynamically changing holes can be
formulated as a deterministic mixed integer linear programming (MILP)-based
resource allocation algorithm, and show that this MILP problem can be solved
efficiently at run time. We show further how this MILP problem can be adapted
to optimize for administrator- or user-defined metrics. We validate our method
with supercomputer scheduler logs and different DNN training scenarios, and
demonstrate efficiencies of up to 93% compared with running the same training
tasks on dedicated nodes. Our method thus enables substantial supercomputer
resources to be allocated to DNN training with no impact on other applications.
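To make the resource-allocation idea concrete, the toy sketch below assigns rescalable DNN training fragments to transient idle node*time holes by solving a small MILP with the open-source PuLP library. The job names, hole sizes, scaling limits, objective, and constraints are simplified placeholders for illustration only, not BFTrainer's actual formulation.

```python
# Toy illustration (not BFTrainer's actual MILP): allocate rescalable DNN
# training jobs to transient idle node*time holes, maximizing reclaimed
# node-hours. All names and numbers below are hypothetical.
from pulp import LpProblem, LpMaximize, LpVariable, LpInteger, lpSum, value

# Idle holes in the schedule: (available nodes, duration in hours)
holes = {"h1": (4, 1.0), "h2": (16, 0.5), "h3": (2, 2.0)}
# DNN training jobs: maximum node count each job can be rescaled to
jobs = {"resnet": 8, "bert": 32, "gan": 4}

prob = LpProblem("toy_hole_allocation", LpMaximize)

# x[j, h]: integer number of nodes of hole h assigned to job j
x = {
    (j, h): LpVariable(f"x_{j}_{h}", lowBound=0, upBound=holes[h][0], cat=LpInteger)
    for j in jobs
    for h in holes
}

# Objective: maximize node-hours devoted to DNN training
prob += lpSum(x[j, h] * holes[h][1] for j in jobs for h in holes)

# A hole cannot hand out more nodes than it has
for h, (nodes, _) in holes.items():
    prob += lpSum(x[j, h] for j in jobs) <= nodes

# A job cannot be scaled beyond its maximum node count in any single hole
for j, max_nodes in jobs.items():
    for h in holes:
        prob += x[j, h] <= max_nodes

prob.solve()
for (j, h), var in x.items():
    if value(var):
        print(f"{j}: {int(value(var))} nodes in hole {h}")
```

Because the objective is just a linear expression over the same variables, it can be swapped for an administrator- or user-defined metric (for example, priority-weighted node-hours) without changing the structure of the program.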
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers [4.370731001036268]
We present a method that enables efficient training of DNNs completely in place on the MCU using fully quantized training (FQT) and dynamic partial gradient updates.
We demonstrate the feasibility of our approach on multiple vision and time-series datasets and provide insights into the tradeoff between training accuracy, memory overhead, energy, and latency on real hardware.
arXiv Detail & Related papers (2024-07-15T14:01:34Z)
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Scalable Resource Management for Dynamic MEC: An Unsupervised Link-Output Graph Neural Network Approach [36.32772317151467]
Deep learning has been successfully adopted in mobile edge computing (MEC) to optimize task offloading and resource allocation.
The dynamics of edge networks raise two challenges in neural network (NN)-based optimization methods: low scalability and high training costs.
In this paper, a novel link-output GNN (LOGNN)-based resource management approach is proposed to flexibly optimize the resource allocation in MEC.
arXiv Detail & Related papers (2023-06-15T08:21:41Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Trainable Weight Averaging: A General Approach for Subspace Training [20.58652836107849]
Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better performance.
We propose Trainable Weight Averaging (TWA), a general approach for subspace training.
TWA is efficient in subspace extraction and generalizes easily; a minimal sketch of the general idea follows this list.
arXiv Detail & Related papers (2022-05-26T01:54:48Z)
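As a rough illustration of the trainable-weight-averaging idea summarized in the last entry above, the sketch below optimizes only a small set of mixing coefficients over previously saved checkpoints. The toy model, synthetic checkpoints, and training loop are hypothetical placeholders, not the authors' implementation.

```python
# Rough sketch of trainable weight averaging: learn mixing coefficients over
# saved checkpoints instead of training all weights. Model, checkpoints, and
# data below are synthetic placeholders, not the TWA authors' implementation.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # tiny placeholder network
# Pretend these are checkpoints saved along an earlier training trajectory
checkpoints = [
    {k: v + 0.01 * torch.randn_like(v) for k, v in model.state_dict().items()}
    for _ in range(5)
]

# Trainable mixing coefficients, one per checkpoint; softmax keeps the
# combination convex
alpha = nn.Parameter(torch.zeros(len(checkpoints)))
optimizer = torch.optim.Adam([alpha], lr=1e-2)

def averaged_forward(inputs):
    weights = torch.softmax(alpha, dim=0)
    mixed = {
        k: sum(weights[i] * checkpoints[i][k] for i in range(len(checkpoints)))
        for k in checkpoints[0]
    }
    # Run the model with the averaged parameters (requires PyTorch >= 2.0)
    return torch.func.functional_call(model, mixed, (inputs,))

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(100):  # only the mixing coefficients receive gradients
    loss = nn.functional.cross_entropy(averaged_forward(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```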
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.