BFTrainer: Low-Cost Training of Neural Networks on Unfillable
Supercomputer Nodes
- URL: http://arxiv.org/abs/2106.12091v1
- Date: Tue, 22 Jun 2021 22:53:19 GMT
- Title: BFTrainer: Low-Cost Training of Neural Networks on Unfillable
Supercomputer Nodes
- Authors: Zhengchun Liu, Rajkumar Kettimuthu, Michael E. Papka, Ian Foster
- Abstract summary: FCFS-based scheduling policies result in many transient idle nodes.
We show how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training.
- Score: 0.8201100713224002
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Supercomputer FCFS-based scheduling policies result in many transient idle
nodes, a phenomenon that is only partially alleviated by backfill scheduling
methods that promote small jobs to run before large jobs. Here we describe how
to realize a novel use for these otherwise wasted resources, namely, deep
neural network (DNN) training. This important workload is easily organized as
many small fragments that can be configured dynamically to fit essentially any
node*time hole in a supercomputer's schedule. We describe how the task of
rescaling suitable DNN training tasks to fit dynamically changing holes can be
formulated as a deterministic mixed integer linear programming (MILP)-based
resource allocation algorithm, and show that this MILP problem can be solved
efficiently at run time. We show further how this MILP problem can be adapted
to optimize for administrator- or user-defined metrics. We validate our method
with supercomputer scheduler logs and different DNN training scenarios, and
demonstrate efficiencies of up to 93% compared with running the same training
tasks on dedicated nodes. Our method thus enables substantial supercomputer
resources to be allocated to DNN training with no impact on other applications.
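To make the resource-allocation idea concrete, the toy sketch below assigns rescalable DNN training fragments to transient idle node*time holes by solving a small MILP with the open-source PuLP library. The job names, hole sizes, scaling limits, objective, and constraints are simplified placeholders for illustration only, not BFTrainer's actual formulation.

```python
# Toy illustration (not BFTrainer's actual MILP): allocate rescalable DNN
# training jobs to transient idle node*time holes, maximizing reclaimed
# node-hours. All names and numbers below are hypothetical.
from pulp import LpProblem, LpMaximize, LpVariable, LpInteger, lpSum, value

# Idle holes in the schedule: (available nodes, duration in hours)
holes = {"h1": (4, 1.0), "h2": (16, 0.5), "h3": (2, 2.0)}
# DNN training jobs: maximum node count each job can be rescaled to
jobs = {"resnet": 8, "bert": 32, "gan": 4}

prob = LpProblem("toy_hole_allocation", LpMaximize)

# x[j, h]: integer number of nodes of hole h assigned to job j
x = {
    (j, h): LpVariable(f"x_{j}_{h}", lowBound=0, upBound=holes[h][0], cat=LpInteger)
    for j in jobs
    for h in holes
}

# Objective: maximize node-hours devoted to DNN training
prob += lpSum(x[j, h] * holes[h][1] for j in jobs for h in holes)

# A hole cannot hand out more nodes than it has
for h, (nodes, _) in holes.items():
    prob += lpSum(x[j, h] for j in jobs) <= nodes

# A job cannot be scaled beyond its maximum node count in any single hole
for j, max_nodes in jobs.items():
    for h in holes:
        prob += x[j, h] <= max_nodes

prob.solve()
for (j, h), var in x.items():
    if value(var):
        print(f"{j}: {int(value(var))} nodes in hole {h}")
```

Because the objective is just a linear expression over the same variables, it can be swapped for an administrator- or user-defined metric (for example, priority-weighted node-hours) without changing the structure of the program.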
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45 - 9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z)
- On-Device Training of Fully Quantized Deep Neural Networks on Cortex-M Microcontrollers [4.370731001036268]
We present a method that enables efficient training of DNNs completely in place on the MCU using fully quantized training (FQT) and dynamic partial gradient updates.
We demonstrate the feasibility of our approach on multiple vision and time-series datasets and provide insights into the tradeoff between training accuracy, memory overhead, energy, and latency on real hardware.
arXiv Detail & Related papers (2024-07-15T14:01:34Z)
- DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z)
- Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads [65.47816359465155]
Running multiple deep neural networks (DNNs) in parallel has become an emerging workload in both edge devices and data centers.
We propose Dysta, a novel scheduler that utilizes both static sparsity patterns and dynamic sparsity information for sparse multi-DNN scheduling.
Our proposed approach outperforms the state-of-the-art methods with up to 10% decrease in latency constraint violation rate and nearly 4X reduction in average normalized turnaround time.
arXiv Detail & Related papers (2023-10-17T09:25:17Z)
- A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs).
MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
- Scalable Resource Management for Dynamic MEC: An Unsupervised Link-Output Graph Neural Network Approach [36.32772317151467]
Deep learning has been successfully adopted in mobile edge computing (MEC) to optimize task offloading and resource allocation.
The dynamics of edge networks raise two challenges in neural network (NN)-based optimization methods: low scalability and high training costs.
In this paper, a novel link-output GNN (LOGNN)-based resource management approach is proposed to flexibly optimize the resource allocation in MEC.
arXiv Detail & Related papers (2023-06-15T08:21:41Z)
- Decouple Graph Neural Networks: Train Multiple Simple GNNs Simultaneously Instead of One [60.5818387068983]
Graph neural networks (GNNs) suffer from severe inefficiency.
We propose to decouple a multi-layer GNN as multiple simple modules for more efficient training.
We show that the proposed framework is highly efficient with reasonable performance.
arXiv Detail & Related papers (2023-04-20T07:21:32Z) - Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders-of-magnitude improvements in energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Trainable Weight Averaging: A General Approach for Subspace Training [20.58652836107849]
Training deep neural networks (DNNs) in low-dimensional subspaces is a promising direction for achieving efficient training and better performance.
We propose Trainable Weight Averaging (TWA), a general approach for subspace training.
TWA is efficient in subspace extraction and generalizes easily; a minimal sketch of the general idea follows this list.
arXiv Detail & Related papers (2022-05-26T01:54:48Z)
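As a rough illustration of the trainable-weight-averaging idea summarized in the last entry above, the sketch below optimizes only a small set of mixing coefficients over previously saved checkpoints. The toy model, synthetic checkpoints, and training loop are hypothetical placeholders, not the authors' implementation.

```python
# Rough sketch of trainable weight averaging: learn mixing coefficients over
# saved checkpoints instead of training all weights. Model, checkpoints, and
# data below are synthetic placeholders, not the TWA authors' implementation.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # tiny placeholder network
# Pretend these are checkpoints saved along an earlier training trajectory
checkpoints = [
    {k: v + 0.01 * torch.randn_like(v) for k, v in model.state_dict().items()}
    for _ in range(5)
]

# Trainable mixing coefficients, one per checkpoint; softmax keeps the
# combination convex
alpha = nn.Parameter(torch.zeros(len(checkpoints)))
optimizer = torch.optim.Adam([alpha], lr=1e-2)

def averaged_forward(inputs):
    weights = torch.softmax(alpha, dim=0)
    mixed = {
        k: sum(weights[i] * checkpoints[i][k] for i in range(len(checkpoints)))
        for k in checkpoints[0]
    }
    # Run the model with the averaged parameters (requires PyTorch >= 2.0)
    return torch.func.functional_call(model, mixed, (inputs,))

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
for _ in range(100):  # only the mixing coefficients receive gradients
    loss = nn.functional.cross_entropy(averaged_forward(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```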
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.