GCNScheduler: Scheduling Distributed Computing Applications using Graph
Convolutional Networks
- URL: http://arxiv.org/abs/2110.11552v1
- Date: Fri, 22 Oct 2021 01:54:10 GMT
- Title: GCNScheduler: Scheduling Distributed Computing Applications using Graph
Convolutional Networks
- Authors: Mehrdad Kiamari and Bhaskar Krishnamachari
- Abstract summary: We propose a graph convolutional network-based scheduler (GCNScheduler).
By carefully integrating an inter-task data dependency structure with network settings into an input graph, the GCNScheduler can efficiently schedule tasks for a given objective.
We show that it achieves a better makespan than the classic HEFT algorithm, and almost the same throughput as throughput-oriented HEFT.
- Score: 12.284934135116515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classical problem of scheduling task graphs corresponding to
complex applications on distributed computing systems. A number of heuristics
have been previously proposed to optimize task scheduling with respect to
metrics such as makespan and throughput. However, they tend to be slow to run,
particularly for larger problem instances, limiting their applicability in more
dynamic systems. Motivated by the goal of solving these problems more rapidly,
we propose, for the first time, a graph convolutional network-based scheduler
(GCNScheduler). By carefully integrating an inter-task data dependency
structure with network settings into an input graph and feeding it to an
appropriate GCN, the GCNScheduler can efficiently schedule tasks of complex
applications for a given objective. We evaluate our scheme with baselines
through simulations. We show that not only can our scheme quickly and
efficiently learn from existing scheduling schemes, but it can also be easily
applied to large-scale settings that current scheduling schemes fail to
handle. We show that it achieves a better makespan than the classic HEFT
algorithm, and almost the same throughput as throughput-oriented HEFT
(TP-HEFT), while providing several orders of magnitude faster scheduling times
in both cases. For example, for makespan minimization, GCNScheduler schedules
50-node task graphs in about 4 milliseconds while HEFT takes more than 1500
seconds; and for throughput maximization, GCNScheduler schedules 100-node task
graphs in about 3.3 milliseconds, compared to about 6.9 seconds for TP-HEFT.
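To make the approach concrete, here is a minimal, hypothetical sketch of training a GCN to imitate a teacher scheduler such as HEFT. Everything in it is illustrative: the node features (per-machine compute costs), the toy task graph, the layer sizes, and the plain two-layer GCN are assumptions rather than the paper's actual graph construction (which also encodes communication costs and network settings) or model.

```python
# Minimal sketch, NOT the authors' code: a GCN trained by imitation to map
# each task node of a DAG to a machine. Features, graph, and labels are toy
# assumptions; the paper's input graph additionally encodes network settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One dense graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return F.relu(self.lin(a_hat @ h))

class GCNSchedulerSketch(nn.Module):
    """Maps every task node to logits over the available machines."""
    def __init__(self, feat_dim, hidden_dim, num_machines):
        super().__init__()
        self.gc1 = GCNLayer(feat_dim, hidden_dim)
        self.gc2 = GCNLayer(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_machines)

    def forward(self, a_hat, x):
        return self.out(self.gc2(a_hat, self.gc1(a_hat, x)))

def normalize(adj):
    """Symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(dim=1).rsqrt()
    return d.unsqueeze(1) * a * d.unsqueeze(0)

# Toy instance: a 5-task DAG scheduled onto 3 machines (all values made up).
dag = torch.tensor([[0, 1, 1, 0, 0],
                    [0, 0, 0, 1, 0],
                    [0, 0, 0, 1, 1],
                    [0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 0]], dtype=torch.float)
a_hat = normalize(dag + dag.t())         # symmetrized for the convolution
x = torch.rand(5, 3)                     # feature: each task's cost per machine
teacher = torch.tensor([0, 1, 2, 1, 0])  # assignments from a teacher (e.g. HEFT)

model = GCNSchedulerSketch(feat_dim=3, hidden_dim=16, num_machines=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                     # imitation learning on teacher labels
    opt.zero_grad()
    F.cross_entropy(model(a_hat, x), teacher).backward()
    opt.step()
print(model(a_hat, x).argmax(dim=1))     # predicted machine for each task
```

The key design point is that, once trained, a single forward pass over a new task graph replaces the heuristic's search, which is why inference takes milliseconds even on graphs where HEFT needs minutes or longer.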
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - Memory-aware Scheduling for Complex Wired Networks with Iterative Graph
Optimization [4.614780125575351]
We propose an efficient memory-aware scheduling framework based on iterative graph optimization.
Our framework features an iterative graph fusion algorithm that simplifies the graph while preserving the scheduling optimality.
arXiv Detail & Related papers (2023-08-26T14:52:02Z) - RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral
Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
The RL agent generates near-optimal scheduling results with low solver runtime overhead.
Our framework has demonstrated up to $\sim2.5\times$ real-world on-chip runtime inference speedups over the commercial compiler.
arXiv Detail & Related papers (2023-04-10T17:22:12Z) - Arch-Graph: Acyclic Architecture Relation Predictor for
Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures.
We show Arch-Graph's transferability and high sample efficiency across numerous tasks.
On average, it finds architectures in the top 0.16% and 0.29% of two search spaces under a budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Learning to Schedule DAG Tasks [7.577417675452624]
We present a novel learning-based approach to scheduling directed acyclic graphs (DAGs).
The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG.
Our approach can be easily applied to any existing scheduling algorithms.
arXiv Detail & Related papers (2021-03-05T01:10:24Z) - Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
arXiv Detail & Related papers (2020-10-23T19:12:30Z) - Knowledge-Assisted Deep Reinforcement Learning in 5G Scheduler Design:
From Theoretical Framework to Implementation [34.5517138843888]
We develop a knowledge-assisted deep reinforcement learning algorithm to design schedulers in 5G networks.
We show that a straightforward implementation of DDPG converges slowly, has poor quality-of-service (QoS) performance, and cannot be implemented in real-world 5G systems.
arXiv Detail & Related papers (2020-09-17T14:52:12Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
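Since the PPRGo entry above describes a concrete mechanism, a small sketch may help: PPRGo replaces layered message passing with a single propagation step that mixes independent per-node predictions using approximate personalized PageRank weights. This toy version is assumption-laden: the push-style PPR approximation, the toy graph, and the linear map standing in for PPRGo's MLP are all hypothetical, and the real system adds top-k truncation, batching, and distributed execution.

```python
# Hypothetical toy sketch of PPRGo's two-stage idea, not its released code.
# Stage 1: approximate personalized PageRank via forward push gives each node
# a sparse set of important neighbors. Stage 2: independent per-node outputs
# are mixed with the PPR weights, replacing multi-layer message passing.
import numpy as np

def approx_ppr(neighbors, source, alpha=0.15, eps=1e-4):
    """Forward-push approximation of the PPR vector of `source`."""
    p, r = {}, {source: 1.0}
    frontier = [source]
    while frontier:
        u = frontier.pop()
        ru = r.get(u, 0.0)
        deg = max(len(neighbors[u]), 1)
        if ru / deg <= eps:
            continue                  # residual too small to push further
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        for v in neighbors[u]:
            r[v] = r.get(v, 0.0) + (1 - alpha) * ru / deg
            if r[v] / max(len(neighbors[v]), 1) > eps:
                frontier.append(v)
    return p                          # sparse dict: node -> PPR weight

# Toy undirected graph as adjacency lists, plus random node features.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feats = np.random.rand(4, 8)
W = np.random.rand(8, 3)              # linear stand-in for PPRGo's MLP

def predict(node):
    ppr = approx_ppr(neighbors, node)
    # One propagation step: PPR-weighted mix of independent per-node outputs.
    logits = sum(w * (feats[v] @ W) for v, w in ppr.items())
    return int(np.argmax(logits))

print([predict(n) for n in neighbors])
```

Because the expensive part (the per-node PPR traversal) is independent of the model and of other nodes, prediction parallelizes trivially, which is consistent with the scalability claims quoted above.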