GCNScheduler: Scheduling Distributed Computing Applications using Graph
Convolutional Networks
- URL: http://arxiv.org/abs/2110.11552v1
- Date: Fri, 22 Oct 2021 01:54:10 GMT
- Title: GCNScheduler: Scheduling Distributed Computing Applications using Graph
Convolutional Networks
- Authors: Mehrdad Kiamari and Bhaskar Krishnamachari
- Abstract summary: We propose a graph convolutional network-based scheduler (GCNScheduler).
By carefully integrating an inter-task data dependency structure with network settings into an input graph, the GCNScheduler can efficiently schedule tasks for a given objective.
We show that it achieves a better makespan than the classic HEFT algorithm, and almost the same throughput as throughput-oriented HEFT.
- Score: 12.284934135116515
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the classical problem of scheduling task graphs corresponding to
complex applications on distributed computing systems. A number of heuristics
have been previously proposed to optimize task scheduling with respect to
metrics such as makespan and throughput. However, they tend to be slow to run,
particularly for larger problem instances, limiting their applicability in more
dynamic systems. Motivated by the goal of solving these problems more rapidly,
we propose, for the first time, a graph convolutional network-based scheduler
(GCNScheduler). By carefully integrating an inter-task data dependency
structure with network settings into an input graph and feeding it to an
appropriate GCN, the GCNScheduler can efficiently schedule tasks of complex
applications for a given objective. We evaluate our scheme with baselines
through simulations. We show that not only can our scheme quickly and
efficiently learn from existing scheduling schemes, but it can also be easily
applied to large-scale settings that current scheduling schemes fail to
handle. We show that it achieves a better makespan than the classic HEFT
algorithm, and almost the same throughput as throughput-oriented HEFT
(TP-HEFT), while providing several orders of magnitude faster scheduling times
in both cases. For example, for makespan minimization, GCNScheduler schedules
50-node task graphs in about 4 milliseconds while HEFT takes more than 1500
seconds; and for throughput maximization, GCNScheduler schedules 100-node task
graphs in about 3.3 milliseconds, compared to about 6.9 seconds for TP-HEFT.
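To make the approach concrete, here is a minimal, hypothetical sketch of training a GCN to imitate a teacher scheduler such as HEFT. Everything in it is illustrative: the node features (per-machine compute costs), the toy task graph, the layer sizes, and the plain two-layer GCN are assumptions rather than the paper's actual graph construction (which also encodes communication costs and network settings) or model.

```python
# Minimal sketch, NOT the authors' code: a GCN trained by imitation to map
# each task node of a DAG to a machine. Features, graph, and labels are toy
# assumptions; the paper's input graph additionally encodes network settings.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GCNLayer(nn.Module):
    """One dense graph-convolution step: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return F.relu(self.lin(a_hat @ h))

class GCNSchedulerSketch(nn.Module):
    """Maps every task node to logits over the available machines."""
    def __init__(self, feat_dim, hidden_dim, num_machines):
        super().__init__()
        self.gc1 = GCNLayer(feat_dim, hidden_dim)
        self.gc2 = GCNLayer(hidden_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, num_machines)

    def forward(self, a_hat, x):
        return self.out(self.gc2(a_hat, self.gc1(a_hat, x)))

def normalize(adj):
    """Symmetric normalization: D^-1/2 (A + I) D^-1/2."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(dim=1).rsqrt()
    return d.unsqueeze(1) * a * d.unsqueeze(0)

# Toy instance: a 5-task DAG scheduled onto 3 machines (all values made up).
dag = torch.tensor([[0, 1, 1, 0, 0],
                    [0, 0, 0, 1, 0],
                    [0, 0, 0, 1, 1],
                    [0, 0, 0, 0, 1],
                    [0, 0, 0, 0, 0]], dtype=torch.float)
a_hat = normalize(dag + dag.t())         # symmetrized for the convolution
x = torch.rand(5, 3)                     # feature: each task's cost per machine
teacher = torch.tensor([0, 1, 2, 1, 0])  # assignments from a teacher (e.g. HEFT)

model = GCNSchedulerSketch(feat_dim=3, hidden_dim=16, num_machines=3)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(200):                     # imitation learning on teacher labels
    opt.zero_grad()
    F.cross_entropy(model(a_hat, x), teacher).backward()
    opt.step()
print(model(a_hat, x).argmax(dim=1))     # predicted machine for each task
```

The key design point is that, once trained, a single forward pass over a new task graph replaces the heuristic's search, which is why inference takes milliseconds even on graphs where HEFT needs minutes or longer.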
Related papers
- FusionLLM: A Decentralized LLM Training System on Geo-distributed GPUs with Adaptive Compression [55.992528247880685]
Decentralized training faces significant challenges regarding system design and efficiency.
We present FusionLLM, a decentralized training system designed and implemented for training large deep neural networks (DNNs).
We show that our system and method can achieve 1.45-9.39x speedup compared to baseline methods while ensuring convergence.
arXiv Detail & Related papers (2024-10-16T16:13:19Z) - T-GAE: Transferable Graph Autoencoder for Network Alignment [79.89704126746204]
T-GAE is a graph autoencoder framework that leverages transferability and stability of GNNs to achieve efficient network alignment without retraining.
Our experiments demonstrate that T-GAE outperforms the state-of-the-art optimization method and the best GNN approach by up to 38.7% and 50.8%, respectively.
arXiv Detail & Related papers (2023-10-05T02:58:29Z) - Memory-aware Scheduling for Complex Wired Networks with Iterative Graph
Optimization [4.614780125575351]
We propose an efficient memory-aware scheduling framework based on iterative graph optimization.
Our framework features an iterative graph fusion algorithm that simplifies the graph while preserving the scheduling optimality.
arXiv Detail & Related papers (2023-08-26T14:52:02Z) - RESPECT: Reinforcement Learning based Edge Scheduling on Pipelined Coral
Edge TPUs [12.952987240366781]
This work presents a reinforcement learning (RL) based scheduling framework, which learns the behaviors of optimal optimization algorithms.
The RL agent generates near-optimal scheduling results with low solver runtime overhead.
Our framework has demonstrated up to $\sim2.5\times$ real-world on-chip runtime inference speedups over the commercial compiler.
arXiv Detail & Related papers (2023-04-10T17:22:12Z) - Arch-Graph: Acyclic Architecture Relation Predictor for
Task-Transferable Neural Architecture Search [96.31315520244605]
Arch-Graph is a transferable NAS method that predicts task-specific optimal architectures.
We show Arch-Graph's transferability and high sample efficiency across numerous tasks.
On average, it finds architectures in the top 0.16% and 0.29% of two search spaces under a budget of only 50 models.
arXiv Detail & Related papers (2022-04-12T16:46:06Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Learning to Schedule DAG Tasks [7.577417675452624]
We present a novel learning-based approach to scheduling directed acyclic graphs (DAGs).
The algorithm employs a reinforcement learning agent to iteratively add directed edges to the DAG.
Our approach can be easily applied to any existing scheduling algorithms.
arXiv Detail & Related papers (2021-03-05T01:10:24Z) - Learning to Execute Programs with Instruction Pointer Attention Graph
Neural Networks [55.98291376393561]
Graph neural networks (GNNs) have emerged as a powerful tool for learning software engineering tasks.
Recurrent neural networks (RNNs) are well-suited to long sequential chains of reasoning, but they do not naturally incorporate program structure.
We introduce a novel GNN architecture, the Instruction Pointer Attention Graph Neural Networks (IPA-GNN), which improves systematic generalization on the task of learning to execute programs.
arXiv Detail & Related papers (2020-10-23T19:12:30Z) - Knowledge-Assisted Deep Reinforcement Learning in 5G Scheduler Design:
From Theoretical Framework to Implementation [34.5517138843888]
We develop a knowledge-assisted deep reinforcement learning algorithm to design schedulers in 5G networks.
We show that a straightforward implementation of DDPG converges slowly, has poor quality-of-service (QoS) performance, and cannot be implemented in real-world 5G systems.
arXiv Detail & Related papers (2020-09-17T14:52:12Z) - Scaling Graph Neural Networks with Approximate PageRank [64.92311737049054]
We present the PPRGo model which utilizes an efficient approximation of information diffusion in GNNs.
In addition to being faster, PPRGo is inherently scalable, and can be trivially parallelized for large datasets like those found in industry settings.
We show that training PPRGo and predicting labels for all nodes in this graph takes under 2 minutes on a single machine, far outpacing other baselines on the same graph.
arXiv Detail & Related papers (2020-07-03T09:30:07Z)
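Since the PPRGo entry above describes a concrete mechanism, a small sketch may help: PPRGo replaces layered message passing with a single propagation step that mixes independent per-node predictions using approximate personalized PageRank weights. This toy version is assumption-laden: the push-style PPR approximation, the toy graph, and the linear map standing in for PPRGo's MLP are all hypothetical, and the real system adds top-k truncation, batching, and distributed execution.

```python
# Hypothetical toy sketch of PPRGo's two-stage idea, not its released code.
# Stage 1: approximate personalized PageRank via forward push gives each node
# a sparse set of important neighbors. Stage 2: independent per-node outputs
# are mixed with the PPR weights, replacing multi-layer message passing.
import numpy as np

def approx_ppr(neighbors, source, alpha=0.15, eps=1e-4):
    """Forward-push approximation of the PPR vector of `source`."""
    p, r = {}, {source: 1.0}
    frontier = [source]
    while frontier:
        u = frontier.pop()
        ru = r.get(u, 0.0)
        deg = max(len(neighbors[u]), 1)
        if ru / deg <= eps:
            continue                  # residual too small to push further
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = 0.0
        for v in neighbors[u]:
            r[v] = r.get(v, 0.0) + (1 - alpha) * ru / deg
            if r[v] / max(len(neighbors[v]), 1) > eps:
                frontier.append(v)
    return p                          # sparse dict: node -> PPR weight

# Toy undirected graph as adjacency lists, plus random node features.
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
feats = np.random.rand(4, 8)
W = np.random.rand(8, 3)              # linear stand-in for PPRGo's MLP

def predict(node):
    ppr = approx_ppr(neighbors, node)
    # One propagation step: PPR-weighted mix of independent per-node outputs.
    logits = sum(w * (feats[v] @ W) for v, w in ppr.items())
    return int(np.argmax(logits))

print([predict(n) for n in neighbors])
```

Because the expensive part (the per-node PPR traversal) is independent of the model and of other nodes, prediction parallelizes trivially, which is consistent with the scalability claims quoted above.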