Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under
Delayed Feedback
- URL: http://arxiv.org/abs/2109.08548v1
- Date: Fri, 17 Sep 2021 13:45:02 GMT
- Title: Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under
Delayed Feedback
- Authors: Anam Tahir, Bastian Alt, Amr Rizk, Heinz Koeppl
- Abstract summary: We present a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under limited information of delayed acknowledgements.
We numerically show that the resulting policy outperforms other limited information scheduling strategies.
We show how our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
- Score: 29.177402567437206
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Scheduling decisions in parallel queuing systems arise as a fundamental
problem, underlying the dimensioning and operation of many computing and
communication systems, such as job routing in data center clusters, multipath
communication, and Big Data systems. In essence, the scheduler maps each
arriving job to one of the possibly heterogeneous servers while aiming at an
optimization goal such as load balancing, low average delay or low loss rate.
One main difficulty in finding optimal scheduling decisions here is that the
scheduler only partially observes the impact of its decisions, e.g., through
the delayed acknowledgements of the served jobs. In this paper, we provide a
partially observable (PO) model that captures the scheduling decisions in
parallel queuing systems under limited information of delayed acknowledgements.
We present a simulation model for this PO system to find a near-optimal
scheduling policy in real-time using a scalable Monte Carlo tree search
algorithm. We numerically show that the resulting policy outperforms other
limited information scheduling strategies such as variants of
Join-the-Most-Observations and has comparable performance to full information
strategies like: Join-the-Shortest-Queue, Join-the- Shortest-Queue(d) and
Shortest-Expected-Delay. Finally, we show how our approach can optimise the
real-time parallel processing by using network data provided by Kaggle.
Related papers
- Timely Communications for Remote Inference [16.671201899392585]
We analyze the impact of data freshness on remote inference systems.
We propose a new "selection-from-buffer" model for sending the features.
We also design low-complexity scheduling policies to improve inference performance.
arXiv Detail & Related papers (2024-04-25T01:53:21Z) - Dynamic Scheduling for Federated Edge Learning with Streaming Data [56.91063444859008]
We consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints.
Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration.
arXiv Detail & Related papers (2023-05-02T07:41:16Z) - Scheduling Inference Workloads on Distributed Edge Clusters with
Reinforcement Learning [11.007816552466952]
This paper focuses on the problem of scheduling inference queries on Deep Neural Networks in edge networks at short timescales.
By means of simulations, we analyze several policies in the realistic network settings and workloads of a large ISP.
We design ASET, a Reinforcement Learning based scheduling algorithm able to adapt its decisions according to the system conditions.
arXiv Detail & Related papers (2023-01-31T13:23:34Z) - Learning Mean-Field Control for Delayed Information Load Balancing in
Large Queuing Systems [26.405495663998828]
In this work, we consider a multi-agent load balancing system, with delayed information, consisting of many clients (load balancers) and many parallel queues.
We apply policy gradient reinforcement learning algorithms to find an optimal load balancing solution.
Our approach is scalable but also shows good performance when compared to the state-of-the-art power-of-d variant of the Join-the-Shortest-Queue (JSQ)
arXiv Detail & Related papers (2022-08-09T13:47:19Z) - Large-Scale Sequential Learning for Recommender and Engineering Systems [91.3755431537592]
In this thesis, we focus on the design of an automatic algorithms that provide personalized ranking by adapting to the current conditions.
For the former, we propose novel algorithm called SAROS that take into account both kinds of feedback for learning over the sequence of interactions.
The proposed idea of taking into account the neighbour lines shows statistically significant results in comparison with the initial approach for faults detection in power grid.
arXiv Detail & Related papers (2022-05-13T21:09:41Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud
System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z) - Rosella: A Self-Driving Distributed Scheduler for Heterogeneous Clusters [7.206919625027208]
We present Rosella, a new self-driving, distributed approach for task scheduling in heterogeneous clusters.
Rosella automatically learns the compute environment and adjusts its scheduling policy in real-time.
We evaluate Rosella with a variety of workloads on a 32-node AWS cluster.
arXiv Detail & Related papers (2020-10-28T20:12:29Z) - Dynamic Multi-Robot Task Allocation under Uncertainty and Temporal
Constraints [52.58352707495122]
We present a multi-robot allocation algorithm that decouples the key computational challenges of sequential decision-making under uncertainty and multi-agent coordination.
We validate our results over a wide range of simulations on two distinct domains: multi-arm conveyor belt pick-and-place and multi-drone delivery dispatch in a city.
arXiv Detail & Related papers (2020-05-27T01:10:41Z) - MOPS-Net: A Matrix Optimization-driven Network forTask-Oriented 3D Point
Cloud Downsampling [86.42733428762513]
MOPS-Net is a novel interpretable deep learning-based method for matrix optimization.
We show that MOPS-Net can achieve favorable performance against state-of-the-art deep learning-based methods over various tasks.
arXiv Detail & Related papers (2020-05-01T14:01:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.