CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing
- URL: http://arxiv.org/abs/2403.09671v2
- Date: Mon, 20 May 2024 08:38:18 GMT
- Title: CoRaiS: Lightweight Real-Time Scheduler for Multi-Edge Cooperative Computing
- Authors: Yujiao Hu, Qingmin Jia, Jinchao Chen, Yuan Yao, Yan Pan, Renchao Xie, F. Richard Yu,
- Abstract summary: Multi-edge cooperative computing that combines constrained resources of multiple edges into a powerful resource pool has the potential to deliver great benefits.
However, the mass heterogeneous resources composition and lack of scheduling strategies make the modeling and cooperating of multi-edge computing system particularly complicated.
This paper first proposes a system-level state evaluation model to shield the complex hardware configurations and redefine the different service capabilities at heterogeneous edges.
- Score: 32.99310493126955
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Multi-edge cooperative computing that combines constrained resources of multiple edges into a powerful resource pool has the potential to deliver great benefits, such as a tremendous computing power, improved response time, more diversified services. However, the mass heterogeneous resources composition and lack of scheduling strategies make the modeling and cooperating of multi-edge computing system particularly complicated. This paper first proposes a system-level state evaluation model to shield the complex hardware configurations and redefine the different service capabilities at heterogeneous edges. Secondly, an integer linear programming model is designed to cater for optimally dispatching the distributed arriving requests. Finally, a learning-based lightweight real-time scheduler, CoRaiS, is proposed. CoRaiS embeds the real-time states of multi-edge system and requests information, and combines the embeddings with a policy network to schedule the requests, so that the response time of all requests can be minimized. Evaluation results verify that CoRaiS can make a high-quality scheduling decision in real time, and can be generalized to other multi-edge computing system, regardless of system scales. Characteristic validation also demonstrates that CoRaiS successfully learns to balance loads, perceive real-time state and recognize heterogeneity while scheduling.
Related papers
- Online Frequency Scheduling by Learning Parallel Actions [5.9838600557884805]
Frequency resources need to be assigned to a set of users while allowing for concurrent transmissions in the same sub-band.
Traditional methods are insufficient to cope with all the involved constraints and uncertainties.
We propose a scheduler based on action-branching over sub-bands, which is a deep Q-learning architecture with parallel decision capabilities.
arXiv Detail & Related papers (2024-06-07T16:14:51Z) - Intelligent Hybrid Resource Allocation in MEC-assisted RAN Slicing Network [72.2456220035229]
We aim to maximize the SSR for heterogeneous service demands in the cooperative MEC-assisted RAN slicing system.
We propose a recurrent graph reinforcement learning (RGRL) algorithm to intelligently learn the optimal hybrid RA policy.
arXiv Detail & Related papers (2024-05-02T01:36:13Z) - Randomized Polar Codes for Anytime Distributed Machine Learning [66.46612460837147]
We present a novel distributed computing framework that is robust to slow compute nodes, and is capable of both approximate and exact computation of linear operations.
We propose a sequential decoding algorithm designed to handle real valued data while maintaining low computational complexity for recovery.
We demonstrate the potential applications of this framework in various contexts, such as large-scale matrix multiplication and black-box optimization.
arXiv Detail & Related papers (2023-09-01T18:02:04Z) - MARLIN: Soft Actor-Critic based Reinforcement Learning for Congestion
Control in Real Networks [63.24965775030673]
We propose a novel Reinforcement Learning (RL) approach to design generic Congestion Control (CC) algorithms.
Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return.
We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch.
arXiv Detail & Related papers (2023-02-02T18:27:20Z) - Scheduling in Parallel Finite Buffer Systems: Optimal Decisions under
Delayed Feedback [29.177402567437206]
We present a partially observable (PO) model that captures the scheduling decisions in parallel queuing systems under limited information of delayed acknowledgements.
We numerically show that the resulting policy outperforms other limited information scheduling strategies.
We show how our approach can optimise the real-time parallel processing by using network data provided by Kaggle.
arXiv Detail & Related papers (2021-09-17T13:45:02Z) - BAGUA: Scaling up Distributed Learning with System Relaxations [31.500494636704598]
BAGUA is a new communication framework for distributed data-parallel training.
Powered by the new system design, BAGUA has a great ability to implement and extend various state-of-the-art distributed learning algorithms.
In a production cluster with up to 16 machines, BAGUA can outperform PyTorch-DDP, Horovod and BytePS in the end-to-end training time.
arXiv Detail & Related papers (2021-07-03T21:27:45Z) - Better than the Best: Gradient-based Improper Reinforcement Learning for
Network Scheduling [60.48359567964899]
We consider the problem of scheduling in constrained queueing networks with a view to minimizing packet delay.
We use a policy gradient based reinforcement learning algorithm that produces a scheduler that performs better than the available atomic policies.
arXiv Detail & Related papers (2021-05-01T10:18:34Z) - Tailored Learning-Based Scheduling for Kubernetes-Oriented Edge-Cloud
System [54.588242387136376]
We introduce KaiS, a learning-based scheduling framework for edge-cloud systems.
First, we design a coordinated multi-agent actor-critic algorithm to cater to decentralized request dispatch.
Second, for diverse system scales and structures, we use graph neural networks to embed system state information.
Third, we adopt a two-time-scale scheduling mechanism to harmonize request dispatch and service orchestration.
arXiv Detail & Related papers (2021-01-17T03:45:25Z) - Combining Deep Learning and Optimization for Security-Constrained
Optimal Power Flow [94.24763814458686]
Security-constrained optimal power flow (SCOPF) is fundamental in power systems.
Modeling of APR within the SCOPF problem results in complex large-scale mixed-integer programs.
This paper proposes a novel approach that combines deep learning and robust optimization techniques.
arXiv Detail & Related papers (2020-07-14T12:38:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.