Related papers: Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center

URL: http://arxiv.org/abs/2201.11727v2
Date: Fri, 28 Jan 2022 19:50:54 GMT
Title: Multi-Agent Reinforcement Learning for Network Load Balancing in Data Center
Authors: Zhiyuan Yao, Zihan Ding, Thomas Clausen
Abstract summary: This paper presents the network load balancing problem, a challenging real-world task for reinforcement learning methods. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system.
Score: 4.141301293112916
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: This paper presents the network load balancing problem, a challenging real-world task for multi-agent reinforcement learning (MARL) methods. Traditional heuristic solutions like Weighted-Cost Multi-Path (WCMP) and Local Shortest Queue (LSQ) adapt poorly to changing workload distributions and arrival rates, and balance load poorly across multiple load balancers. The cooperative network load balancing task is formulated as a Dec-POMDP problem, which naturally induces the MARL methods. To bridge the reality gap for applying learning-based methods, all methods are directly trained and evaluated on an emulation system at moderate to large scale. Experiments on realistic testbeds show that the independent and "selfish" load balancing strategies are not necessarily the globally optimal ones, while the proposed MARL solution has superior performance across different realistic settings. Additionally, the potential difficulties of MARL methods for network load balancing are analysed, which helps to draw the attention of the learning and network communities to such challenges.

Related papers

An Offline Multi-Agent Reinforcement Learning Framework for Radio Resource Management [5.771885923067511]
Offline multi-agent reinforcement learning (MARL) addresses key limitations of online MARL. We propose an offline MARL algorithm for radio resource management (RRM). We evaluate three training paradigms: centralized, independent, and centralized training with decentralized execution (CTDE).
arXiv Detail & Related papers (2025-01-22T16:25:46Z)
Over-the-Air Fair Federated Learning via Multi-Objective Optimization [52.295563400314094]
We propose an over-the-air fair federated learning algorithm (OTA-FFL) to train fair FL models. Experiments demonstrate the superiority of OTA-FFL in achieving fairness and robust performance.
arXiv Detail & Related papers (2025-01-06T21:16:51Z)
Value-Based Deep Multi-Agent Reinforcement Learning with Dynamic Sparse Training [38.03693752287459]
Multi-agent Reinforcement Learning (MARL) relies on neural networks with numerous parameters in multi-agent scenarios. This paper proposes the utilization of dynamic sparse training (DST), a technique proven effective in deep supervised learning tasks. We introduce an innovative Multi-Agent Sparse Training (MAST) framework aimed at simultaneously enhancing the reliability of learning targets and the rationality of sample distribution.
arXiv Detail & Related papers (2024-09-28T15:57:24Z)
Load Balancing in Federated Learning [3.2999744336237384]
Federated Learning (FL) is a decentralized machine learning framework that enables learning from data distributed across multiple remote devices. This paper proposes a load metric for scheduling policies based on the Age of Information. We establish the optimal parameters of the Markov chain model and validate our approach through simulations.
arXiv Detail & Related papers (2024-08-01T00:56:36Z)
Sparse Mean Field Load Balancing in Large Localized Queueing Systems [30.672653758080568]
We learn a near-optimal load balancing policy in sparsely connected queueing networks in a tractable manner. By formulating a novel mean field control problem in the context of with bounded degree, we reduce the otherwise difficult multi-agent problem to a single-agent problem. Empirically, the proposed methodology performs well on several realistic and scalable wireless network topologies.
arXiv Detail & Related papers (2023-12-20T12:31:28Z)
Decentralized Online Learning in Task Assignment Games for Mobile Crowdsensing [55.07662765269297]
A mobile crowdsensing platform (MCSP) sequentially publishes sensing tasks to the available mobile units (MUs) that signal their willingness to participate in a task by sending sensing offers back to the MCSP. A stable task assignment must address two challenges: the MCSP's and MUs' conflicting goals, and the uncertainty about the MUs' required efforts and preferences. To overcome these challenges, a novel decentralized approach combining matching theory and online learning, called collision-avoidance multi-armed bandit with strategic free sensing (CA-MAB-SFS), is proposed.
arXiv Detail & Related papers (2023-09-19T13:07:15Z)
A Multi-Head Ensemble Multi-Task Learning Approach for Dynamical Computation Offloading [62.34538208323411]
We propose a multi-head ensemble multi-task learning (MEMTL) approach with a shared backbone and multiple prediction heads (PHs). MEMTL outperforms benchmark methods in both the inference accuracy and mean square error without requiring additional training data.
arXiv Detail & Related papers (2023-09-02T11:01:16Z)
Learning Distributed and Fair Policies for Network Load Balancing as Markov Potential Game [4.892398873024191]
This paper investigates the network load balancing problem in data centers (DCs) where multiple load balancers (LBs) are deployed. The challenges of this problem consist of the heterogeneous processing architecture and dynamic environments. We formulate the multi-agent load balancing problem as a Markov potential game, with a carefully and properly designed workload distribution fairness as the potential function. A fully distributed MARL algorithm is proposed to approximate the Nash equilibrium of the game.
arXiv Detail & Related papers (2022-06-03T08:29:02Z)
Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment. We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
Towards Intelligent Load Balancing in Data Centers [0.5505634045241288]
This paper proposes Aquarius to bridge the gap between machine learning and networking systems. It demonstrates its ability to conduct both offline data analysis and online model deployment in realistic systems.
arXiv Detail & Related papers (2021-10-27T12:47:30Z)
Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems. Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC. We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z)
Low-Latency Federated Learning over Wireless Channels with Differential Privacy [142.5983499872664]
In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement.
arXiv Detail & Related papers (2021-06-20T13:51:18Z)
Optimization-driven Machine Learning for Intelligent Reflecting Surfaces Assisted Wireless Networks [82.33619654835348]
Intelligent reflecting surface (IRS) has been employed to reshape the wireless channels by controlling individual scattering elements' phase shifts. Due to the large size of scattering elements, the passive beamforming is typically challenged by the high computational complexity. In this article, we focus on machine learning (ML) approaches for performance in IRS-assisted wireless networks.
arXiv Detail & Related papers (2020-08-29T08:39:43Z)

This list is automatically generated from the titles and abstracts of the papers on this site.