Learning Cooperative Oversubscription for Cloud by Chance-Constrained
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2211.11759v1
- Date: Mon, 21 Nov 2022 07:00:09 GMT
- Title: Learning Cooperative Oversubscription for Cloud by Chance-Constrained
Multi-Agent Reinforcement Learning
- Authors: Junjie Sheng, Lu Wang, Fangkai Yang, Bo Qiao, Hang Dong, Xiangfeng
Wang, Bo Jin, Jun Wang, Si Qin, Saravan Rajmohan, Qingwei Lin, Dongmei Zhang
- Abstract summary: Oversubscription is a common practice for improving cloud resource utilization.
This paper proposes an effective Chance Constrained Multi-Agent Reinforcement Learning (C2MARL) method to solve this problem.
- Score: 40.31099670383296
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Oversubscription is a common practice for improving cloud resource
utilization. It allows the cloud service provider to sell more resources than
the physical limit, assuming not all users would fully utilize the resources
simultaneously. However, how to design an oversubscription policy that improves
utilization while satisfying some safety constraints remains an open
problem. Existing methods and industrial practices are over-conservative,
ignoring the coordination of diverse resource usage patterns and probabilistic
constraints. To address these two limitations, this paper formulates the
oversubscription for cloud as a chance-constrained optimization problem and
proposes an effective Chance Constrained Multi-Agent Reinforcement Learning
(C2MARL) method to solve this problem. Specifically, C2MARL reduces the number
of constraints by considering their upper bounds and leverages a multi-agent
reinforcement learning paradigm to learn a safe and optimal coordination
policy. We evaluate our C2MARL on an internal cloud platform and public cloud
datasets. Experiments show that our C2MARL outperforms existing methods in
improving utilization ($20\%\sim 86\%$) under different levels of safety
constraints.
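To make the chance-constrained framing in the abstract concrete, here is a minimal sketch in Python. It is not the C2MARL method and is not taken from the paper. It only illustrates what a probabilistic safety constraint of the form P(total usage > capacity) <= epsilon means for oversubscription, using an empirical check over historical usage samples. The function names, the data layout (one list of per-VM usages per time step), and the greedy admission rule are all assumptions made for illustration.

```python
def violation_rate(usage_samples, capacity):
    """Fraction of sampled time steps whose total usage exceeds capacity."""
    over = sum(1 for step in usage_samples if sum(step) > capacity)
    return over / len(usage_samples)


def max_safe_vms(usage_samples, capacity, epsilon):
    """Largest k such that hosting the first k VMs keeps the empirical
    violation rate at or below epsilon (hypothetical greedy rule).

    Assumes usages are non-negative, so the violation rate cannot
    decrease as more VMs are added and the scan can stop at the
    first failure.
    """
    n_vms = len(usage_samples[0])
    safe = 0
    for k in range(1, n_vms + 1):
        first_k = [step[:k] for step in usage_samples]
        if violation_rate(first_k, capacity) <= epsilon:
            safe = k
        else:
            break
    return safe


if __name__ == "__main__":
    # Hypothetical data: 4 time steps, 3 VMs, observed core usage per VM.
    samples = [[2, 2, 2], [4, 1, 1], [1, 1, 4], [2, 3, 3]]
    # If each VM were sold 4 cores, 3 VMs (12 cores) oversubscribe a 6-core host.
    print(max_safe_vms(samples, capacity=6, epsilon=0.25))
    print(max_safe_vms(samples, capacity=6, epsilon=0.0))
```

Loosening epsilon admits more VMs on the same host, which is the utilization-versus-safety trade-off the abstract refers to. A learned policy such as the one the paper proposes would replace the fixed greedy rule used here.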
Related papers
- Unleashing MLLMs on the Edge: A Unified Framework for Cross-Modal ReID via Adaptive SVD Distillation [48.88299242238335]
Cross-Modal Re-identification (CM-ReID) faces challenges due to maintaining a fragmented ecosystem of specialized cloud models.
We propose MLLMEmbed-ReID, a unified framework based on a powerful cloud-edge architecture.
arXiv Detail & Related papers (2026-02-13T13:48:08Z) - Joint Continual Learning of Local Language Models and Cloud Offloading Decisions with Budget Constraints [13.890405825812065]
We propose DA-GRPO, a dual-advantage extension of Group Relative Policy Optimization.
It incorporates cloud-usage constraints directly into advantage computation, avoiding fixed reward shaping and external routing models.
Experiments on mathematical reasoning and code generation benchmarks show that DA-GRPO improves post-switch accuracy, substantially reduces forgetting, and maintains stable cloud usage.
arXiv Detail & Related papers (2026-01-29T23:27:15Z) - Collaborative Device-Cloud LLM Inference through Reinforcement Learning [17.71514700623717]
Device-cloud collaboration has emerged as a promising paradigm for deploying large language models (LLMs).
We propose a framework where the on-device LLM makes routing decisions at the end of its solving process, with this capability instilled through post-training.
In particular, we formulate a reward problem with carefully designed rewards that encourage effective problem solving and judicious offloading to the cloud.
arXiv Detail & Related papers (2025-09-28T19:48:56Z) - Cloud-Device Collaborative Agents for Sequential Recommendation [36.05863003744828]
Large language models (LLMs) have enabled agent-based recommendation systems with strong semantic understanding and flexible reasoning capabilities.
LLMs offer powerful personalization, but they often suffer from privacy concerns, limited access to real-time signals, and scalability bottlenecks.
We propose a novel Cloud-Device collaborative framework for sequential Recommendation, powered by dual agents.
arXiv Detail & Related papers (2025-09-01T15:28:11Z) - SLA-Centric Automated Algorithm Selection Framework for Cloud Environments [0.0]
Cloud computing offers on-demand resource access, regulated by Service-Level Agreements (SLAs) between consumers and Cloud Service Providers (CSPs).
We propose an SLA-aware automated algorithm-selection framework for optimization problems in resource-constrained cloud environments.
arXiv Detail & Related papers (2025-07-29T16:12:37Z) - Offline Learning for Combinatorial Multi-armed Bandits [56.96242764723241]
Off-CMAB is the first offline learning framework for CMAB.
Off-CMAB combines pessimistic reward estimations with solvers.
Experiments on synthetic and real-world datasets highlight the superior performance of CLCB.
arXiv Detail & Related papers (2025-01-31T16:56:18Z) - CE-CoLLM: Efficient and Adaptive Large Language Models Through Cloud-Edge Collaboration [1.6021932740447968]
Large Language Models (LLMs) have achieved remarkable success in serving end-users with human-like intelligence.
LLMs demand high computational resources, making it challenging to deploy them to satisfy various performance objectives.
We introduce CE-CoLLM, a novel cloud-edge collaboration framework that supports efficient and adaptive LLM inference for end-users at the edge.
arXiv Detail & Related papers (2024-11-05T06:00:27Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - Online Continual Learning Without the Storage Constraint [67.66235695269839]
We contribute a simple algorithm, which updates a kNN classifier continually along with a fixed, pretrained feature extractor.
It can adapt to rapidly changing streams, has zero stability gap, operates within tiny computational budgets, and has low storage requirements because it stores only features.
It can outperform existing methods by over 20% in accuracy on two large-scale online continual learning datasets.
arXiv Detail & Related papers (2023-05-16T08:03:07Z) - Diversity Through Exclusion (DTE): Niche Identification for
Reinforcement Learning through Value-Decomposition [63.67574523750839]
We propose a generic reinforcement learning (RL) algorithm that performs better than baseline deep Q-learning algorithms in environments with multiple variably-valued niches.
We show that agents trained this way can escape poor-but-attractive local optima to instead converge to harder-to-discover higher value strategies.
arXiv Detail & Related papers (2023-02-02T16:00:19Z) - Resource Allocation to Agents with Restrictions: Maximizing Likelihood
with Minimum Compromise [28.2469613376685]
We show that a principal chooses a maximum matching randomly so that each agent is matched to a resource with some probability.
Agents would like to improve their chances of being matched by modifying their restrictions within certain limits.
We experimentally evaluate our methods on synthetic datasets as well as on two novel real-world datasets.
arXiv Detail & Related papers (2022-09-12T11:58:19Z) - DualCF: Efficient Model Extraction Attack from Counterfactual
Explanations [57.46134660974256]
Cloud service providers have launched Machine-Learning-as-a-Service platforms to allow users to access large-scale cloud-based models via APIs.
Such extra information inevitably causes the cloud models to be more vulnerable to extraction attacks.
We propose a novel simple yet efficient querying strategy to greatly enhance the querying efficiency to steal a classification model.
arXiv Detail & Related papers (2022-05-13T08:24:43Z) - DeCOM: Decomposed Policy for Constrained Cooperative Multi-Agent
Reinforcement Learning [26.286805758673474]
We develop a constrained cooperative MARL framework, named DeCOM, for such MASes.
DeCOM decomposes the policy of each agent into two modules, which empowers information sharing among agents to achieve better cooperation.
We validate the effectiveness of DeCOM with various types of costs in both toy and large-scale (with 500 agents) environments.
arXiv Detail & Related papers (2021-11-10T12:31:30Z) - Structure-aware reinforcement learning for node-overload protection in
mobile edge computing [3.3865605512957457]
This work presents an adaptive admission control policy to prevent edge nodes from getting overloaded.
We extend the framework to work for the node overload-protection problem in a discounted-cost setting.
Our empirical evaluations show that the total discounted cost incurred by SALMUT is similar to state-of-the-art deep RL algorithms.
arXiv Detail & Related papers (2021-06-29T18:11:41Z) - Improved Algorithms for Conservative Exploration in Bandits [113.55554483194832]
We study the conservative learning problem in the contextual linear bandit setting and introduce a novel algorithm, the Conservative Constrained LinUCB (CLUCB2).
We derive regret bounds for CLUCB2 that match existing results and empirically show that it outperforms state-of-the-art conservative bandit algorithms in a number of synthetic and real-world problems.
arXiv Detail & Related papers (2020-02-08T19:35:01Z) - Reinforcement Learning-based Application Autoscaling in the Cloud: A
Survey [2.9751538760825085]
Reinforcement Learning (RL) has demonstrated a great potential for automatically solving decision-making problems in complex uncertain environments.
It is possible to learn transparent (with no human intervention), dynamic (no static plans), and adaptable (constantly updated) resource management policies to execute applications.
It exploits the Cloud elasticity to optimize the execution of applications according to given optimization criteria.
arXiv Detail & Related papers (2020-01-27T18:23:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.