RISCLESS: A Reinforcement Learning Strategy to Exploit Unused Cloud
Resources
- URL: http://arxiv.org/abs/2205.08350v1
- Date: Thu, 28 Apr 2022 06:49:24 GMT
- Title: RISCLESS: A Reinforcement Learning Strategy to Exploit Unused Cloud
Resources
- Authors: Sidahmed Yalles (UR1, IRISA-D4), Mohamed Handaoui (Hypermedia, UR1,
IRISA-D4), Jean-Emile Dartois (IRT b-com, DiverSe, UR1, IRISA-D4), Olivier
Barais (UR1, IRISA-D4), Laurent d'Orazio, Jalil Boukhobza (ENSTA Bretagne,
Lab-STICC_SHAKER)
- Abstract summary: One of the main objectives of Cloud Providers (CPs) is to guarantee the Service-Level Agreement (SLA) of customers.
This paper proposes RISCLESS, a Reinforcement Learning strategy to exploit unused Cloud resources.
- Score: 0.44634886884474834
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: One of the main objectives of Cloud Providers (CPs) is to guarantee the
Service-Level Agreement (SLA) of customers while reducing operating costs. To
achieve this goal, CPs have built large-scale datacenters. This leads, however,
to underutilized resources and an increase in costs. A way to improve the
utilization of resources is to reclaim the unused parts and resell them at a
lower price. Providing SLA guarantees to customers on reclaimed resources is a
challenge due to their high volatility. Some state-of-the-art solutions
consider keeping a proportion of resources free to absorb sudden variation in
workloads. Others consider stable resources on top of the volatile ones to fill
in for the lost resources. However, these strategies either reduce the amount
of reclaimable resources or operate on less volatile ones such as Amazon Spot
instance. In this paper, we proposed RISCLESS, a Reinforcement Learning
strategy to exploit unused Cloud resources. Our approach consists of using a
small proportion of stable on-demand resources alongside the ephemeral ones in
order to guarantee customers' SLAs and reduce the overall costs. The approach
decides when and how much stable resources to allocate in order to fulfill
customers' demands. RISCLESS improved the CPs' profits by an average of 15.9%
compared to state-of-the-art strategies. It also reduced the SLA violation time
by an average of 36.7% while increasing the amount of used ephemeral resources
by 19.5% on average.
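The core decision the abstract describes, choosing when and how much stable on-demand capacity to add on top of volatile reclaimed capacity, can be sketched as a tiny tabular RL agent. Everything below (the action set, cost and penalty values, demand model, and one-step reward) is an illustrative assumption for intuition, not the paper's actual formulation.

```python
import random

# Hypothetical sketch of the RISCLESS idea: an agent observes demand and
# volatile (ephemeral) capacity, then picks how much stable on-demand
# capacity to add. All constants and dynamics are illustrative assumptions.

ACTIONS = [0, 2, 4, 8]          # units of stable capacity to allocate
STABLE_COST = 1.0               # cost per unit of stable capacity
PENALTY = 5.0                   # SLA penalty per unit of unmet demand

def reward(demand, ephemeral, stable):
    # Negative cost: pay for stable capacity plus penalties for unmet demand.
    unmet = max(0, demand - (ephemeral + stable))
    return -(STABLE_COST * stable + PENALTY * unmet)

def bucket(demand, ephemeral):
    # Coarse state discretization for tabular learning.
    return (demand // 4, ephemeral // 4)

def train(episodes=2000, alpha=0.1, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {}
    for _ in range(episodes):
        demand = rng.randint(4, 16)          # customer demand
        ephemeral = rng.randint(0, 12)       # volatile reclaimed capacity
        s = bucket(demand, ephemeral)
        if s not in q:
            q[s] = [0.0] * len(ACTIONS)
        # Epsilon-greedy action selection.
        a = (rng.randrange(len(ACTIONS)) if rng.random() < epsilon
             else max(range(len(ACTIONS)), key=lambda i: q[s][i]))
        r = reward(demand, ephemeral, ACTIONS[a])
        q[s][a] += alpha * (r - q[s][a])     # one-step (bandit-style) update
    return q

def decide(q, demand, ephemeral):
    s = bucket(demand, ephemeral)
    if s not in q:
        return ACTIONS[-1]                   # err on the safe side when unseen
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[s][i])]

q = train()
print(decide(q, demand=14, ephemeral=4))  # prefers a larger stable top-up
```

With SLA penalties set high relative to the stable-capacity price, the learned policy tops up stable resources when ephemeral capacity falls short and allocates none when volatile capacity suffices; the paper's contribution is learning this cost/penalty trade-off online rather than hand-tuning it.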
Related papers
- Multi-Level ML Based Burst-Aware Autoscaling for SLO Assurance and Cost
Efficiency [3.5624365288866007]
This paper introduces BAScaler, a Burst-Aware Autoscaling framework for containerized cloud services or applications under complex workloads.
BAScaler incorporates a novel prediction-based burst detection mechanism that distinguishes between predictable periodic workload spikes and actual bursts.
arXiv Detail & Related papers (2024-02-20T12:28:25Z)
- Democratizing LLMs: An Exploration of Cost-Performance Trade-offs in
Self-Refined Open-Source Models [53.859446823312126]
SoTA open-source models of varying sizes from 7B to 65B improve, on average, 8.2% over their baseline performance.
Strikingly, even models with extremely small memory footprints, such as Vicuna-7B, show an 11.74% improvement overall and up to a 25.39% improvement in high-creativity, open-ended tasks.
arXiv Detail & Related papers (2023-10-11T15:56:00Z)
- RAPID: Enabling Fast Online Policy Learning in Dynamic Public Cloud
Environments [7.825552412435501]
We propose a novel framework for fast, fully-online resource allocation policy learning in dynamic operating environments.
We show that our framework can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art.
arXiv Detail & Related papers (2023-04-10T18:04:39Z)
- Efficient Exploration in Resource-Restricted Reinforcement Learning [6.463999435780127]
In many real-world applications of reinforcement learning, performing actions requires consuming certain types of resources that are non-replenishable in each episode.
We propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources.
RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments.
arXiv Detail & Related papers (2022-12-14T02:50:26Z)
- Outsourcing Training without Uploading Data via Efficient Collaborative
Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
- PROMPT: Learning Dynamic Resource Allocation Policies for Network
Applications [16.812611987082082]
We propose PROMPT, a novel resource allocation framework using proactive prediction to guide a reinforcement learning controller.
We show that PROMPT incurs 4.2x fewer violations, reduces severity of policy violations by 12.7x, improves best-effort workload performance, and improves overall power efficiency over prior work.
arXiv Detail & Related papers (2022-01-19T23:34:34Z)
- Coordinated Online Learning for Multi-Agent Systems with Coupled
Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- ReLeaSER: A Reinforcement Learning Strategy for Optimizing Utilization
Of Ephemeral Cloud Resources [2.205500582481277]
We propose a Reinforcement Learning strategy for optimizing the ephemeral resources' utilization in the cloud.
Our solution significantly reduces SLA violation penalties, by 2.7x on average and up to 3.4x.
It also considerably improves the CPs' potential savings, by 27.6% on average and up to 43.6%.
arXiv Detail & Related papers (2020-09-23T15:19:28Z)
- Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep
Learning [61.29990368322931]
Pollux improves scheduling performance in deep learning (DL) clusters by adaptively co-optimizing inter-dependent factors.
Pollux reduces average job completion times by 37-50% relative to state-of-the-art DL schedulers.
arXiv Detail & Related papers (2020-08-27T16:56:48Z)
- Hierarchical Adaptive Contextual Bandits for Resource Constraint based
Recommendation [49.69139684065241]
Contextual multi-armed bandit (MAB) achieves cutting-edge performance on a variety of problems.
In this paper, we propose a hierarchical adaptive contextual bandit method (HATCH) to conduct the policy learning of contextual bandits with a budget constraint.
arXiv Detail & Related papers (2020-04-02T17:04:52Z)
- Improving Candidate Generation for Low-resource Cross-lingual Entity
Linking [81.41804263432684]
Cross-lingual entity linking (XEL) is the task of finding referents in a target-language knowledge base (KB) for mentions extracted from source-language texts.
In this paper, we propose three improvements that (1) reduce the disconnect between entity mentions and KB entries, and (2) improve the robustness of the model to low-resource scenarios.
arXiv Detail & Related papers (2020-03-03T05:32:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences.