HUNTER: AI based Holistic Resource Management for Sustainable Cloud
Computing
- URL: http://arxiv.org/abs/2110.05529v1
- Date: Mon, 11 Oct 2021 18:11:26 GMT
- Title: HUNTER: AI based Holistic Resource Management for Sustainable Cloud
Computing
- Authors: Shreshth Tuli, Sukhpal Singh Gill, Minxian Xu, Peter Garraghan, Rami
Bahsoon, Schahram Dustdar, Rizos Sakellariou, Omer Rana, Rajkumar Buyya,
Giuliano Casale and Nicholas R. Jennings
- Abstract summary: We propose an artificial intelligence (AI) based holistic resource management technique for sustainable cloud computing called HUNTER.
The proposed model formulates the goal of optimizing energy efficiency in data centers as a multi-objective scheduling problem.
Experiments on simulated and physical cloud environments show that HUNTER outperforms state-of-the-art baselines in terms of energy consumption, SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54 and 3 percent respectively.
- Score: 26.48962351761643
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The worldwide adoption of cloud data centers (CDCs) has given rise to the
ubiquitous demand for hosting application services on the cloud. Further,
contemporary data-intensive industries have seen a sharp upsurge in the
resource requirements of modern applications. This has led to the provisioning
of an increased number of cloud servers, giving rise to higher energy
consumption and, consequently, sustainability concerns. Traditional heuristics
and reinforcement learning based algorithms for energy-efficient cloud resource
management address the scalability and adaptability related challenges to a
limited extent. Existing work often fails to capture dependencies across
thermal characteristics of hosts, resource consumption of tasks and the
corresponding scheduling decisions. This leads to poor scalability and an
increase in the compute resource requirements, particularly in environments
with non-stationary resource demands. To address these limitations, we propose
an artificial intelligence (AI) based holistic resource management technique
for sustainable cloud computing called HUNTER. The proposed model formulates
the goal of optimizing energy efficiency in data centers as a multi-objective
scheduling problem, considering three important models: energy, thermal and
cooling. HUNTER utilizes a Gated Graph Convolution Network as a surrogate model
for approximating the Quality of Service (QoS) for a system state and
generating optimal scheduling decisions. Experiments on simulated and physical
cloud environments using the CloudSim toolkit and the COSCO framework show that
HUNTER outperforms state-of-the-art baselines in terms of energy consumption,
SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54
and 3 percent respectively.
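The multi-objective formulation described in the abstract can be illustrated with a small weighted-sum sketch. This is not HUNTER's implementation: the paper's Gated Graph Convolution Network surrogate is replaced here by a hand-written stand-in function, and every name and constant (`HOSTS`, `AMBIENT_C`, `DEG_PER_WATT`, `COP`, the linear power and temperature models, the weights) is a hypothetical assumption chosen only to show how energy, thermal and cooling terms might be combined into one score for ranking scheduling decisions.

```python
from itertools import product

# Hypothetical host parameters (illustrative values, not taken from the paper).
HOSTS = {
    "h0": {"idle_w": 100.0, "peak_w": 200.0, "capacity": 4.0},
    "h1": {"idle_w": 60.0, "peak_w": 140.0, "capacity": 2.0},
}
AMBIENT_C = 25.0    # assumed inlet air temperature in degrees Celsius
DEG_PER_WATT = 0.1  # assumed linear heating coefficient
COP = 3.0           # assumed cooling coefficient of performance


def host_loads(placement, demands):
    """Sum the resource demand of the tasks placed on each host."""
    loads = {name: 0.0 for name in HOSTS}
    for task, host in placement.items():
        loads[host] += demands[task]
    return loads


def objectives(placement, demands):
    """Stand-in surrogate: return (energy_w, max_temp_c, cooling_w)."""
    loads = host_loads(placement, demands)
    energy, max_temp = 0.0, AMBIENT_C
    for name, spec in HOSTS.items():
        if loads[name] == 0.0:
            continue  # assumption: hosts without tasks are powered down
        util = loads[name] / spec["capacity"]
        power = spec["idle_w"] + (spec["peak_w"] - spec["idle_w"]) * util
        energy += power
        max_temp = max(max_temp, AMBIENT_C + DEG_PER_WATT * power)
    return energy, max_temp, energy / COP


def score(placement, demands, weights):
    """Weighted sum of the objectives, each scaled by a worst-case reference."""
    energy, temp, cooling = objectives(placement, demands)
    e_ref = sum(spec["peak_w"] for spec in HOSTS.values())
    t_ref = AMBIENT_C + DEG_PER_WATT * max(s["peak_w"] for s in HOSTS.values())
    w_e, w_t, w_c = weights
    return w_e * energy / e_ref + w_t * temp / t_ref + w_c * cooling / (e_ref / COP)


def best_placement(demands, weights):
    """Enumerate all task-to-host assignments and keep the lowest-scoring one."""
    tasks = sorted(demands)
    best, best_score = None, float("inf")
    for hosts in product(sorted(HOSTS), repeat=len(tasks)):
        placement = dict(zip(tasks, hosts))
        loads = host_loads(placement, demands)
        if any(loads[h] > HOSTS[h]["capacity"] for h in HOSTS):
            continue  # skip placements that overload a host
        s = score(placement, demands, weights)
        if s < best_score:
            best, best_score = placement, s
    if best is None:
        raise ValueError("no feasible placement")
    return best


if __name__ == "__main__":
    demands = {"t0": 1.0, "t1": 1.0}
    print(best_placement(demands, (0.5, 0.3, 0.2)))  # energy-weighted
    print(best_placement(demands, (0.0, 1.0, 0.0)))  # temperature-only
```

Under these assumed models, energy-heavy weights favor consolidating tasks on a single host, while a temperature-only weighting spreads them out. Exhaustive enumeration is used only because the toy instance is tiny; it would not scale to realistic data centers, which is the kind of setting where a learned surrogate becomes relevant.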
Related papers
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with
Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- Dynamic Scheduling for Federated Edge Learning with Streaming Data [56.91063444859008]
We consider a Federated Edge Learning (FEEL) system where training data are randomly generated over time at a set of distributed edge devices with long-term energy constraints.
Due to limited communication resources and latency requirements, only a subset of devices is scheduled for participating in the local training process in every iteration.
arXiv Detail & Related papers (2023-05-02T07:41:16Z)
- Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A
Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z)
- RARE: Renewable Energy Aware Resource Management in Datacenters [9.488752723308954]
Hyperscale cloud providers have announced plans to power their datacenters using renewable energy.
Integrating renewables to power the datacenters is challenging because the power generation is intermittent.
We present a scheduler that learns effective job scheduling policies while continually adapting to the intermittent power supply from renewables.
arXiv Detail & Related papers (2022-11-10T05:17:14Z)
- Measuring the Carbon Intensity of AI in Cloud Instances [91.28501520271972]
We provide a framework for measuring software carbon intensity, and propose to measure operational carbon emissions.
We evaluate a suite of approaches for reducing emissions on the Microsoft Azure cloud compute platform.
arXiv Detail & Related papers (2022-06-10T17:04:04Z)
- Machine Learning (ML)-Centric Resource Management in Cloud Computing: A
Review and Future Directions [22.779373079539713]
Infrastructure as a Service (IaaS) is one of the most important and rapidly growing fields.
One of the most important aspects of cloud computing for IaaS is resource management.
Machine learning is being used to handle a variety of resource management tasks.
arXiv Detail & Related papers (2021-05-09T08:03:58Z)
- Performance and Energy-Aware Bi-objective Tasks Scheduling for Cloud
Data Centers [0.0]
Cloud computing enables remote execution of users' tasks.
The pervasive adoption of cloud computing in smart city services and applications requires timely execution of tasks that adheres to Quality of Service (QoS) requirements.
The increasing use of computing servers exacerbates the issues of high energy consumption, operating costs, and environmental pollution.
We propose a bi-objective performance and energy optimization algorithm to trade off the conflicting performance and energy objectives.
arXiv Detail & Related papers (2021-04-25T08:55:57Z)
- Power Modeling for Effective Datacenter Planning and Compute Management [53.41102502425513]
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of a diverse set of Power Distribution Units (more than 2000) using only 4 features.
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
- Artificial Intelligence (AI)-Centric Management of Resources in Modern
Distributed Computing Systems [22.550075095184514]
Distributed Computing Systems (DCS), such as Cloud Data Centres, are large scale, complex, heterogeneous, and distributed across multiple networks and geographical boundaries.
Internet of Things (IoT)-driven applications are producing huge amounts of data that require real-time processing and fast response.
Existing Resource Management Systems (RMS) rely on either static or heuristic solutions that are inadequate for such composite and dynamic systems.
arXiv Detail & Related papers (2020-06-09T06:54:07Z)
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A
Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered multi-access edge computing (MEC) network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)