Artificial Intelligence (AI)-Centric Management of Resources in Modern
Distributed Computing Systems
- URL: http://arxiv.org/abs/2006.05075v2
- Date: Sat, 7 Nov 2020 01:47:18 GMT
- Title: Artificial Intelligence (AI)-Centric Management of Resources in Modern
Distributed Computing Systems
- Authors: Shashikant Ilager, Rajeev Muralidhar and Rajkumar Buyya
- Abstract summary: Distributed Computing Systems (DCS) such as Cloud Data Centres are large scale, complex, heterogeneous, and distributed across multiple networks and geographical boundaries.
The Internet of Things (IoT)-driven applications are producing a huge amount of data that requires real-time processing and fast response.
Existing Resource Management Systems (RMS) rely on either static or heuristic solutions inadequate for such composite and dynamic systems.
- Score: 22.550075095184514
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Contemporary Distributed Computing Systems (DCS) such as Cloud Data Centres
are large scale, complex, heterogeneous, and distributed across multiple
networks and geographical boundaries. On the other hand, the Internet of Things
(IoT)-driven applications are producing a huge amount of data that requires
real-time processing and fast response. Managing these resources efficiently to
provide reliable services to end-users or applications is a challenging task.
The existing Resource Management Systems (RMS) rely on either static or
heuristic solutions inadequate for such composite and dynamic systems. The
advent of Artificial Intelligence (AI), driven by data availability and
processing capabilities, has opened possibilities for exploring data-driven
solutions in RMS tasks that are adaptive, accurate, and efficient. In this regard, this
paper aims to draw the motivations and necessities for data-driven solutions in
resource management. It identifies the challenges associated with it and
outlines the potential future research directions detailing where and how to
apply the data-driven techniques in the different RMS tasks. Finally, it
provides a conceptual data-driven RMS model for DCS and presents the two
real-time use cases (GPU frequency scaling and data centre resource management
from Google Cloud and Microsoft Azure) demonstrating AI-centric approaches'
feasibility.
Related papers
- Automatic AI Model Selection for Wireless Systems: Online Learning via Digital Twinning [50.332027356848094]
AI-based applications are deployed at intelligent controllers to carry out functionalities like scheduling or power control.
The mapping between context and AI model parameters is ideally done in a zero-shot fashion.
This paper introduces a general methodology for the online optimization of automatic model selection (AMS) mappings.
arXiv Detail & Related papers (2024-06-22T11:17:50Z)
- Adaptive Resource Allocation for Virtualized Base Stations in O-RAN with Online Learning [60.17407932691429]
Open Radio Access Network systems, with their virtualized base stations (vBSs), offer operators the benefits of increased flexibility, reduced costs, vendor diversity, and interoperability.
We propose an online learning algorithm that balances the effective throughput and vBS energy consumption, even under unforeseeable and "challenging" environments.
We prove the proposed solutions achieve sub-linear regret, providing zero average optimality gap even in challenging environments.
arXiv Detail & Related papers (2023-09-04T17:30:21Z)
- Bringing AI to the edge: A formal M&S specification to deploy effective IoT architectures [0.0]
The Internet of Things is transforming our society, providing new services that improve the quality of life and resource management.
These applications are based on ubiquitous networks of multiple distributed devices, with limited computing resources and power.
New architectures such as fog computing are emerging to bring computing infrastructure closer to data sources.
arXiv Detail & Related papers (2023-05-11T21:29:58Z)
- Outsourcing Training without Uploading Data via Efficient Collaborative Open-Source Sampling [49.87637449243698]
Traditional outsourcing requires uploading device data to the cloud server.
We propose to leverage widely available open-source data, which is a massive dataset collected from public and heterogeneous sources.
We develop a novel strategy called Efficient Collaborative Open-source Sampling (ECOS) to construct a proximal proxy dataset from open-source data for cloud training.
arXiv Detail & Related papers (2022-10-23T00:12:18Z)
- A Survey on Machine Learning for Geo-Distributed Cloud Data Center Management [4.226118870861363]
Cloud service providers have been distributing data centers globally to reduce operating costs and improve quality of service.
Such large scale and complex orchestration of software workload and hardware resources remains a difficult problem to solve efficiently.
We review the state-of-the-art Machine Learning techniques for the cloud data center management problem.
arXiv Detail & Related papers (2022-05-17T03:14:54Z)
- Machine Learning Empowered Intelligent Data Center Networking: A Survey [35.55535885962517]
This paper comprehensively investigates the application of machine learning to data center networking.
It covers flow prediction, flow classification, load balancing, resource management, routing optimization, and congestion control.
We design a set of quality assessment criteria called REBEL-3S to impartially measure the strengths and weaknesses of these research works.
arXiv Detail & Related papers (2022-02-28T05:27:22Z)
- SOLIS -- The MLOps journey from data acquisition to actionable insights [62.997667081978825]
In this paper we present a unified deployment pipeline and freedom-to-operate approach that supports all requirements while using basic cross-platform tensor framework and script language engines.
This approach however does not supply the needed procedures and pipelines for the actual deployment of machine learning capabilities in real production grade systems.
arXiv Detail & Related papers (2021-12-22T14:45:37Z)
- HUNTER: AI based Holistic Resource Management for Sustainable Cloud Computing [26.48962351761643]
We propose an artificial intelligence (AI) based holistic resource management technique for sustainable cloud computing called HUNTER.
The proposed model formulates the goal of optimizing energy efficiency in data centers as a multi-objective scheduling problem.
Experiments on simulated and physical cloud environments show that HUNTER outperforms state-of-the-art baselines in terms of energy consumption, SLA violation, scheduling time, cost and temperature by up to 12, 35, 43, 54 and 3 percent respectively.
arXiv Detail & Related papers (2021-10-11T18:11:26Z)
- Machine Learning (ML)-Centric Resource Management in Cloud Computing: A Review and Future Directions [22.779373079539713]
Infrastructure as a Service (IaaS) is one of the most important and rapidly growing fields.
One of the most important aspects of cloud computing for IaaS is resource management.
Machine learning is being used to handle a variety of resource management tasks.
arXiv Detail & Related papers (2021-05-09T08:03:58Z)
- Power Modeling for Effective Datacenter Planning and Compute Management [53.41102502425513]
We discuss two classes of statistical power models designed and validated to be accurate, simple, interpretable and applicable to all hardware configurations and workloads.
We demonstrate that the proposed statistical modeling techniques, while simple and scalable, predict power with less than 5% Mean Absolute Percent Error (MAPE) for more than 95% of diverse Power Distribution Units (more than 2000) using only 4 features.
arXiv Detail & Related papers (2021-03-22T21:22:51Z)
- Deep Learning for Ultra-Reliable and Low-Latency Communications in 6G Networks [84.2155885234293]
We first summarize how to apply data-driven supervised deep learning and deep reinforcement learning in URLLC.
To address these open problems, we develop a multi-level architecture that enables device intelligence, edge intelligence, and cloud intelligence for URLLC.
arXiv Detail & Related papers (2020-02-22T14:38:11Z)
This list is automatically generated from the titles and abstracts of the papers on this site.