Related papers: QoS-Aware Power Minimization of Distributed Many-Core Servers using Transfer Q-Learning

URL: http://arxiv.org/abs/2102.01348v1
Date: Tue, 2 Feb 2021 06:47:58 GMT
Title: QoS-Aware Power Minimization of Distributed Many-Core Servers using Transfer Q-Learning
Authors: Dainius Jenkus, Fei Xia, Rishad Shafik, Alex Yakovlev
Abstract summary: This paper presents a QoS-aware runtime controller that combines horizontal scaling (node allocation) with vertical scaling (resource allocation within nodes). Horizontal scaling determines the number of active nodes from workload demands and the required QoS according to a set of rules. It is coupled with vertical scaling using transfer Q-learning, which tunes power/performance to the workload profile through dynamic voltage/frequency scaling (DVFS). Combined, these methods reduce exploration time and QoS violations compared with model-free Q-learning.
Score: 8.123268089072523
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Web servers scaled across distributed systems necessitate complex runtime controls for providing quality of service (QoS) guarantees as well as minimizing the energy costs under dynamic workloads. This paper presents a QoS-aware runtime controller that uses horizontal scaling (node allocation) and vertical scaling (resource allocation within nodes) synergistically to adapt to workloads while minimizing power consumption under a QoS constraint (i.e., response time). Horizontal scaling determines the number of active nodes based on workload demands and the required QoS according to a set of rules. It is then coupled with vertical scaling using transfer Q-learning, which further tunes power/performance based on the workload profile using dynamic voltage/frequency scaling (DVFS). It transfers Q-values within minimally explored states, reducing exploration requirements. In addition, the approach exploits the scalable architecture of the many-core server, allowing available knowledge from fully or partially explored nodes to be reused. When combined, these methods reduce the exploration time and QoS violations compared to model-free Q-learning. The technique balances design-time and runtime costs to maximize portability and operational optimality, demonstrated through persistent power reductions with minimal QoS violations under different workload scenarios on heterogeneous multi-processing nodes of a server cluster.
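Illustration: the two control layers described in the abstract can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. The frequency table, the capacity-based node rule, the reward shape, and the nearest-state transfer heuristic (`FREQS_GHZ`, `nodes_needed`, `reward`, `TransferQLearner`) are all hypothetical names and choices introduced here for illustration; the paper's actual rules, state encoding, and transfer scheme may differ.

```python
import math
import random

# Hypothetical DVFS operating points in GHz (assumption, not from the paper).
FREQS_GHZ = [1.2, 1.6, 2.0, 2.4]


def nodes_needed(request_rate, per_node_capacity, max_nodes):
    """Rule-based horizontal scaling: smallest node count covering the load."""
    needed = math.ceil(request_rate / per_node_capacity)
    return max(1, min(max_nodes, needed))


def reward(freq_ghz, response_ms, qos_limit_ms):
    """Assumed reward: large penalty on a QoS violation, else prefer low frequency."""
    if response_ms > qos_limit_ms:
        return -10.0
    return -freq_ghz  # frequency used as a crude proxy for power


class TransferQLearner:
    """Tabular Q-learning over DVFS levels with a simple Q-value transfer step."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, min_visits=3):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.visits = [0] * n_states
        self.alpha = alpha
        self.gamma = gamma
        self.min_visits = min_visits

    def transfer(self, state):
        """Seed an unvisited state from the nearest sufficiently explored state.

        Returns the donor state index, or None if no transfer happened.
        """
        if self.visits[state] > 0:
            return None
        donors = [s for s, v in enumerate(self.visits) if v >= self.min_visits]
        if not donors:
            return None
        donor = min(donors, key=lambda s: abs(s - state))
        self.q[state] = list(self.q[donor])  # copy, do not alias
        return donor

    def act(self, state, epsilon, rng):
        """Epsilon-greedy choice of a DVFS level index."""
        if rng.random() < epsilon:
            return rng.randrange(len(self.q[state]))
        row = self.q[state]
        return row.index(max(row))

    def update(self, state, action, r, next_state):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[next_state])
        td_error = r + self.gamma * best_next - self.q[state][action]
        self.q[state][action] += self.alpha * td_error
        self.visits[state] += 1
```

In a control loop, one would call `nodes_needed` first to set the active node count, then on each node call `transfer` for the current workload state before `act`, apply `FREQS_GHZ[action]`, measure the response time, and feed `reward` into `update`. Seeding unvisited states from explored neighbours is one plausible way to cut exploration, which is the effect the abstract attributes to Q-value transfer.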

Related papers

Benchmarking Dynamic SLO Compliance in Distributed Computing Continuum Systems [9.820223170841219]
Meeting Service Level Objectives (SLOs) in large-scale architectures is challenging due to their heterogeneous nature and varying service requirements. We present a benchmark of Active Inference -- an emerging method from neuroscience -- against three established reinforcement learning algorithms. We find that Active Inference is a promising approach for ensuring SLO compliance in distributed computing continuum systems (DCCS), offering lower memory usage, stable CPU utilization, and fast convergence.
arXiv Detail & Related papers (2025-03-05T08:56:26Z)
AdaLog: Post-Training Quantization for Vision Transformers with Adaptive Logarithm Quantizer [54.713778961605115]
Vision Transformer (ViT) has become one of the most prevalent backbone networks in the computer vision community. We propose a novel non-uniform quantizer, dubbed the Adaptive Logarithm (AdaLog) quantizer.
arXiv Detail & Related papers (2024-07-17T18:38:48Z)
Elastic Entangled Pair and Qubit Resource Management in Quantum Cloud Computing [73.7522199491117]
Quantum cloud computing (QCC) offers a promising approach to efficiently provide quantum computing resources. The fluctuations in user demand and quantum circuit requirements are challenging for efficient resource provisioning. We propose a resource allocation model to provision quantum computing and networking resources.
arXiv Detail & Related papers (2023-07-25T00:38:46Z)
Generalizable Resource Scaling of 5G Slices using Constrained Reinforcement Learning [2.0024258465343268]
Network slicing is a key enabler for 5G to support various applications. It is imperative that the 5G infrastructure provider (InP) allocates the right amount of resources depending on the slice's traffic.
arXiv Detail & Related papers (2023-06-15T17:16:34Z)
Matching Game for Optimized Association in Quantum Communication Networks [65.16483325184237]
This paper proposes a swap-stable request-QS association algorithm for quantum switches. It achieves a near-optimal (within 5%) performance in terms of the percentage of served requests. It is shown to be scalable and maintain its near-optimal performance even when the size of the QCN increases.
arXiv Detail & Related papers (2023-05-22T03:39:18Z)
Adaptive Federated Pruning in Hierarchical Wireless Networks [69.6417645730093]
Federated Learning (FL) is a privacy-preserving distributed learning framework where a server aggregates models updated by multiple devices without accessing their private datasets. In this paper, we introduce model pruning for HFL in wireless networks to reduce the neural network scale. We show that our proposed HFL with model pruning achieves similar learning accuracy compared with the HFL without model pruning and reduces about 50 percent communication cost.
arXiv Detail & Related papers (2023-05-15T22:04:49Z)
Scaling Limits of Quantum Repeater Networks [62.75241407271626]
Quantum networks (QNs) are a promising platform for secure communications, enhanced sensing, and efficient distributed quantum computing. Due to the fragile nature of quantum states, these networks face significant challenges in terms of scalability. In this paper, the scaling limits of quantum repeater networks (QRNs) are analyzed.
arXiv Detail & Related papers (2023-05-15T14:57:01Z)
Monitoring and Proactive Management of QoS Levels in Pervasive Applications [9.289846887298852]
Edge Computing (EC) provides multiple computation and analytics capabilities close to data sources. The expectation of ensuring high levels of execution imposes strict requirements for innovative management approaches. We elaborate a distributed and intelligent decision-making approach for task scheduling. We propose that nodes continuously monitor QoS levels and systematically evaluate the probability of violating them, so that they can proactively decide which tasks to offload to peer nodes or the Cloud.
arXiv Detail & Related papers (2022-06-11T09:27:47Z)
MCDS: AI Augmented Workflow Scheduling in Mobile Edge Cloud Computing Systems [12.215537834860699]
Recently proposed scheduling methods leverage the low response times of edge computing platforms to optimize application Quality of Service (QoS). We propose MCDS: Monte Carlo Learning using Deep Surrogate Models to efficiently schedule workflow applications in mobile edge-cloud computing systems.
arXiv Detail & Related papers (2021-12-14T10:00:01Z)
Accelerating variational quantum algorithms with multiple quantum processors [78.36566711543476]
Variational quantum algorithms (VQAs) have the potential of utilizing near-term quantum machines to gain certain computational advantages. Modern VQAs suffer from cumbersome computational overhead, hampered by the tradition of employing a solitary quantum processor to handle large data. Here we devise an efficient distributed optimization scheme, called QUDIO, to address this issue.
arXiv Detail & Related papers (2021-06-24T08:18:42Z)
AI-based Resource Allocation: Reinforcement Learning for Adaptive Auto-scaling in Serverless Environments [0.0]
Serverless computing has emerged as a compelling new paradigm of cloud computing models in recent years. A common approach among both commercial and open source serverless computing platforms is workload-based auto-scaling. In this paper we investigate the applicability of a reinforcement learning approach to request-based auto-scaling in a serverless framework.
arXiv Detail & Related papers (2020-05-29T06:18:39Z)
