Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
- URL: http://arxiv.org/abs/2504.20105v1
- Date: Sun, 27 Apr 2025 11:56:48 GMT
- Title: Electricity Cost Minimization for Multi-Workflow Allocation in Geo-Distributed Data Centers
- Authors: Shuang Wang, He Zhang, Tianxing Wu, Yueyou Zhang, Wei Emma Zhang, Quan Z. Sheng
- Abstract summary: Geo-distributed Data Centers (GDCs) provide computing and storage services for massive workflow applications. How to reduce electricity costs while satisfying the deadline constraints of workflow applications is important in GDCs.
- Score: 26.008602849483594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Worldwide, Geo-distributed Data Centers (GDCs) provide computing and storage services for massive workflow applications, resulting in high electricity costs that vary with geographical location and time. Reducing electricity costs while satisfying the deadline constraints of workflow applications is therefore important in GDCs, where cost is determined by server execution time, power, and electricity price. Determining the completion time of workflows under different server frequencies can be challenging, especially with heterogeneous computing resources in GDCs. Moreover, electricity prices differ across geographical locations and may change dynamically. To address these challenges, we develop a geo-distributed system architecture and propose an Electricity Cost aware Multiple Workflows Scheduling algorithm (ECMWS) for servers of GDCs with fixed frequency and power. ECMWS comprises four stages, namely workflow sequencing, deadline partitioning, task sequencing, and resource allocation, where two graph embedding models and a policy network are constructed to solve the underlying Markov Decision Process (MDP). After statistically calibrating parameters and algorithm components over a comprehensive set of workflow instances, the proposed algorithm is compared with state-of-the-art methods on two types of workflow instances. The experimental results demonstrate that our proposed algorithm significantly outperforms the other algorithms, achieving an improvement of over 15% while maintaining an acceptable computational time. The source code is available at https://gitee.com/public-artifacts/ecmws-experiments.
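The abstract describes ECMWS as a four-stage pipeline (workflow sequencing, deadline partitioning, task sequencing, resource allocation) in which electricity cost depends on server execution time, power, and a location- and time-varying price. The Python sketch below only illustrates that structure under simple assumptions: the `Task`/`Server` classes, the greedy allocation rule, and the `price` table are hypothetical stand-ins, and the graph-embedding models and policy network the authors use for resource allocation (see the gitee repository) are not reproduced here.

```python
# Illustrative sketch (not the authors' code): the four ECMWS stages as a
# pipeline, with electricity cost modeled as power x execution time x the
# location- and time-dependent price. All names here are hypothetical.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Task:
    workflow_id: int
    task_id: int
    workload: float          # e.g., required CPU cycles
    deadline: float = 0.0    # filled in by deadline partitioning (seconds)


@dataclass
class Server:
    data_center: str
    frequency: float         # fixed frequency (cycles per second)
    power: float             # fixed power draw (kW)


def electricity_cost(task: Task, server: Server,
                     price: Dict[str, List[float]], hour: int) -> float:
    """Cost of running `task` on `server` starting at `hour`.

    price[data_center][hour] is the electricity price ($/kWh) for that
    data center at that hour; prices differ across locations and time.
    """
    exec_time_h = task.workload / server.frequency / 3600.0
    return server.power * exec_time_h * price[server.data_center][hour]


def ecmws(workflows, servers, price, hour=0):
    """Skeleton of the four ECMWS stages; each stage is a simple placeholder."""
    # 1) Workflow sequencing: order workflows, e.g., by deadline urgency.
    ordered = sorted(workflows, key=lambda wf: wf["deadline"])
    schedule = []
    for wf in ordered:
        # 2) Deadline partitioning: split the workflow deadline over its tasks.
        share = wf["deadline"] / max(len(wf["tasks"]), 1)
        for t in wf["tasks"]:
            t.deadline = share
        # 3) Task sequencing: iterate tasks (topological order assumed here).
        for t in wf["tasks"]:
            # 4) Resource allocation: pick the cheapest feasible server.
            #    (The paper uses graph embeddings and a policy network for
            #    this decision; a greedy rule stands in for it here.)
            feasible = [s for s in servers
                        if t.workload / s.frequency <= t.deadline]
            best = min(feasible or servers,
                       key=lambda s: electricity_cost(t, s, price, hour))
            schedule.append((t.workflow_id, t.task_id, best))
    return schedule
```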
Related papers
- Dynamic Operating System Scheduling Using Double DQN: A Reinforcement Learning Approach to Task Optimization [2.2045629562818085]
Experimental results show that the Double DQN algorithm achieves high scheduling performance under light-load, medium-load, and heavy-load scenarios. The algorithm also shows strong optimization ability in resource utilization and can intelligently adjust resource allocation according to the system state. Future studies will explore the application of the algorithm in more complex systems, especially cloud computing and large-scale distributed environments.
arXiv Detail & Related papers (2025-03-31T01:48:21Z) - Benchmarking Agentic Workflow Generation [80.74757493266057]
We introduce WorfBench, a unified workflow generation benchmark with multi-faceted scenarios and intricate graph workflow structures. We also present WorfEval, a systemic evaluation protocol utilizing subsequence and subgraph matching algorithms. We observe that the generated workflows can enhance downstream tasks, enabling them to achieve superior performance with less time during inference.
arXiv Detail & Related papers (2024-10-10T12:41:19Z) - Computation Rate Maximization for Wireless Powered Edge Computing With Multi-User Cooperation [10.268239987867453]
This study considers a wireless-powered mobile edge computing system that includes a hybrid access point equipped with a computing unit and multiple Internet of Things (IoT) devices.
We propose a novel multi-user cooperation scheme to improve computation performance, where collaborative clusters are dynamically formed.
Specifically, we aim to maximize the weighted sum computation rate (WSCR) of all the IoT devices in the network.
arXiv Detail & Related papers (2024-01-22T05:22:19Z) - Sustainable AIGC Workload Scheduling of Geo-Distributed Data Centers: A Multi-Agent Reinforcement Learning Approach [48.18355658448509]
Recent breakthroughs in generative artificial intelligence have triggered a surge in demand for machine learning training, which poses significant cost burdens and environmental challenges due to its substantial energy consumption.
Scheduling training jobs among geographically distributed cloud data centers unveils the opportunity to optimize the usage of computing capacity powered by inexpensive and low-carbon energy.
We propose an algorithm based on multi-agent reinforcement learning and actor-critic methods to learn the optimal collaborative scheduling strategy through interacting with a cloud system built with real-life workload patterns, energy prices, and carbon intensities.
arXiv Detail & Related papers (2023-04-17T02:12:30Z) - A Makespan and Energy-Aware Scheduling Algorithm for Workflows under Reliability Constraint on a Multiprocessor Platform [11.427019313284]
We propose a workflow scheduling algorithm to minimize the makespan and energy for a given reliability constraint.
We show that our algorithms, MERT and EAFTS, outperform the state-of-art approaches.
arXiv Detail & Related papers (2022-12-19T07:03:04Z) - Communication-Efficient Adam-Type Algorithms for Distributed Data Mining [93.50424502011626]
We propose a class of novel distributed Adam-type algorithms (i.e., SketchedAMSGrad) utilizing sketching.
Our new algorithm achieves a fast convergence rate of $O(\frac{1}{\sqrt{nT}} + \frac{1}{(k/d)^2 T})$ with a communication cost of $O(k \log(d))$ at each iteration.
arXiv Detail & Related papers (2022-10-14T01:42:05Z) - Semantic of Cloud Computing services for Time Series workflows [0.0]
Time series (TS) are present in many fields of knowledge, research, and engineering.
The processing and analysis of TS are essential in order to extract knowledge from the data and to tackle forecasting or predictive maintenance tasks.
The modeling of TS is a challenging task, requiring high statistical expertise as well as outstanding knowledge about the application of Data Mining(DM) and Machine Learning (ML) methods.
arXiv Detail & Related papers (2022-02-01T17:57:40Z) - Efficient Device Scheduling with Multi-Job Federated Learning [64.21733164243781]
We propose a novel multi-job Federated Learning framework to enable the parallel training process of multiple jobs.
We propose a reinforcement learning-based method and a Bayesian optimization-based method to schedule devices for multiple jobs while minimizing the cost.
Our proposed approaches significantly outperform baseline approaches in terms of training time (up to 8.67 times faster) and accuracy (up to 44.6% higher).
arXiv Detail & Related papers (2021-12-11T08:05:11Z) - Dynamic Network-Assisted D2D-Aided Coded Distributed Learning [59.29409589861241]
We propose a novel device-to-device (D2D)-aided coded federated learning method (D2D-CFL) for load balancing across devices.
We derive an optimal compression rate for achieving minimum processing time and establish its connection with the convergence time.
Our proposed method is beneficial for real-time collaborative applications, where the users continuously generate training data.
arXiv Detail & Related papers (2021-11-26T18:44:59Z) - Rosella: A Self-Driving Distributed Scheduler for Heterogeneous Clusters [7.206919625027208]
We present Rosella, a new self-driving, distributed approach for task scheduling in heterogeneous clusters.
Rosella automatically learns the compute environment and adjusts its scheduling policy in real-time.
We evaluate Rosella with a variety of workloads on a 32-node AWS cluster.
arXiv Detail & Related papers (2020-10-28T20:12:29Z) - A Machine Learning Approach for Task and Resource Allocation in Mobile Edge Computing Based Networks [108.57859531628264]
A joint task, spectrum, and transmit power allocation problem is investigated for a wireless network.
The proposed algorithm can reduce the number of iterations needed for convergence and the maximal delay among all users by up to 18% and 11.1% compared to the standard Q-learning algorithm.
arXiv Detail & Related papers (2020-07-20T13:46:42Z)