Related papers: Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks

URL: http://arxiv.org/abs/2201.06147v1
Date: Wed, 12 Jan 2022 15:09:10 GMT
Title: Data augmentation through multivariate scenario forecasting in Data Centers using Generative Adversarial Networks
Authors: Jaime Pérez, Patricia Arroba and José M. Moya
Abstract summary: The main challenge in achieving a global energy efficiency strategy based on Artificial Intelligence is that we need massive amounts of data to feed the algorithms. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. Our research will help to optimize the energy consumed in Data Centers, although the proposed methodology can be employed in any similar time-series-like problem.
Score: 0.18416014644193063
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The Cloud paradigm is at a critical point in which the existing energy-efficiency techniques are reaching a plateau, while the computing resources demand at Data Center facilities continues to increase exponentially. The main challenge in achieving a global energy efficiency strategy based on Artificial Intelligence is that we need massive amounts of data to feed the algorithms. Nowadays, any optimization strategy must begin with data. However, companies with access to these large amounts of data decide not to share them because it could compromise their security. This paper proposes a time-series data augmentation methodology based on synthetic scenario forecasting within the Data Center. For this purpose, we will implement a powerful generative algorithm: Generative Adversarial Networks (GANs). The use of GANs will allow us to handle multivariate data and data from different natures (e.g., categorical). On the other hand, adapting Data Centers' operational management to the occurrence of sporadic anomalies is complicated due to the reduced frequency of failures in the system. Therefore, we also propose a methodology to increase the generated data variability by introducing on-demand anomalies. We validated our approach using real data collected from an operating Data Center, successfully obtaining forecasts of random scenarios with several hours of prediction. Our research will help to optimize the energy consumed in Data Centers, although the proposed methodology can be employed in any similar time-series-like problem.

Related papers

Towards an Introspective Dynamic Model of Globally Distributed Computing Infrastructures [27.473508984130728]
Large-scale scientific collaborations generate petabytes of data, with volumes soon expected to reach exabytes. To manage these computational and storage demands, centralized workflow and data management systems are implemented. A significant obstacle in adopting more effective or AI-driven solutions is the absence of a quick and reliable introspective dynamic model.
arXiv Detail & Related papers (2025-06-24T12:42:36Z)
Unlocking the Value of Decentralized Data: A Federated Dual Learning Approach for Model Aggregation [20.023295646723312]
Federated Learning (FL) offers a promising alternative by enabling AI models to be trained on decentralized data. Existing FL approaches struggle to match the performance of centralized training due to challenges such as heterogeneous data distribution and communication delays. We propose a dual learning approach that leverages centralized data at the server to guide the merging of model updates from clients.
arXiv Detail & Related papers (2025-03-26T01:00:35Z)
Tackling Data Heterogeneity in Federated Time Series Forecasting [61.021413959988216]
Time series forecasting plays a critical role in various real-world applications, including energy consumption prediction, disease transmission monitoring, and weather forecasting. Most existing methods rely on a centralized training paradigm, where large amounts of data are collected from distributed devices to a central cloud server. We propose a novel framework, Fed-TREND, to address data heterogeneity by generating informative synthetic data as auxiliary knowledge carriers.
arXiv Detail & Related papers (2024-11-24T04:56:45Z)
A Distribution-Aware Flow-Matching for Generating Unstructured Data for Few-Shot Reinforcement Learning [1.0709300917082865]
We introduce a distribution-aware flow matching, designed to generate synthetic unstructured data tailored for few-shot reinforcement learning (RL) on embedded processors. We apply feature weighting through Random Forests to prioritize critical data aspects, thereby improving the precision of the generated synthetic data. Our method provides a stable convergence based on max Q-value while enhancing frame rate by 30% in the initial timestamps.
arXiv Detail & Related papers (2024-09-21T15:50:59Z)
PeFAD: A Parameter-Efficient Federated Framework for Time Series Anomaly Detection [51.20479454379662]
We propose a federated anomaly detection framework named PeFAD, motivated by increasing privacy concerns. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
arXiv Detail & Related papers (2024-06-04T13:51:08Z)
Multi-Source Conformal Inference Under Distribution Shift [41.701790856201036]
We consider the problem of obtaining distribution-free prediction intervals for a target population, leveraging multiple potentially biased data sources. We derive the efficient influence functions for the quantiles of unobserved outcomes in the target and source populations. We propose a data-adaptive strategy to upweight informative data sources for efficiency gain and downweight non-informative data sources for bias reduction.
arXiv Detail & Related papers (2024-05-15T13:33:09Z)
Analysis and Optimization of Wireless Federated Learning with Data Heterogeneity [72.85248553787538]
This paper focuses on performance analysis and optimization for wireless FL, considering data heterogeneity, combined with wireless resource allocation. We formulate the loss function minimization problem, under constraints on long-term energy consumption and latency, and jointly optimize client scheduling, resource allocation, and the number of local training epochs (CRE). Experiments on real-world datasets demonstrate that the proposed algorithm outperforms other benchmarks in terms of learning accuracy and energy consumption.
arXiv Detail & Related papers (2023-08-04T04:18:01Z)
A Dataset Fusion Algorithm for Generalised Anomaly Detection in Homogeneous Periodic Time Series Datasets [0.0]
"Dataset Fusion" is an algorithm for fusing periodic signals from multiple homogeneous datasets into a single dataset. The proposed approach significantly outperforms conventional training approaches with an Average F1 score of 0.879. Results show that using only 6.25% of the training data, translating to a 93.7% reduction in computational power, results in a mere 4.04% decrease in performance.
arXiv Detail & Related papers (2023-05-14T16:24:09Z)
Balancing Performance and Energy Consumption of Bagging Ensembles for the Classification of Data Streams in Edge Computing [9.801387036837871]
Edge Computing (EC) has emerged as an enabling factor for developing technologies like the Internet of Things (IoT) and 5G networks. This work investigates strategies for optimizing the performance and energy consumption of bagging ensembles to classify data streams.
arXiv Detail & Related papers (2022-01-17T04:12:18Z)
Convolutional generative adversarial imputation networks for spatio-temporal missing data in storm surge simulations [86.5302150777089]
Generative Adversarial Imputation Nets (GAIN) and GAN-based techniques have attracted attention as unsupervised machine learning methods. We name our proposed method Convolutional Generative Adversarial Imputation Nets (Conv-GAIN).
arXiv Detail & Related papers (2021-11-03T03:50:48Z)
Reinforcement Learning for Datacenter Congestion Control [50.225885814524304]
Successful congestion control algorithms can dramatically improve latency and overall network throughput. Until today, no such learning-based algorithms have shown practical potential in this domain. We devise an RL-based algorithm with the aim of generalizing to different configurations of real-world datacenter networks. We show that this scheme outperforms alternative popular RL approaches, and generalizes to scenarios that were not seen during training.
arXiv Detail & Related papers (2021-02-18T13:49:28Z)
A Federated Data-Driven Evolutionary Algorithm [10.609815608017065]
Existing data-driven evolutionary optimization algorithms require that all data are centrally stored. This paper proposes a federated data-driven evolutionary optimization framework that is able to perform data-driven optimization when the data is distributed on multiple devices.
arXiv Detail & Related papers (2021-02-16T17:18:54Z)
Quasi-Global Momentum: Accelerating Decentralized Deep Learning on Heterogeneous Data [77.88594632644347]
Decentralized training of deep learning models is a key element for enabling data privacy and on-device learning over networks. In realistic learning scenarios, the presence of heterogeneity across different clients' local datasets poses an optimization challenge. We propose a novel momentum-based method to mitigate this decentralized training difficulty.
arXiv Detail & Related papers (2021-02-09T11:27:14Z)

This list is automatically generated from the titles and abstracts of the papers on this site.