Intelligent Proactive Fault Tolerance at the Edge through Resource Usage
Prediction
- URL: http://arxiv.org/abs/2302.05336v1
- Date: Thu, 9 Feb 2023 00:42:34 GMT
- Title: Intelligent Proactive Fault Tolerance at the Edge through Resource Usage
Prediction
- Authors: Theodoros Theodoropoulos, John Violos, Stylianos Tsanakas, Aris
Leivadeas, Konstantinos Tserpes, Theodora Varvarigou
- Abstract summary: We propose an Intelligent Proactive Fault Tolerance (IPFT) method that leverages the edge resource usage predictions through Recurrent Neural Networks (RNN)
In this paper, we focus on the process-faults, which are related with the inability of the infrastructure to provide Quality of Service (QoS) in acceptable ranges due to the lack of processing power.
- Score: 0.7046417074932255
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The proliferation of demanding applications and edge computing establishes
the need for an efficient management of the underlying computing
infrastructures, urging the providers to rethink their operational methods. In
this paper, we propose an Intelligent Proactive Fault Tolerance (IPFT) method
that leverages the edge resource usage predictions through Recurrent Neural
Networks (RNN). More specifically, we focus on the process-faults, which are
related with the inability of the infrastructure to provide Quality of Service
(QoS) in acceptable ranges due to the lack of processing power. In order to
tackle this challenge we propose a composite deep learning architecture that
predicts the resource usage metrics of the edge nodes and triggers proactive
node replications and task migration. Taking also into consideration that the
edge computing infrastructure is also highly dynamic and heterogeneous, we
propose an innovative Hybrid Bayesian Evolution Strategy (HBES) algorithm for
automated adaptation of the resource usage models. The proposed resource usage
prediction mechanism has been experimentally evaluated and compared with other
state of the art methods with significant improvements in terms of Root Mean
Squared Error (RMSE) and Mean Absolute Error (MAE). Additionally, the IPFT
mechanism that leverages the resource usage predictions has been evaluated in
an extensive simulation in CloudSim Plus and the results show significant
improvement compared to the reactive fault tolerance method in terms of
reliability and maintainability.
Related papers
- Causal Context Adjustment Loss for Learned Image Compression [72.7300229848778]
In recent years, learned image compression (LIC) technologies have surpassed conventional methods notably in terms of rate-distortion (RD) performance.
Most present techniques are VAE-based with an autoregressive entropy model, which obviously promotes the RD performance by utilizing the decoded causal context.
In this paper, we make the first attempt in investigating the way to explicitly adjust the causal context with our proposed Causal Context Adjustment loss.
arXiv Detail & Related papers (2024-10-07T09:08:32Z) - Online Client Scheduling and Resource Allocation for Efficient Federated Edge Learning [9.451084740123198]
Federated learning (FL) enables edge devices to collaboratively train a machine learning model without sharing their raw data.
However, deploying FL over mobile edge networks with constrained resources such as power, bandwidth, and suffers from high training latency and low model accuracy.
This paper investigates the optimal client scheduling and resource allocation for FL over mobile edge networks under resource constraints and uncertainty.
arXiv Detail & Related papers (2024-09-29T01:56:45Z) - Adaptive Anomaly Detection in Network Flows with Low-Rank Tensor Decompositions and Deep Unrolling [9.20186865054847]
Anomaly detection (AD) is increasingly recognized as a key component for ensuring the resilience of future communication systems.
This work considers AD in network flows using incomplete measurements.
We propose a novel block-successive convex approximation algorithm based on a regularized model-fitting objective.
Inspired by Bayesian approaches, we extend the model architecture to perform online adaptation to per-flow and per-time-step statistics.
arXiv Detail & Related papers (2024-09-17T19:59:57Z) - DNN Partitioning, Task Offloading, and Resource Allocation in Dynamic Vehicular Networks: A Lyapunov-Guided Diffusion-Based Reinforcement Learning Approach [49.56404236394601]
We formulate the problem of joint DNN partitioning, task offloading, and resource allocation in Vehicular Edge Computing.
Our objective is to minimize the DNN-based task completion time while guaranteeing the system stability over time.
We propose a Multi-Agent Diffusion-based Deep Reinforcement Learning (MAD2RL) algorithm, incorporating the innovative use of diffusion models.
arXiv Detail & Related papers (2024-06-11T06:31:03Z) - Enhancing Reliability of Neural Networks at the Edge: Inverted
Normalization with Stochastic Affine Transformations [0.22499166814992438]
We propose a method to inherently enhance the robustness and inference accuracy of BayNNs deployed in in-memory computing architectures.
Empirical results show a graceful degradation in inference accuracy, with an improvement of up to $58.11%$.
arXiv Detail & Related papers (2024-01-23T00:27:31Z) - Hybrid Reinforcement Learning for Optimizing Pump Sustainability in
Real-World Water Distribution Networks [55.591662978280894]
This article addresses the pump-scheduling optimization problem to enhance real-time control of real-world water distribution networks (WDNs)
Our primary objectives are to adhere to physical operational constraints while reducing energy consumption and operational costs.
Traditional optimization techniques, such as evolution-based and genetic algorithms, often fall short due to their lack of convergence guarantees.
arXiv Detail & Related papers (2023-10-13T21:26:16Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy where the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Evolutionary Optimization for Proactive and Dynamic Computing Resource
Allocation in Open Radio Access Network [4.9711284100869815]
Intelligent techniques are urged to achieve automatic allocation of the computing resource in Open Radio Access Network (O-RAN)
Existing problem formulation to solve this resource allocation problem is unsuitable as it defines the capacity utility of resource in an inappropriate way.
New formulation that better describes the problem is proposed.
arXiv Detail & Related papers (2022-01-12T08:52:04Z) - Reconfigurable Intelligent Surface Assisted Mobile Edge Computing with
Heterogeneous Learning Tasks [53.1636151439562]
Mobile edge computing (MEC) provides a natural platform for AI applications.
We present an infrastructure to perform machine learning tasks at an MEC with the assistance of a reconfigurable intelligent surface (RIS)
Specifically, we minimize the learning error of all participating users by jointly optimizing transmit power of mobile users, beamforming vectors of the base station, and the phase-shift matrix of the RIS.
arXiv Detail & Related papers (2020-12-25T07:08:50Z) - Resource Allocation via Model-Free Deep Learning in Free Space Optical
Communications [119.81868223344173]
The paper investigates the general problem of resource allocation for mitigating channel fading effects in Free Space Optical (FSO) communications.
Under this framework, we propose two algorithms that solve FSO resource allocation problems.
arXiv Detail & Related papers (2020-07-27T17:38:51Z) - Deep Learning-based Resource Allocation for Infrastructure Resilience [0.5249805590164901]
Decision-makers can use our trained models to allocate resources more efficiently after contingencies.
We showcase our methodology by the real-world interdependent infrastructure of Shelby County, TN.
arXiv Detail & Related papers (2020-07-12T00:48:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.