Design and Implementation of an Automated Disaster-recovery System for a
Kubernetes Cluster Using LSTM
- URL: http://arxiv.org/abs/2402.02938v1
- Date: Mon, 5 Feb 2024 12:00:31 GMT
- Title: Design and Implementation of an Automated Disaster-recovery System for a
Kubernetes Cluster Using LSTM
- Authors: Ji-Beom Kim, Je-Bum Choi, and Eun-Sung Jung
- Abstract summary: This study introduces a system structure that integrates management platforms with backup and restoration tools.
The experimental results show that this system executes the restoration process within 15 s without human intervention, enabling rapid recovery.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: With the increasing importance of data in the modern business environment,
effective data management and protection strategies are gaining increasing
research attention. Data protection in a cloud environment is crucial for
safeguarding information assets and maintaining sustainable services. This
study introduces a system structure that integrates Kubernetes management
platforms with backup and restoration tools. This system is designed to
immediately detect disasters and automatically recover applications from
another Kubernetes cluster. The experimental results show that this system
executes the restoration process within 15 s without human intervention,
enabling rapid recovery. This, in turn, significantly reduces the potential for
delays and errors compared with manual recovery processes, thereby enhancing
data management and recovery efficiency in cloud environments. Moreover, our
research model predicts the CPU utilization of the cluster using Long
Short-Term Memory (LSTM). Comparison with experiments run without scheduling
makes clear the necessity of scheduling based on this prediction,
demonstrating its ability to prevent performance degradation. This research
highlights the efficiency and necessity of automatic recovery systems in cloud
environments, setting a new direction for future research.
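The abstract describes a system that detects a disaster and restores applications on another cluster without human intervention, with scheduling informed by predicted CPU utilization. The following is a minimal Python sketch of such a control loop under stated assumptions, not the authors' implementation: `check_primary`, `predict_cpu`, and `restore_to` are hypothetical callables standing in for a cluster health probe, a CPU-utilization forecaster (an LSTM in the paper), and the backup/restoration tool, and the failure threshold and CPU ceiling are illustrative values.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class FailoverController:
    """Hypothetical failover loop: declare a disaster after N consecutive failed
    health checks, then restore onto the standby cluster with the lowest
    forecast CPU utilization (an illustrative policy, not the paper's)."""

    check_primary: Callable[[], bool]      # assumed probe: True if the primary cluster is healthy
    predict_cpu: Callable[[str], float]    # assumed forecaster: cluster name -> predicted CPU in [0, 1]
    restore_to: Callable[[str], None]      # assumed hook that triggers restoration on a cluster
    standby_clusters: Sequence[str]
    failure_threshold: int = 3             # consecutive failures before declaring a disaster
    cpu_ceiling: float = 0.8               # skip targets forecast at or above this utilization
    consecutive_failures: int = 0

    def pick_target(self) -> Optional[str]:
        # Forecast each standby once, drop overloaded ones, pick the least loaded.
        forecasts = {name: self.predict_cpu(name) for name in self.standby_clusters}
        eligible = [name for name, cpu in forecasts.items() if cpu < self.cpu_ceiling]
        if not eligible:
            return None
        return min(eligible, key=lambda name: forecasts[name])

    def tick(self) -> Optional[str]:
        """Run one monitoring step; return the restore target if failover fired."""
        if self.check_primary():
            self.consecutive_failures = 0
            return None
        self.consecutive_failures += 1
        if self.consecutive_failures < self.failure_threshold:
            return None
        target = self.pick_target()
        if target is not None:
            self.restore_to(target)
            self.consecutive_failures = 0
        # If no standby is eligible, the failure count is kept so the next tick retries.
        return target
```

In a real deployment `tick` would run on a timer; requiring several consecutive failures keeps a single transient probe error from triggering a restore.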
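The abstract states that cluster CPU utilization is predicted with an LSTM but gives no architecture. As a hedged illustration of the mechanism only, the sketch below runs a single-layer LSTM with a scalar input over a sliding window of past utilization samples and maps the final hidden state to a one-step forecast through a linear head. `TinyLSTM` and `sliding_windows` are hypothetical names, and the randomly initialized weights are untrained placeholders; a framework such as PyTorch would normally be used instead of hand-written loops.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTM:
    """Single-layer LSTM over scalar inputs with a linear output head.
    Gate index k: 0 = input (i), 1 = forget (f), 2 = candidate (g), 3 = output (o)."""

    def __init__(self, hidden: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Hypothetical untrained parameters: per gate and hidden unit, one input
        # weight, `hidden` recurrent weights, and a bias.
        self.w_x = [[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(4)]
        self.w_h = [[[rng.uniform(-0.1, 0.1) for _ in range(hidden)] for _ in range(hidden)] for _ in range(4)]
        self.b = [[0.0] * hidden for _ in range(4)]
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
        self.b_out = 0.0

    def step(self, x: float, h: List[float], c: List[float]) -> Tuple[List[float], List[float]]:
        # Pre-activation of gate k, unit j: w_x[k][j]*x + sum_m w_h[k][j][m]*h[m] + b[k][j]
        pre = [[self.w_x[k][j] * x
                + sum(self.w_h[k][j][m] * h[m] for m in range(self.hidden))
                + self.b[k][j]
                for j in range(self.hidden)] for k in range(4)]
        i = [sigmoid(v) for v in pre[0]]
        f = [sigmoid(v) for v in pre[1]]
        g = [math.tanh(v) for v in pre[2]]
        o = [sigmoid(v) for v in pre[3]]
        # Cell update c_t = f * c_{t-1} + i * g, hidden state h_t = o * tanh(c_t).
        c_new = [f[j] * c[j] + i[j] * g[j] for j in range(self.hidden)]
        h_new = [o[j] * math.tanh(c_new[j]) for j in range(self.hidden)]
        return h_new, c_new

    def forecast(self, window: Sequence[float]) -> float:
        """Feed a window of past CPU-utilization samples; return the next-step estimate."""
        h = [0.0] * self.hidden
        c = [0.0] * self.hidden
        for x in window:
            h, c = self.step(x, h, c)
        return sum(w * hj for w, hj in zip(self.w_out, h)) + self.b_out


def sliding_windows(series: Sequence[float], size: int) -> List[Tuple[List[float], float]]:
    """Turn a utilization series into (window, next value) training pairs."""
    return [(list(series[t:t + size]), series[t + size]) for t in range(len(series) - size)]
```

Training the weights (for example by backpropagation through time on the pairs produced by `sliding_windows`) is omitted; only the forward recurrence that turns a utilization history into a forecast is shown.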
Related papers
- Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments
This study proposes a novel adaptive fault tolerance mechanism to ensure the reliability and availability of large-scale language models in cloud computing scenarios.
It builds upon known fault-tolerant mechanisms, such as checkpointing, redundancy, and state transposition, introducing dynamic resource allocation and prediction of failure based on real-time performance metrics.
Date: 2025-03-15
- A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks
This paper provides a comprehensive analysis of fault recovery performance, stability, and recovery time in a cloud-native environment.
Our results indicate that Flink is the most stable and has one of the best fault-recovery performances.
Kafka Streams shows suitable fault recovery performance and stability, but with higher event latency.
Date: 2024-04-09
- Comprehensive Study Of Predictive Maintenance In Industries Using Classification Models And LSTM Model
The study aims to delve into various machine learning classification techniques, including Support Vector Machine (SVM), Random Forest, Logistic Regression, and Convolutional Neural Network LSTM-Based, for predicting and analyzing machine performance.
The primary objective of the study is to assess these algorithms' performance in predicting and analyzing machine performance, considering factors such as accuracy, precision, recall, and F1 score.
Date: 2024-03-15
- EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems
Current Sequential Recommender Systems (SRSs) suffer from computational and resource inefficiencies.
We develop the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec).
EASRec introduces data-aware gates that leverage historical information from the input data batch to improve the performance of the recommendation network.
Date: 2024-02-01
- Guarding the Grid: Enhancing Resilience in Automated Residential Demand Response Against False Data Injection Attacks
Utility companies are increasingly leveraging residential demand flexibility and the proliferation of smart/IoT devices to enhance the effectiveness of demand response programs.
The adoption of distributed architectures in these systems exposes them to the risk of false data injection attacks (FDIAs).
We present a comprehensive framework that combines demand response (DR) optimisation, anomaly detection, and strategies for mitigating the impacts of attacks to create a resilient and automated device scheduling system.
Date: 2023-12-14
- TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime.
This paper introduces an integrated framework that leverages Transformer-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
Date: 2023-09-29
- Understanding Container-based Services under Software Aging: Dependability and Performance Views
We show the optimal container-migration trigger intervals that can maximize the dependability or minimize the performance of a container-based service.
This paper proposes a comprehensive semi-Markov-based approach to quantitatively evaluate the effect of OS rejuvenation on the dependability and the performance of a container-based service.
Date: 2023-08-24
- Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening
We present Selective Synaptic Dampening (SSD), a novel two-step, post hoc, retrain-free approach to machine unlearning that is fast, performant, and does not require long-term storage of the training data.
Date: 2023-08-15
- Re-thinking Data Availablity Attacks Against Deep Neural Networks
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective.
We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
Date: 2023-05-18
- Unsupervised Restoration of Weather-affected Images using Deep Gaussian Process-based CycleGAN
We describe an approach for supervising deep networks that are based on CycleGAN.
We introduce new losses for training CycleGAN that lead to more effective training, resulting in high-quality reconstructions.
We demonstrate that the proposed method can be effectively applied to different restoration tasks like de-raining, de-hazing and de-snowing.
Date: 2022-04-23
- Predictive Maintenance for Edge-Based Sensor Networks: A Deep Reinforcement Learning Approach
The risk of unplanned equipment downtime can be minimized through Predictive Maintenance of revenue-generating assets.
A model-free Deep Reinforcement Learning algorithm is proposed for predictive equipment maintenance in the context of an equipment-based sensor network.
Unlike traditional black-box regression models, the proposed algorithm self-learns an optimal maintenance policy and provides actionable recommendations for each piece of equipment.
Date: 2020-07-07
- Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery
We study two important problems in the automated design of neural networks through the lens of sparse recovery methods.
In the first part of this paper, we establish a novel connection between hyperparameter optimization (HPO) and structured sparse recovery.
In the second part of this paper, we establish a connection between neural architecture search (NAS) and structured sparse recovery.
Date: 2020-07-07