Related papers: Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM

Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM

URL: http://arxiv.org/abs/2402.02938v1
Date: Mon, 5 Feb 2024 12:00:31 GMT
Title: Design and Implementation of an Automated Disaster-recovery System for a Kubernetes Cluster Using LSTM
Authors: Ji-Beom Kim, Je-Bum Choi, and Eun-Sung Jung
Abstract summary: This study introduces a system structure that integrates management plat-forms with backup and restoration tools. The experimental results show that this system executes the restoration process within 15 s without human intervention, enabling rapid recovery.
Score: 0.0
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: With the increasing importance of data in the modern business environment, effective data man-agement and protection strategies are gaining increasing research attention. Data protection in a cloud environment is crucial for safeguarding information assets and maintaining sustainable services. This study introduces a system structure that integrates Kubernetes management plat-forms with backup and restoration tools. This system is designed to immediately detect disasters and automatically recover applications from another kubernetes cluster. The experimental results show that this system executes the restoration process within 15 s without human intervention, enabling rapid recovery. This, in turn, significantly reduces the potential for delays and errors compared with manual recovery processes, thereby enhancing data management and recovery ef-ficiency in cloud environments. Moreover, our research model predicts the CPU utilization of the cluster using Long Short-Term Memory (LSTM). The necessity of scheduling through this predict is made clearer through comparison with experiments without scheduling, demonstrating its ability to prevent performance degradation. This research highlights the efficiency and necessity of automatic recovery systems in cloud environments, setting a new direction for future research.

Related papers

Edge-Cloud Collaborative Computing on Distributed Intelligence and Model Optimization: A Survey [59.52058740470727]
Edge-cloud collaborative computing (ECCC) has emerged as a pivotal paradigm for addressing the computational demands of modern intelligent applications.<n>Recent advancements in AI, particularly deep learning and large language models (LLMs), have dramatically enhanced the capabilities of these distributed systems.<n>This survey provides a structured tutorial on fundamental architectures, enabling technologies, and emerging applications.
arXiv Detail & Related papers (2025-05-03T13:55:38Z)
Adaptive Fault Tolerance Mechanisms of Large Language Models in Cloud Computing Environments [5.853391005435494]
This study proposes a novel adaptive fault tolerance mechanism to ensure the reliability and availability of large-scale language models in cloud computing scenarios. It builds upon known fault-tolerant mechanisms, such as checkpointing, redundancy, and state transposition, introducing dynamic resource allocation and prediction of failure based on real-time performance metrics.
arXiv Detail & Related papers (2025-03-15T18:45:33Z)
A Comprehensive Benchmarking Analysis of Fault Recovery in Stream Processing Frameworks [1.3398445165628463]
This paper provides a comprehensive analysis of fault recovery performance, stability, and recovery time in a cloud-native environment. Our results indicate that Flink is the most stable and has one of the best fault recovery. K Kafka Streams shows suitable fault recovery performance and stability, but with higher event latency.
arXiv Detail & Related papers (2024-04-09T10:49:23Z)
Comprehensive Study Of Predictive Maintenance In Industries Using Classification Models And LSTM Model [0.0]
The study aims to delve into various machine learning classification techniques, including Support Vector Machine (SVM), Random Forest, Logistic Regression, and Convolutional Neural Network LSTM-Based, for predicting and analyzing machine performance. The primary objective of the study is to assess these algorithms' performance in predicting and analyzing machine performance, considering factors such as accuracy, precision, recall, and F1 score.
arXiv Detail & Related papers (2024-03-15T12:47:45Z)
EASRec: Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems [82.76483989905961]
Current Sequential Recommender Systems (SRSs) suffer from computational and resource inefficiencies. We develop the Elastic Architecture Search for Efficient Long-term Sequential Recommender Systems (EASRec) EASRec introduces data-aware gates that leverage historical information from input data batch to improve the performance of the recommendation network.
arXiv Detail & Related papers (2024-02-01T07:22:52Z)
Guarding the Grid: Enhancing Resilience in Automated Residential Demand Response Against False Data Injection Attacks [2.981139602986498]
Utility companies are increasingly leveraging residential demand flexibility and the proliferation of smart/IoT devices to enhance the effectiveness of demand response programs. The adoption of distributed architectures in these systems exposes them to the risk of false data injection attacks (FDIAs) We present a comprehensive framework that combines DR optimisation, anomaly detection, and strategies for mitigating the impacts of attacks to create a resilient and automated device scheduling system.
arXiv Detail & Related papers (2023-12-14T04:02:52Z)
TranDRL: A Transformer-Driven Deep Reinforcement Learning Enabled Prescriptive Maintenance Framework [58.474610046294856]
Industrial systems demand reliable predictive maintenance strategies to enhance operational efficiency and reduce downtime. This paper introduces an integrated framework that leverages the capabilities of the Transformer model-based neural networks and deep reinforcement learning (DRL) algorithms to optimize system maintenance actions.
arXiv Detail & Related papers (2023-09-29T02:27:54Z)
Understanding Container-based Services under Software Aging: Dependability and Performance Views [5.2135218089240185]
We show the optimal con-tainer-migration trigger intervals that can maximize the de-pendability or minimize the performance of a container-based service. This paper proposes a comprehensive semi-Markov-based approach to quantitatively evaluate the effect of OS reju-venation on the dependability and the performance of a con-tainer-based service.
arXiv Detail & Related papers (2023-08-24T13:40:26Z)
Fast Machine Unlearning Without Retraining Through Selective Synaptic Dampening [51.34904967046097]
Selective Synaptic Dampening (SSD) is a fast, performant, and does not require long-term storage of the training data. We present a novel two-step, post hoc, retrain-free approach to machine unlearning which is fast, performant, and does not require long-term storage of the training data.
arXiv Detail & Related papers (2023-08-15T11:30:45Z)
Re-thinking Data Availablity Attacks Against Deep Neural Networks [53.64624167867274]
In this paper, we re-examine the concept of unlearnable examples and discern that the existing robust error-minimizing noise presents an inaccurate optimization objective. We introduce a novel optimization paradigm that yields improved protection results with reduced computational time requirements.
arXiv Detail & Related papers (2023-05-18T04:03:51Z)
Unsupervised Restoration of Weather-affected Images using Deep Gaussian Process-based CycleGAN [92.15895515035795]
We describe an approach for supervising deep networks that are based on CycleGAN. We introduce new losses for training CycleGAN that lead to more effective training, resulting in high-quality reconstructions. We demonstrate that the proposed method can be effectively applied to different restoration tasks like de-raining, de-hazing and de-snowing.
arXiv Detail & Related papers (2022-04-23T01:30:47Z)
Predictive Maintenance for Edge-Based Sensor Networks: A Deep Reinforcement Learning Approach [68.40429597811071]
The risk of unplanned equipment downtime can be minimized through Predictive Maintenance of revenue generating assets. A model-free Deep Reinforcement Learning algorithm is proposed for predictive equipment maintenance from an equipment-based sensor network context. Unlike traditional black-box regression models, the proposed algorithm self-learns an optimal maintenance policy and provides actionable recommendation for each equipment.
arXiv Detail & Related papers (2020-07-07T10:00:32Z)
Hyperparameter Optimization in Neural Networks via Structured Sparse Recovery [54.60327265077322]
We study two important problems in the automated design of neural networks through the lens of sparse recovery methods. In the first part of this paper, we establish a novel connection between HPO and structured sparse recovery. In the second part of this paper, we establish a connection between NAS and structured sparse recovery.
arXiv Detail & Related papers (2020-07-07T00:57:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.