A Roadmap towards Intelligent Operations for Reliable Cloud Computing Systems
- URL: http://arxiv.org/abs/2310.00677v1
- Date: Sun, 1 Oct 2023 14:08:02 GMT
- Title: A Roadmap towards Intelligent Operations for Reliable Cloud Computing Systems
- Authors: Yintong Huo, Cheryl Lee, Jinyang Liu, Tianyi Yang, and Michael R. Lyu
- Abstract summary: This paper highlights two main challenges, namely internal and external factors, that affect the reliability of cloud microservices.
We discuss the data-driven approach that can resolve these challenges from four key aspects: ticket management, log management, multimodal analysis, and the microservice resilience testing approach.
- Score: 30.952201576129056
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The increasing complexity and usage of cloud systems have made it challenging
for service providers to ensure reliability. This paper highlights two main
challenges, namely internal and external factors, that affect the reliability
of cloud microservices. Afterward, we discuss the data-driven approach that can
resolve these challenges from four key aspects: ticket management, log
management, multimodal analysis, and the microservice resilience testing
approach. The experiments conducted show that the proposed data-driven AIOps
solution significantly enhances system reliability from multiple angles.
Related papers
- CoRe-MMRAG: Cross-Source Knowledge Reconciliation for Multimodal RAG [53.950029990391066]
We propose Cross-source knowledge Reconciliation for Multimodal RAG (CoRe-MMRAG), a novel end-to-end framework that effectively reconciles inconsistencies across knowledge sources.
Experiments on KB-VQA benchmarks show that CoRe-MMRAG achieves substantial improvements over baseline methods.
arXiv Detail & Related papers (2025-06-03T07:32:40Z) - Unlocking the Potential of Difficulty Prior in RL-based Multimodal Reasoning [69.64809103333839]
We investigate how explicitly modeling a problem's difficulty as prior information shapes the effectiveness of reinforcement-learning-based fine-tuning for multimodal reasoning.
Our approach achieves strong performance across various multimodal mathematical reasoning benchmarks with only 2K+0.6K two-stage training data.
arXiv Detail & Related papers (2025-05-19T15:43:10Z) - Opportunistic Collaborative Planning with Large Vision Model Guided Control and Joint Query-Service Optimization [74.92515821144484]
Navigating autonomous vehicles in open scenarios is a challenge due to the difficulties in handling unseen objects.
Existing solutions either rely on small models that struggle with generalization or large models that are resource-intensive.
This paper proposes opportunistic collaborative planning (OCP), which seamlessly integrates efficient local models with powerful cloud models.
arXiv Detail & Related papers (2025-04-25T04:07:21Z) - Anomaly Detection in Large-Scale Cloud Systems: An Industry Case and Dataset [1.293050392312921]
We introduce a new high-dimensional dataset from IBM Cloud, collected over 4.5 months from the IBM Cloud Console.
This dataset comprises 39,365 rows and 117,448 columns of telemetry data.
We demonstrate the application of machine learning models for anomaly detection and discuss the key challenges faced in this process.
arXiv Detail & Related papers (2024-11-13T22:04:19Z) - Towards Resource-Efficient Federated Learning in Industrial IoT for Multivariate Time Series Analysis [50.18156030818883]
Anomalies and missing data constitute a thorny problem in industrial applications.
Deep-learning-enabled anomaly detection has emerged as a critical direction.
The data collected on edge devices contain private user information.
arXiv Detail & Related papers (2024-11-06T15:38:31Z) - Blockchain-Based Trust and Transparency in Airline Reservation Systems using Microservices Architecture [1.03590082373586]
The study investigates the major components of blockchain technology, such as decentralised databases, permanent transaction records, and transactional clauses executed as program code.
The results show a 30% decrease in booking variations together with greater data synchronization as a result of consensus processes and resistant data formations.
The system architecture has no single point of failure and achieves over 98% reliability, while the security measures taken have led 85% of customers to express trust in the services provided.
arXiv Detail & Related papers (2024-10-18T14:58:22Z) - Industry Perception of Security Challenges with Identity Access Management Solutions [0.0]
The study aims to outline the current perception of, and security issues associated with, IAM solutions from the perspective of the beneficiaries.
The main challenges for cloud-based IAM solutions were default configurations, poor management of non-human identities such as service accounts, poor certificate management, poor API configuration, and limited log analysis.
In contrast, the challenges for on-premises solutions were multi-factor authentication, insecure default configurations, a lack of the skill sets required to manage IAM solutions securely, poor password policies, unpatched vulnerabilities, and compromise of single sign-on leading to compromise of multiple entities.
arXiv Detail & Related papers (2024-08-20T08:19:58Z) - Insights on Microservice Architecture Through the Eyes of Industry Practitioners [39.58317527488534]
The adoption of microservice architecture has seen a considerable upswing in recent years.
This study investigates the motivations, activities, and challenges associated with migrating from monolithic legacy systems.
arXiv Detail & Related papers (2024-08-19T21:56:58Z) - A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends [12.814440316872748]
This survey aims to provide a comprehensive, structured review of root cause analysis (RCA) techniques.
It explores methodologies that include metrics, traces, logs, and multimodal data.
arXiv Detail & Related papers (2024-07-23T11:02:49Z) - Effective Intrusion Detection in Heterogeneous Internet-of-Things Networks via Ensemble Knowledge Distillation-based Federated Learning [52.6706505729803]
We introduce Federated Learning (FL) to collaboratively train a decentralized shared model of Intrusion Detection Systems (IDS).
FLEKD enables a more flexible aggregation method than conventional model fusion techniques.
Experimental results show that the proposed approach outperforms local training and traditional FL in terms of both speed and performance.
arXiv Detail & Related papers (2024-01-22T14:16:37Z) - Edge Intelligence Over the Air: Two Faces of Interference in Federated Learning [95.31679010587473]
Federated edge learning is envisioned as the bedrock of enabling intelligence in next-generation wireless networks.
This article provides a comprehensive overview of the positive and negative effects of interference on over-the-air-based edge learning systems.
arXiv Detail & Related papers (2023-06-17T09:04:48Z) - MMRNet: Improving Reliability for Multimodal Object Detection and Segmentation for Bin Picking via Multimodal Redundancy [68.7563053122698]
We propose a reliable object detection and segmentation system with MultiModal Redundancy (MMRNet).
This is the first system that introduces the concept of multimodal redundancy to address sensor failure issues during deployment.
We present a new label-free multi-modal consistency (MC) score that utilizes the output from all modalities to measure the overall system output reliability and uncertainty.
arXiv Detail & Related papers (2022-10-19T19:15:07Z) - Distributed intelligence on the Edge-to-Cloud Continuum: A systematic literature review [62.997667081978825]
This review aims at providing a comprehensive vision of the main state-of-the-art libraries and frameworks for machine learning and data analytics available today.
The main simulation, emulation, deployment systems, and testbeds for experimental research on the Edge-to-Cloud Continuum available today are also surveyed.
arXiv Detail & Related papers (2022-04-29T08:06:05Z)