Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis
- URL: http://arxiv.org/abs/2407.01710v2
- Date: Tue, 14 Jan 2025 05:49:10 GMT
- Title: Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis
- Authors: Shenglin Zhang, Sibo Xia, Wenzhao Fan, Binpeng Shi, Xiao Xiong, Zhenyu Zhong, Minghua Ma, Yongqian Sun, Dan Pei
- Abstract summary: This survey provides an exhaustive review of 98 scientific papers from 2003 to the present.
It includes a thorough examination and elucidation of the fundamental concepts, system architecture, and problem statement.
It also includes a qualitative analysis of the dimensions, providing an in-depth discussion of current best practices and future directions.
- Score: 10.92325792850306
- Abstract: Widely adopted for their scalability and flexibility, modern microservice systems present unique failure diagnosis challenges due to their independent deployment and dynamic interactions. This complexity can lead to cascading failures that negatively impact operational efficiency and user experience. Recognizing the critical role of fault diagnosis in improving the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a number of significant results. This survey provides an exhaustive review of 98 scientific papers from 2003 to the present, including a thorough examination and elucidation of the fundamental concepts, system architecture, and problem statement. It also includes a qualitative analysis of the dimensions, providing an in-depth discussion of current best practices and future directions, aiming to further its development and application. In addition, this survey compiles publicly available datasets, toolkits, and evaluation metrics to facilitate the selection and validation of techniques for practitioners.
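The abstract notes that the survey compiles evaluation metrics for comparing diagnosis techniques. As an illustration only, here is a minimal sketch of two ranking metrics often reported for root-cause localization in microservice systems. AC@k is the fraction of failure cases whose true root cause appears among the top k ranked candidates. Avg@k is the mean of AC@1 through AC@k. The function names, service names, and toy data are hypothetical, and the survey's own metric definitions may differ in detail.

```python
from typing import Sequence


def ac_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """AC@k: fraction of failure cases whose true root cause is in the top-k candidates."""
    if len(rankings) != len(truths) or not rankings:
        raise ValueError("rankings and truths must be non-empty and of equal length")
    if k < 1:
        raise ValueError("k must be >= 1")
    # Count the cases where the labeled root cause appears in the first k suspects.
    hits = sum(1 for ranked, truth in zip(rankings, truths) if truth in ranked[:k])
    return hits / len(rankings)


def avg_at_k(rankings: Sequence[Sequence[str]], truths: Sequence[str], k: int) -> float:
    """Avg@k: mean of AC@1, AC@2, ..., AC@k, which rewards ranking the culprit higher."""
    return sum(ac_at_k(rankings, truths, j) for j in range(1, k + 1)) / k


# Hypothetical evaluation data: one ranked list of suspect services per failure case.
rankings = [
    ["cart", "payment", "frontend"],
    ["payment", "cart", "frontend"],
    ["frontend", "catalog", "payment"],
]
truths = ["cart", "cart", "payment"]  # labeled root cause for each case

for k in (1, 2, 3):
    print(f"AC@{k} = {ac_at_k(rankings, truths, k):.3f}")
print(f"Avg@3 = {avg_at_k(rankings, truths, 3):.3f}")
```

On this toy data the true root cause is ranked first in one of three cases, within the top two in two cases, and within the top three in all three. AC@1, AC@2 and AC@3 are therefore 1/3, 2/3 and 1, and Avg@3 is 2/3. Reporting Avg@k alongside AC@k separates a method that usually ranks the culprit first from one that only gets it somewhere in the top k.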
Related papers
- Addressing Challenges in Data Quality and Model Generalization for Malaria Detection [0.0]
Malaria remains a significant global health burden, particularly in resource-limited regions where timely and accurate diagnosis is critical to effective treatment and control.
Deep Learning (DL) has emerged as a transformative tool for automating malaria detection and it offers high accuracy and scalability.
However, the effectiveness of these models is constrained by challenges in data quality and model generalization.
This article provides a comprehensive analysis of these challenges and their implications for malaria detection performance.
arXiv Detail & Related papers (2024-12-31T14:25:55Z)
- Design-Reality Gap Analysis of Health Information Systems Failure [0.0]
This study investigates the factors contributing to the failure of Health Information Systems in a public hospital in South Africa.
Findings highlight several factors contributing to HIS failures, including system capacity constraints, inadequate IT risk management, and critical skills gaps.
This study underscores the importance of addressing design-reality gaps to improve HIS outcomes in public healthcare settings.
arXiv Detail & Related papers (2024-11-05T15:31:40Z)
- Beyond One-Time Validation: A Framework for Adaptive Validation of Prognostic and Diagnostic AI-based Medical Devices [55.319842359034546]
Existing approaches often fall short in addressing the complexity of practically deploying these devices.
The presented framework emphasizes the importance of repeating validation and fine-tuning during deployment.
It is positioned within the current US and EU regulatory landscapes.
arXiv Detail & Related papers (2024-09-07T11:13:52Z)
- A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends [12.814440316872748]
This survey aims to provide a comprehensive, structured review of root cause analysis (RCA) techniques.
It explores methodologies based on metrics, traces, logs, and multi-modal data.
arXiv Detail & Related papers (2024-07-23T11:02:49Z)
- Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models.
We decompose the uncertainty of diagnostic parameters into data aspect and model aspect.
Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
- Lessons Learned from EXMOS User Studies: A Technical Report Summarizing Key Takeaways from User Studies Conducted to Evaluate The EXMOS Platform [5.132827811038276]
Two user studies aimed at illuminating the influence of different explanation types on three key dimensions: trust, understandability, and model improvement.
Results show that global model-centric explanations alone are insufficient for effectively guiding users during the intricate process of data configuration.
We present essential implications for developing interactive machine-learning systems driven by explanations.
arXiv Detail & Related papers (2023-10-03T14:04:45Z)
- Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
This work reports on the "Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions.
We present a comprehensive summary and analysis of each contribution, highlight the strengths of the best-performing methods, and discuss the possibility of translating such methods into clinical practice.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
- Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis.
Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)
- A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability.
We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
- An Overview of Healthcare Data Analytics With Applications to the COVID-19 Pandemic [20.912943922420407]
We describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general healthcare problems.
In particular, we give applications of modern digital technology, statistical methods, data platforms and data integration systems.
We make the case that analyzing and interpreting big data is a very challenging task that requires a multi-disciplinary effort.
arXiv Detail & Related papers (2021-11-25T06:37:24Z)
- Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks.
Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets.
We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)