Related papers: Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis

URL: http://arxiv.org/abs/2407.01710v2
Date: Tue, 14 Jan 2025 05:49:10 GMT
Title: Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis
Authors: Shenglin Zhang, Sibo Xia, Wenzhao Fan, Binpeng Shi, Xiao Xiong, Zhenyu Zhong, Minghua Ma, Yongqian Sun, Dan Pei
Abstract summary: This survey provides an exhaustive review of 98 scientific papers from 2003 to the present. It includes a thorough examination and elucidation of the fundamental concepts, system architecture, and problem statement. It also includes a qualitative analysis of the dimensions, providing an in-depth discussion of current best practices and future directions.
Score: 10.92325792850306
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Widely adopted for their scalability and flexibility, modern microservice systems present unique failure diagnosis challenges due to their independent deployment and dynamic interactions. This complexity can lead to cascading failures that negatively impact operational efficiency and user experience. Recognizing the critical role of fault diagnosis in improving the stability and reliability of microservice systems, researchers have conducted extensive studies and achieved a number of significant results. This survey provides an exhaustive review of 98 scientific papers from 2003 to the present, including a thorough examination and elucidation of the fundamental concepts, system architecture, and problem statement. It also includes a qualitative analysis of the dimensions, providing an in-depth discussion of current best practices and future directions, aiming to further its development and application. In addition, this survey compiles publicly available datasets, toolkits, and evaluation metrics to facilitate the selection and validation of techniques for practitioners.

Related papers

Medical Reasoning in the Era of LLMs: A Systematic Review of Enhancement Techniques and Applications [59.721265428780946]
Large Language Models (LLMs) in medicine have enabled impressive capabilities, yet a critical gap remains in their ability to perform systematic, transparent, and verifiable reasoning. This paper provides the first systematic review of this emerging field. We propose a taxonomy of reasoning enhancement techniques, categorized into training-time strategies and test-time mechanisms.
arXiv Detail & Related papers (2025-08-01T14:41:31Z)
Leveraging MIMIC Datasets for Better Digital Health: A Review on Open Problems, Progress Highlights, and Future Promises [15.072974454383925]
The Medical Information Mart for Intensive Care (MIMIC) datasets have become the kernel of digital health research. We identify persistent issues such as data granularity, cardinality limitations, heterogeneous coding schemes, and ethical constraints that hinder the generalizability and real-time implementation of machine learning models. This survey offers actionable insights to guide the next generation of MIMIC-powered digital health innovations.
arXiv Detail & Related papers (2025-06-15T10:47:07Z)
Systematic Literature Review on Clinical Trial Eligibility Matching [0.24554686192257422]
Review highlights how explainable AI and standardized ontology can bolster clinician trust and broaden adoption. Further research into advanced semantic and temporal representations, expanded data integration, and rigorous prospective evaluations is necessary to fully realize the transformative potential of NLP in clinical trial recruitment.
arXiv Detail & Related papers (2025-03-02T11:45:50Z)
Addressing Challenges in Data Quality and Model Generalization for Malaria Detection [0.0]
Malaria remains a significant global health burden, particularly in resource-limited regions where timely and accurate diagnosis is critical to effective treatment and control. Deep Learning (DL) has emerged as a transformative tool for automating malaria detection and it offers high accuracy and scalability. However, the effectiveness of these models is constrained by challenges in data quality and model generalization. This article provides a comprehensive analysis of these challenges and their implications for malaria detection performance.
arXiv Detail & Related papers (2024-12-31T14:25:55Z)
Design-Reality Gap Analysis of Health Information Systems Failure [0.0]
This study investigates the factors contributing to the failure of Health Information Systems in a public hospital in South Africa. Findings highlight several factors contributing to HIS failures, including system capacity constraints, inadequate IT risk management, and critical skills gaps. This study underscores the importance of addressing design-reality gaps to improve HIS outcomes in public healthcare settings.
arXiv Detail & Related papers (2024-11-05T15:31:40Z)
Beyond One-Time Validation: A Framework for Adaptive Validation of Prognostic and Diagnostic AI-based Medical Devices [55.319842359034546]
Existing approaches often fall short in addressing the complexity of practically deploying these devices. The presented framework emphasizes the importance of repeating validation and fine-tuning during deployment. It is positioned within the current US and EU regulatory landscapes.
arXiv Detail & Related papers (2024-09-07T11:13:52Z)
TVDiag: A Task-oriented and View-invariant Failure Diagnosis Framework with Multimodal Data [11.373761837547852]
Microservice-based systems often suffer from reliability issues due to their intricate interactions and expanding scale. Traditional failure diagnosis methods that use single-modal data can hardly cover all failure scenarios due to the restricted information. We propose TVDiag, a multimodal failure diagnosis framework for locating culprit microservice instances and identifying their failure types.
arXiv Detail & Related papers (2024-07-29T05:26:57Z)
A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends [12.814440316872748]
This survey aims to provide a comprehensive, structured review of root cause analysis (RCA) techniques. It explores methodologies that include metrics, traces, logs, and multi-modal data.
arXiv Detail & Related papers (2024-07-23T11:02:49Z)
TrialBench: Multi-Modal Artificial Intelligence-Ready Clinical Trial Datasets [54.98321887435557]
This paper presents a suite of 23 meticulously curated AI-ready datasets covering multi-modal input features and 8 crucial prediction challenges in clinical trial design. We provide basic validation methods for each task to ensure the datasets' usability and reliability. We anticipate that the availability of such open-access datasets will catalyze the development of advanced AI approaches for clinical trial design.
arXiv Detail & Related papers (2024-06-30T09:13:10Z)
Comparative Benchmarking of Failure Detection Methods in Medical Image Segmentation: Unveiling the Role of Confidence Aggregation [0.723226140060364]
This paper introduces a comprehensive benchmarking framework aimed at evaluating failure detection methodologies within medical image segmentation. We identify the strengths and limitations of current failure detection metrics, advocating for the risk-coverage analysis as a holistic evaluation approach.
arXiv Detail & Related papers (2024-06-05T14:36:33Z)
Unified Uncertainty Estimation for Cognitive Diagnosis Models [70.46998436898205]
We propose a unified uncertainty estimation approach for a wide range of cognitive diagnosis models. We decompose the uncertainty of diagnostic parameters into data aspect and model aspect. Our method is effective and can provide useful insights into the uncertainty of cognitive diagnosis.
arXiv Detail & Related papers (2024-03-09T13:48:20Z)
A Foundational Framework and Methodology for Personalized Early and Timely Diagnosis [84.6348989654916]
We propose the first foundational framework for early and timely diagnosis. It builds on decision-theoretic approaches to outline the diagnosis process. It integrates machine learning and statistical methodology for estimating the optimal personalized diagnostic path.
arXiv Detail & Related papers (2023-11-26T14:42:31Z)
Validating polyp and instrument segmentation methods in colonoscopy through Medico 2020 and MedAI 2021 Challenges [58.32937972322058]
"Medico automatic polyp segmentation (Medico 2020)" and "MedAI: Transparency in Medical Image (MedAI 2021)" competitions. We present a comprehensive summary and analyze each contribution, highlight the strength of the best-performing methods, and discuss the possibility of clinical translations of such methods into the clinic.
arXiv Detail & Related papers (2023-07-30T16:08:45Z)
Understanding metric-related pitfalls in image analysis validation [59.15220116166561]
This work provides the first comprehensive common point of access to information on pitfalls related to validation metrics in image analysis. Focusing on biomedical image analysis but with the potential of transfer to other fields, the addressed pitfalls generalize across application domains and are categorized according to a newly created, domain-agnostic taxonomy.
arXiv Detail & Related papers (2023-02-03T14:57:40Z)
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems [128.63953314853327]
"Lifelong Learning" systems are capable of 1) Continuous Learning, 2) Transfer and Adaptation, and 3) Scalability. We show that this suite of metrics can inform the development of varied and complex Lifelong Learning systems.
arXiv Detail & Related papers (2023-01-18T21:58:54Z)
An Overview of Healthcare Data Analytics With Applications to the COVID-19 Pandemic [20.912943922420407]
We describe how innovative analytical methods, machine learning tools and metaheuristics can tackle general healthcare problems. In particular, we give applications of modern digital technology, statistical methods, data platforms and data integration systems. We make the case that analyzing and interpreting big data is a very challenging task that requires a multi-disciplinary effort.
arXiv Detail & Related papers (2021-11-25T06:37:24Z)
Human readable network troubleshooting based on anomaly detection and feature scoring [11.593495085674343]
We present a system based on (i) unsupervised learning methods for detecting anomalies in the time domain, (ii) an attention mechanism to rank features in the feature space and (iii) an expert knowledge module. We thoroughly evaluate the performance of the full system and of its individual building blocks.
arXiv Detail & Related papers (2021-08-26T14:20:36Z)
Clinical Outcome Prediction from Admission Notes using Self-Supervised Knowledge Integration [55.88616573143478]
Outcome prediction from clinical text can prevent doctors from overlooking possible risks. Diagnoses at discharge, procedures performed, in-hospital mortality and length-of-stay prediction are four common outcome prediction targets. We propose clinical outcome pre-training to integrate knowledge about patient outcomes from multiple public sources.
arXiv Detail & Related papers (2021-02-08T10:26:44Z)

This list is automatically generated from the titles and abstracts of the papers on this site.