A Survey on Failure Analysis and Fault Injection in AI Systems
- URL: http://arxiv.org/abs/2407.00125v1
- Date: Fri, 28 Jun 2024 00:32:03 GMT
- Title: A Survey on Failure Analysis and Fault Injection in AI Systems
- Authors: Guangba Yu, Gou Tan, Haojia Huang, Zhenyu Zhang, Pengfei Chen, Roberto Natella, Zibin Zheng
- Abstract summary: The complexity of AI systems has exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability.
This study presents a detailed survey of existing FA and FI approaches across six layers of AI systems, filling the gap left by the lack of a comprehensive review.
Our findings reveal a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures.
- Score: 28.30817443151044
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The rapid advancement of Artificial Intelligence (AI) has led to its integration into various areas, especially with Large Language Models (LLMs) significantly enhancing capabilities in Artificial Intelligence Generated Content (AIGC). However, the complexity of AI systems has also exposed their vulnerabilities, necessitating robust methods for failure analysis (FA) and fault injection (FI) to ensure resilience and reliability. Despite the importance of these techniques, a comprehensive review of FA and FI methodologies in AI systems is lacking. This study fills this gap by presenting a detailed survey of existing FA and FI approaches across six layers of AI systems. We systematically analyze 160 papers and repositories to answer three research questions: (1) what are the prevalent failures in AI systems, (2) what types of faults can current FI tools simulate, and (3) what gaps exist between the simulated faults and real-world failures. Our findings reveal a taxonomy of AI system failures, assess the capabilities of existing FI tools, and highlight discrepancies between real-world and simulated failures. Moreover, this survey contributes to the field by providing a framework for fault diagnosis, evaluating the state-of-the-art in FI, and identifying areas for improvement in FI techniques to enhance the resilience of AI systems.
Related papers
- From Silos to Systems: Process-Oriented Hazard Analysis for AI Systems [2.226040060318401]
We translate System Theoretic Process Analysis (STPA) for analyzing AI operation and development processes.
We focus on systems that rely on machine learning algorithms and conduct STPA on three case studies.
We find that key concepts and steps of conducting an STPA readily apply, albeit with a few adaptations tailored for AI systems.
arXiv Detail & Related papers (2024-10-29T20:43:18Z) - Do great minds think alike? Investigating Human-AI Complementarity in Question Answering with CAIMIRA [43.116608441891096]
Humans outperform AI systems in knowledge-grounded abductive and conceptual reasoning.
State-of-the-art LLMs like GPT-4 and LLaMA show superior performance on targeted information retrieval.
arXiv Detail & Related papers (2024-10-09T03:53:26Z) - EAIRiskBench: Towards Evaluating Physical Risk Awareness for Task Planning of Foundation Model-based Embodied AI Agents [47.69642609574771]
Embodied artificial intelligence (EAI) integrates advanced AI models into physical entities for real-world interaction.
Foundation models as the "brain" of EAI agents for high-level task planning have shown promising results.
However, the deployment of these agents in physical environments presents significant safety challenges.
This study introduces EAIRiskBench, a novel framework for automated physical risk assessment in EAI scenarios.
arXiv Detail & Related papers (2024-08-08T13:19:37Z) - Explainable Artificial Intelligence Techniques for Accurate Fault Detection and Diagnosis: A Review [0.0]
We review the eXplainable AI (XAI) tools and techniques in this context.
We focus on their role in making AI decision-making transparent, particularly in critical scenarios where humans are involved.
We discuss current limitations and potential future research that aims to balance explainability with model performance.
arXiv Detail & Related papers (2024-04-17T17:49:38Z) - Testing autonomous vehicles and AI: perspectives and challenges from cybersecurity, transparency, robustness and fairness [53.91018508439669]
The study explores the complexities of integrating Artificial Intelligence into Autonomous Vehicles (AVs).
It examines the challenges introduced by AI components and the impact on testing procedures.
The paper identifies significant challenges and suggests future directions for research and development of AI in AV technology.
arXiv Detail & Related papers (2024-02-21T08:29:42Z) - Analyzing Adversarial Inputs in Deep Reinforcement Learning [53.3760591018817]
We present a comprehensive analysis of the characterization of adversarial inputs, through the lens of formal verification.
We introduce a novel metric, the Adversarial Rate, to classify models based on their susceptibility to such perturbations.
Our analysis empirically demonstrates how adversarial inputs can affect the safety of a given DRL system with respect to such perturbations.
arXiv Detail & Related papers (2024-02-07T21:58:40Z) - Progressing from Anomaly Detection to Automated Log Labeling and Pioneering Root Cause Analysis [53.24804865821692]
This study introduces a taxonomy for log anomalies and explores automated data labeling to mitigate labeling challenges.
The study envisions a future where root cause analysis follows anomaly detection, unraveling the underlying triggers of anomalies.
arXiv Detail & Related papers (2023-12-22T15:04:20Z) - Interpretable and Robust AI in EEG Systems: A Survey [13.911001648611832]
We propose a taxonomy of interpretability by characterizing it into three types: backpropagation, perturbation, and inherently interpretable methods.
We classify the robustness mechanisms into four classes: noise and artifacts, human variability, data acquisition instability, and adversarial attacks.
arXiv Detail & Related papers (2023-04-21T05:51:39Z) - On Robust Numerical Solver for ODE via Self-Attention Mechanism [82.95493796476767]
We explore training efficient and robust AI-enhanced numerical solvers with a small data size by mitigating intrinsic noise disturbances.
We first analyze the ability of the self-attention mechanism to regulate noise in supervised learning and then propose a simple-yet-effective numerical solver, Attr, which introduces an additive self-attention mechanism to the numerical solution of differential equations.
arXiv Detail & Related papers (2023-02-05T01:39:21Z) - An Exploratory Study of AI System Risk Assessment from the Lens of Data Distribution and Uncertainty [4.99372598361924]
Deep learning (DL) has become a driving force and has been widely adopted in many domains and applications.
This paper initiates an early exploratory study of AI system risk assessment from both the data distribution and uncertainty angles.
arXiv Detail & Related papers (2022-12-13T03:34:25Z) - Statistical Perspectives on Reliability of Artificial Intelligence Systems [6.284088451820049]
We provide statistical perspectives on the reliability of AI systems.
We introduce a so-called SMART statistical framework for AI reliability research.
We discuss recent developments in modeling and analysis of AI reliability.
arXiv Detail & Related papers (2021-11-09T20:00:14Z)