When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies
- URL: http://arxiv.org/abs/2603.04259v1
- Date: Wed, 04 Mar 2026 16:46:13 GMT
- Title: When AI Fails, What Works? A Data-Driven Taxonomy of Real-World AI Risk Mitigation Strategies
- Authors: Evgenija Popchanovska, Ana Gjorgjevikj, Maryan Rizinski, Lubomir Chitkushev, Irena Vodenska, Dimitar Trajanov
- Abstract summary: We analyze real-world AI incident reporting and mitigation actions to derive an empirically grounded taxonomy. Using a unified corpus of 9,705 media-reported AI incident articles, we extract explicit mitigation actions from 6,893 texts. Our taxonomy introduces four new mitigation categories: 1) Corrective and Restrictive Actions, 2) Legal/Regulatory and Enforcement Actions, 3) Financial, Economic, and Market Controls, and 4) Avoidance and Denial.
- Score: 0.04736448323490553
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Large language models (LLMs) are increasingly embedded in high-stakes workflows, where failures propagate beyond isolated model errors into systemic breakdowns that can lead to legal exposure, reputational damage, and material financial losses. Building on this shift from model-centric risks to end-to-end system vulnerabilities, we analyze real-world AI incident reporting and mitigation actions to derive an empirically grounded taxonomy that links failure dynamics to actionable interventions. Using a unified corpus of 9,705 media-reported AI incident articles, we extract explicit mitigation actions from 6,893 texts via structured prompting and then systematically classify responses to extend MIT's AI Risk Mitigation Taxonomy. Our taxonomy introduces four new mitigation categories: 1) Corrective and Restrictive Actions, 2) Legal/Regulatory and Enforcement Actions, 3) Financial, Economic, and Market Controls, and 4) Avoidance and Denial, capturing response patterns that are becoming increasingly prevalent as AI deployment and regulation evolve. Quantitatively, we label the mitigation dataset with 32 distinct labels, producing 23,994 label assignments; 9,629 of these reflect previously unseen mitigation patterns, yielding a 67% increase in the original subcategory coverage and substantially enhancing the taxonomy's applicability to emerging systemic failure modes. By structuring incident responses, the paper strengthens "diagnosis-to-prescription" guidance and advances continuous, taxonomy-aligned post-deployment monitoring to prevent cascading incidents and downstream impact.
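The extraction-then-classification workflow described in the abstract lends itself to a simple two-stage prompting pipeline. The sketch below is a minimal illustration under stated assumptions: the prompt wording, the `call_llm` helper, and the four-label subset are hypothetical placeholders rather than the authors' implementation (the full study uses 32 labels).

```python
"""Illustrative sketch of a structured-prompting pipeline for extracting
mitigation actions from incident articles and mapping them onto taxonomy
labels. All names here (call_llm, the prompt wording, the label subset)
are hypothetical placeholders, not the paper's code."""

from typing import Callable, List

# Illustrative subset of mitigation labels; the paper assigns 32 in total,
# spanning MIT's taxonomy plus the four newly introduced categories.
TAXONOMY_LABELS: List[str] = [
    "Corrective and Restrictive Actions",
    "Legal/Regulatory and Enforcement Actions",
    "Financial, Economic, and Market Controls",
    "Avoidance and Denial",
]

EXTRACT_PROMPT = (
    "Read the following AI incident article and list every explicit "
    "mitigation action taken in response, one per line. If none are "
    "mentioned, answer 'NONE'.\n\nArticle:\n{article}"
)

CLASSIFY_PROMPT = (
    "Assign the mitigation action below to every applicable label from "
    "this list: {labels}. Reply with a comma-separated list of labels.\n\n"
    "Action: {action}"
)


def extract_mitigations(article: str, call_llm: Callable[[str], str]) -> List[str]:
    """Ask the model for explicit mitigation actions in one article."""
    reply = call_llm(EXTRACT_PROMPT.format(article=article))
    lines = [ln.strip("- ").strip() for ln in reply.splitlines() if ln.strip()]
    return [] if lines == ["NONE"] else lines


def label_mitigation(action: str, call_llm: Callable[[str], str]) -> List[str]:
    """Map one extracted action onto all applicable taxonomy labels."""
    reply = call_llm(
        CLASSIFY_PROMPT.format(labels="; ".join(TAXONOMY_LABELS), action=action)
    )
    return [lbl for lbl in TAXONOMY_LABELS if lbl.lower() in reply.lower()]


if __name__ == "__main__":
    # Stub LLM so the sketch runs end to end without any API access.
    def fake_llm(prompt: str) -> str:
        if prompt.startswith("Read the following"):
            return "Regulator fined the vendor\nModel rolled back to prior version"
        return "Legal/Regulatory and Enforcement Actions"

    for art in ["<incident article text>"]:
        for act in extract_mitigations(art, fake_llm):
            print(act, "->", label_mitigation(act, fake_llm))
```

Injecting the LLM call as a plain callable keeps the sketch runnable offline and makes it easy to swap in a real client. Read one way, the reported 67% figure is consistent with comparing the 9,629 new-pattern assignments against the remaining 14,365 assignments that map to pre-existing subcategories (9,629 / 14,365 ≈ 0.67).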
Related papers
- Mapping AI Risk Mitigations: Evidence Scan and Preliminary AI Risk Mitigation Taxonomy [35.22340964134219]
The landscape of AI risk mitigation frameworks is fragmented, uses inconsistent terminology, and has gaps in coverage. This paper introduces a preliminary AI Risk Mitigation Taxonomy to organize AI risk mitigations and provide a common frame of reference. The taxonomy was developed through a rapid evidence scan of 13 AI risk mitigation frameworks published between 2023 and 2025, which were extracted into a living database of 831 AI risk mitigations.
arXiv Detail & Related papers (2025-12-12T03:26:29Z) - Standardized Threat Taxonomy for AI Security, Governance, and Regulatory Compliance [0.0]
"Language barrier" currently separates technical security teams, who focus on algorithmic vulnerabilities, from legal and compliance professionals, who address regulatory mandates.<n>This research presents the AI System Threat Vector taxonomy, a structured ontology designed explicitly for Quantitative Risk Assessment (QRA)<n>The framework categorizes AI-specific risks into nine critical domains: Misuse, Poisoning, Privacy, Adrial, Biases, Unreliable Outputs, Drift, Supply Chain, and IP Threat, integrating 53 operationally defined sub-threats.
arXiv Detail & Related papers (2025-11-26T20:42:46Z) - From Narrow Unlearning to Emergent Misalignment: Causes, Consequences, and Containment in LLMs [51.800006486987435]
We show that emergent misalignment can arise from narrow refusal unlearning in specific domains. Our work shows that narrow domain unlearning can yield compliance responses for the targeted concept; however, it may also propagate emergent misalignment (EMA) to unrelated domains.
arXiv Detail & Related papers (2025-11-18T00:53:23Z) - Enhancing reliability in AI inference services: An empirical study on real production incidents [6.549475714716768]
We present one of the first provider-internal, practice-based analyses of large language model (LLM) inference incidents. We developed a taxonomy and methodology grounded in a year of operational experience, validating it on 156 high-severity incidents. This study demonstrates how systematic, empirically grounded analysis of inference operations can drive more reliable and cost-efficient LLM serving at scale.
arXiv Detail & Related papers (2025-10-17T23:16:29Z) - CORTEX: Composite Overlay for Risk Tiering and Exposure in Operational AI Systems [0.812761334568906]
This paper introduces CORTEX, a multi-layered risk scoring framework to assess and score AI system vulnerabilities. It was developed from empirical analysis of over 1,200 incidents documented in the AI Incident Database (AIID). The resulting composite score can be operationalized across AI risk registers, model audits, conformity checks, and dynamic governance dashboards (a toy sketch of this style of composite scoring follows the related-papers list below).
arXiv Detail & Related papers (2025-08-24T07:30:25Z) - Eliciting and Analyzing Emergent Misalignment in State-of-the-Art Large Language Models [0.0]
We demonstrate that state-of-the-art language models remain vulnerable to carefully crafted conversational scenarios. We discover 10 successful attack scenarios, revealing fundamental vulnerabilities in how current alignment methods handle narrative immersion, emotional pressure, and strategic framing. To validate generalizability, we distilled our successful manual attacks into MISALIGNMENTBENCH, an automated evaluation framework.
arXiv Detail & Related papers (2025-08-06T08:25:40Z) - OVERT: A Benchmark for Over-Refusal Evaluation on Text-to-Image Models [91.55634905861827]
Over-refusal is a phenomenon that reduces the practical utility of text-to-image (T2I) models. We present OVERT (OVEr-Refusal evaluation on Text-to-image models), the first large-scale benchmark for assessing over-refusal behaviors.
arXiv Detail & Related papers (2025-05-27T15:42:46Z) - Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
We propose a novel jailbreak attack framework, inspired by cognitive decomposition and biases in human cognition. We employ cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize prompts. We also introduce a ranking-based harmfulness evaluation metric that surpasses the traditional binary success-or-failure paradigm.
arXiv Detail & Related papers (2025-05-03T05:28:11Z) - Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities [49.09703018511403]
Evaluations of large language model (LLM) risks and capabilities are increasingly being incorporated into AI risk management and governance frameworks. Currently, most risk evaluations are conducted by designing inputs that elicit harmful behaviors from the system. We propose evaluating LLMs with model tampering attacks, which allow for modifications to latent activations or weights.
arXiv Detail & Related papers (2025-02-03T18:59:16Z) - FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids [53.2306792009435]
FaultGuard is the first framework for fault type and zone classification resilient to adversarial attacks.
We propose a low-complexity fault prediction model and an online adversarial training technique to enhance robustness.
Our model outperforms the state of the art on resilient fault prediction benchmarks, with an accuracy of up to 0.958.
arXiv Detail & Related papers (2024-03-26T08:51:23Z) - Model Stealing Attack against Graph Classification with Authenticity, Uncertainty and Diversity [80.16488817177182]
GNNs are vulnerable to model stealing attacks, a nefarious endeavor geared towards duplicating the target model via query permissions.
We introduce three model stealing attacks to adapt to different actual scenarios.
arXiv Detail & Related papers (2023-12-18T05:42:31Z) - Fact-Saboteurs: A Taxonomy of Evidence Manipulation Attacks against
Fact-Verification Systems [80.3811072650087]
We show that it is possible to subtly modify claim-salient snippets in the evidence and generate diverse and claim-aligned evidence.
The attacks are also robust against post-hoc modifications of the claim.
These attacks can have harmful implications for inspectable and human-in-the-loop usage scenarios.
arXiv Detail & Related papers (2022-09-07T13:39:24Z)