Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight
- URL: http://arxiv.org/abs/2602.18986v1
- Date: Sun, 22 Feb 2026 00:18:23 GMT
- Title: Quantifying Automation Risk in High-Automation AI Systems: A Bayesian Framework for Failure Propagation and Optimal Oversight
- Authors: Vishal Srivastava, Tanmay Sah
- Abstract summary: We propose a parsimonious Bayesian risk decomposition expressing expected loss as the product of three terms. This framework captures execution and oversight risk rather than model accuracy alone. We motivate the framework with an illustrative case study of the 2012 Knight Capital incident as one instantiation of a broadly applicable failure pattern.
- Score: 1.6328866317851185
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Organizations across finance, healthcare, transportation, content moderation, and critical infrastructure are rapidly deploying highly automated AI systems, yet they lack principled methods to quantify how increasing automation amplifies harm when failures occur. We propose a parsimonious Bayesian risk decomposition expressing expected loss as the product of three terms: the probability of system failure, the conditional probability that a failure propagates into harm given the automation level, and the expected severity of harm. This framework isolates a critical quantity -- the conditional probability that failures propagate into harm -- which captures execution and oversight risk rather than model accuracy alone. We develop complete theoretical foundations: formal proofs of the decomposition, a harm propagation equivalence theorem linking the harm propagation probability to observable execution controls, risk elasticity measures, efficient frontier analysis for automation policy, and optimal resource allocation principles with second-order conditions. We motivate the framework with an illustrative case study of the 2012 Knight Capital incident ($440M loss) as one instantiation of a broadly applicable failure pattern, and characterize the research design required to empirically validate the framework at scale across deployment domains. This work provides the theoretical foundations for a new class of deployment-focused risk governance tools for agentic and automated AI systems.
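The decomposition described in the abstract can be restated compactly. The symbols below (failure event F, harm event H, automation level a, severity S) are our own notation for the three named terms, not notation taken from the paper:

```latex
\mathbb{E}[L \mid a]
  \;=\; \underbrace{P(F)}_{\text{failure}}
  \;\cdot\; \underbrace{P(H \mid F, a)}_{\text{harm propagation}}
  \;\cdot\; \underbrace{\mathbb{E}[S \mid H]}_{\text{severity}}
```

As a numerical illustration, the minimal Python sketch below evaluates this product across automation levels and approximates the risk elasticity the abstract mentions. The logistic propagation curve and all parameter values are assumptions made for illustration only; the $440M severity merely echoes the Knight Capital figure cited in the abstract:

```python
import numpy as np

def expected_loss(p_fail, a, severity, k=6.0, a0=0.5):
    """Three-term decomposition: P(failure) * P(harm | failure, a) * E[severity].

    The logistic form of the propagation term is our modeling assumption,
    not the functional form used in the paper.
    """
    p_harm_given_fail = 1.0 / (1.0 + np.exp(-k * (a - a0)))
    return p_fail * p_harm_given_fail * severity

a = np.linspace(0.05, 0.95, 10)  # automation levels to evaluate
loss = expected_loss(p_fail=1e-3, a=a, severity=4.4e8)

# Risk elasticity d log E[L] / d log a, approximated by finite differences.
elasticity = np.gradient(np.log(loss), np.log(a))
for ai, li, ei in zip(a, loss, elasticity):
    print(f"a={ai:.2f}  E[loss]=${li:,.0f}  elasticity={ei:.2f}")
```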
Related papers
- SAGE-LLM: Towards Safe and Generalizable LLM Controller with Fuzzy-CBF Verification and Graph-Structured Knowledge Retrieval for UAV Decision [46.089736018739295]
Large Language Models (LLMs) lack domain-specific UAV control knowledge and formal safety assurances. This paper proposes a training-free two-layer decision architecture based on LLMs, integrating high-level safety planning with low-level precise control.
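The paper's fuzzy-CBF verification layer is not reproduced here; for orientation, the standard control barrier function (CBF) safety condition that such layers typically build on has roughly the following shape (our notation, an assumption about the general setup rather than the paper's formulation):

```latex
% For a control-affine system \dot{x} = f(x) + g(x)u and safe set \{x : h(x) \ge 0\},
% h is a control barrier function if for some class-\mathcal{K} function \alpha:
\sup_{u \in U}\Big[ L_f h(x) + L_g h(x)\,u \Big] \;\ge\; -\alpha\big(h(x)\big),
% and a safety filter admits only inputs u satisfying this inequality pointwise.
```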
arXiv Detail & Related papers (2026-02-27T06:41:04Z)
- Conformal Thinking: Risk Control for Reasoning on a Compute Budget [60.65072883773352]
Reasoning Large Language Models (LLMs) enable test-time scaling, with dataset-level accuracy improving as the token budget increases. We re-frame the budget-setting problem as risk control, limiting the error rate while minimizing compute. Our framework introduces an upper threshold that stops reasoning when the model is confident and a novel lower threshold that preemptively stops unsolvable instances.
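A minimal sketch of the two-threshold stopping rule described above. Every callable and both threshold values are hypothetical placeholders, not the calibrated quantities from the paper:

```python
def generate_with_risk_control(step_fn, confidence_fn, answer_fn,
                               max_tokens=4096, upper=0.9, lower=0.05,
                               chunk=256):
    """Two-threshold early stopping for a reasoning model (illustrative sketch).

    step_fn(n)      -> list of n new reasoning tokens     (hypothetical)
    confidence_fn() -> float in [0, 1], self-assessed confidence so far
    answer_fn()     -> final answer extracted from the reasoning so far

    In the paper's risk-control setting both thresholds would be calibrated
    to a target error rate; the values here are placeholders.
    """
    used = 0
    while used < max_tokens:
        used += len(step_fn(chunk))
        conf = confidence_fn()
        if conf >= upper:
            return answer_fn()   # confident: stop reasoning early and answer
        if conf <= lower:
            return None          # likely unsolvable: give up, save compute
    return answer_fn()           # token budget exhausted
```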
arXiv Detail & Related papers (2026-02-03T18:17:22Z)
- The Shadow Self: Intrinsic Value Misalignment in Large Language Model Agents [37.75212140218036]
We formalize the Loss-of-Control risk and identify the previously underexamined Intrinsic Value Misalignment (Intrinsic VM). We then introduce IMPRESS, a scenario-driven framework for systematically assessing this risk. We evaluate Intrinsic VM on 21 state-of-the-art LLM agents and find that it is a common and broadly observed safety risk across models.
arXiv Detail & Related papers (2026-01-24T07:09:50Z)
- Adaptive Accountability in Networked MAS: Tracing and Mitigating Emergent Norms at Scale [2.28438857884398]
Large-scale networked multi-agent systems increasingly underpin critical infrastructure. We introduce an adaptive accountability framework that traces responsibility flows through a lifecycle-aware audit ledger. We prove a bounded-compromise theorem showing that whenever the expected intervention cost exceeds an adversary's payoff, the long-run proportion of compromised interactions is bounded by a constant strictly less than one.
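In symbols, the bounded-compromise claim has the following shape; the notation (c for intervention cost, pi for adversary payoff, X_t for the per-interaction compromise indicator) is ours, introduced only to make the statement concrete:

```latex
\mathbb{E}[c] > \pi
  \;\Longrightarrow\;
  \limsup_{T \to \infty} \frac{1}{T} \sum_{t=1}^{T} \mathbf{1}[X_t = \text{compromised}]
  \;\le\; \kappa \;<\; 1 .
```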
arXiv Detail & Related papers (2025-12-21T02:04:47Z)
- Toward Quantitative Modeling of Cybersecurity Risks Due to AI Misuse [50.87630846876635]
We develop nine detailed cyber risk models. Each model decomposes attacks into steps using the MITRE ATT&CK framework. Individual estimates are aggregated through Monte Carlo simulation.
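A minimal sketch of the aggregation idea: per-step success probabilities, here drawn from Beta distributions standing in for expert estimates, are combined multiplicatively along an attack chain and aggregated by Monte Carlo. The step names, distributions, and independence assumption are all illustrative, not the paper's models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical attack chain: (alpha, beta) parameters of Beta-distributed
# per-step success probabilities, standing in for expert estimates.
steps = {"initial-access": (2, 8), "execution": (5, 5), "exfiltration": (3, 7)}

n = 100_000
p_chain = np.ones(n)
for alpha, beta in steps.values():
    p_chain *= rng.beta(alpha, beta, size=n)  # steps assumed independent

print(f"mean P(full chain succeeds): {p_chain.mean():.4f}")
print(f"95th percentile:             {np.quantile(p_chain, 0.95):.4f}")
```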
arXiv Detail & Related papers (2025-12-09T17:54:17Z)
- The Role of Risk Modeling in Advanced AI Risk Management [33.357295564462284]
Rapidly advancing artificial intelligence (AI) systems introduce novel, uncertain, and potentially catastrophic risks. Managing these risks requires a mature risk-management infrastructure whose cornerstone is rigorous risk modeling. We argue that advanced-AI governance should adopt a similar dual approach and that verifiable, provably-safe AI architectures are urgently needed.
arXiv Detail & Related papers (2025-12-09T15:37:33Z)
- Generalized Inequality-based Approach for Probabilistic WCET Estimation [0.0]
This paper proposes a method to reduce the pessimism of Chebyshev-style tail bounds by incorporating saturating functions into Chebyshev's inequality. Evaluations on synthetic and real-world data from the Autoware autonomous driving stack demonstrate that the proposed method achieves safe and tighter bounds for such distributions.
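For reference, the classical one-sided (Cantelli) form of Chebyshev's inequality, the usual starting point for such probabilistic WCET bounds, is shown below; the paper's saturating-function refinement is not reproduced here:

```latex
P\big(X \ge \mu + k\sigma\big) \;\le\; \frac{1}{1 + k^2}, \qquad k > 0,
% so a pWCET bound at exceedance probability p can be taken as
% \mu + \sigma\sqrt{(1-p)/p}.
```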
arXiv Detail & Related papers (2025-11-12T06:19:31Z)
- Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions [51.56484100374058]
Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action, but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by cognitive biases present in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and the sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure.
arXiv Detail & Related papers (2025-11-10T22:24:21Z)
- Building a Foundational Guardrail for General Agentic Systems via Synthetic Data [76.18834864749606]
Because LLM agents can plan multi-step tasks, intervening at the planning stage, before any action is executed, is often the safest way to prevent harm. Existing guardrails mostly operate post-execution, which is difficult to scale and leaves little room for controllable supervision at the plan level. We introduce AuraGen, a controllable engine that synthesizes benign trajectories, injects category-labeled risks with graded difficulty, and filters outputs via an automated reward model.
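The three-stage pipeline described above might be skeletonized as follows; every function, field, and threshold here is a hypothetical placeholder illustrating the stage structure, not AuraGen's actual interface:

```python
from dataclasses import dataclass

@dataclass
class Trajectory:
    steps: list[str]           # planned agent actions
    risk_category: str | None  # label injected in stage 2; None if benign
    difficulty: float = 0.0

def synthesize_benign(n: int) -> list[Trajectory]:
    """Stage 1: generate benign multi-step trajectories (placeholder)."""
    return [Trajectory(steps=[f"step-{i}"], risk_category=None) for i in range(n)]

def inject_risk(traj: Trajectory, category: str, difficulty: float) -> Trajectory:
    """Stage 2: perturb a benign trajectory with a category-labeled risk."""
    return Trajectory(traj.steps + [f"risky-action ({category})"], category, difficulty)

def reward_model_score(traj: Trajectory) -> float:
    """Stage 3: automated quality score (placeholder heuristic)."""
    return 1.0 - 0.1 * len(traj.steps)

def build_dataset(n: int, threshold: float = 0.5) -> list[Trajectory]:
    data = synthesize_benign(n)
    data += [inject_risk(t, "privacy-leak", 0.7) for t in data[: n // 2]]
    return [t for t in data if reward_model_score(t) >= threshold]
```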
arXiv Detail & Related papers (2025-10-10T18:42:32Z)
- CORTEX: Composite Overlay for Risk Tiering and Exposure in Operational AI Systems [0.812761334568906]
This paper introduces CORTEX, a multi-layered risk scoring framework to assess and score AI system vulnerabilities. It was developed from empirical analysis of over 1,200 incidents documented in the AI Incident Database (AIID). The resulting composite score can be operationalized across AI risk registers, model audits, conformity checks, and dynamic governance dashboards.
arXiv Detail & Related papers (2025-08-24T07:30:25Z)
- Cannot See the Forest for the Trees: Invoking Heuristics and Biases to Elicit Irrational Choices of LLMs [83.11815479874447]
We propose a novel jailbreak attack framework, inspired by cognitive decomposition and biases in human cognition. We employ cognitive decomposition to reduce the complexity of malicious prompts and relevance bias to reorganize prompts. We also introduce a ranking-based harmfulness evaluation metric that surpasses the traditional binary success-or-failure paradigm.
arXiv Detail & Related papers (2025-05-03T05:28:11Z)