Related papers: The Loss of Control Playbook: Degrees, Dynamics, and Preparedness

URL: http://arxiv.org/abs/2511.15846v3
Date: Mon, 24 Nov 2025 18:52:00 GMT
Title: The Loss of Control Playbook: Degrees, Dynamics, and Preparedness
Authors: Charlotte Stix, Annika Hallensleben, Alejandro Ortega, Matteo Pistillo,
Abstract summary: This report addresses the absence of an actionable definition for Loss of Control (LoC) in AI systems by developing a novel taxonomy and preparedness framework. We propose a graded LoC taxonomy, based on the metrics of severity and persistence, that distinguishes between Deviation, Bounded LoC, and Strict LoC. We put forward a plan to maintain preparedness and prevent the occurrence of LoC outcomes should a state of societal vulnerability be reached.
Score: 39.39076397908963
License: http://creativecommons.org/licenses/by/4.0/
Abstract: This research report addresses the absence of an actionable definition for Loss of Control (LoC) in AI systems by developing a novel taxonomy and preparedness framework. Despite increasing policy and research attention, existing LoC definitions vary significantly in scope and timeline, hindering effective LoC assessment and mitigation. To address this issue, we draw from an extensive literature review and propose a graded LoC taxonomy, based on the metrics of severity and persistence, that distinguishes between Deviation, Bounded LoC, and Strict LoC. We model pathways toward a societal state of vulnerability in which sufficiently advanced AI systems have acquired or could acquire the means to cause Bounded or Strict LoC once a catalyst, either misalignment or pure malfunction, materializes. We argue that this state becomes increasingly likely over time, absent strategic intervention, and propose a strategy to avoid reaching a state of vulnerability. Rather than focusing solely on intervening on AI capabilities and propensities potentially relevant for LoC or on preventing potential catalysts, we introduce a complementary framework that emphasizes three extrinsic factors: Deployment context, Affordances, and Permissions (the DAP framework). Compared to work on intrinsic factors and catalysts, this framework has the unfair advantage of being actionable today. Finally, we put forward a plan to maintain preparedness and prevent the occurrence of LoC outcomes should a state of societal vulnerability be reached, focusing on governance measures (threat modeling, deployment policies, emergency response) and technical controls (pre-deployment testing, control measures, monitoring) that could maintain a condition of perennial suspension.

Related papers

The Controllability Trap: A Governance Framework for Military AI Agents [0.0]
We propose the Agentic Military AI Governance Framework (AMAGF). AMAGF is a measurable architecture structured around three pillars: Preventive Governance, Detective Governance, and Corrective Governance. Its core mechanism, the Control Quality Score (CQS), is a composite real-time metric quantifying human control and enabling graduated responses as control weakens.
arXiv Detail & Related papers (2026-03-03T20:48:01Z)
Legitimate Overrides in Decentralized Protocols [7.049550859772001]
Decentralized protocols claim immutable, rule-based execution, yet many embed emergency mechanisms such as chain-level freezes, protocol pauses, and account quarantines. These overrides are crucial for responding to exploits and systemic failures, but they expose a core tension: when does intervention preserve trust and when is it perceived as illegitimate discretion? With approximately $10 billion in technical exploit losses potentially addressable by onchain intervention, the design of these mechanisms has high practical stakes.
arXiv Detail & Related papers (2026-02-12T18:51:30Z)
Gaming the Judge: Unfaithful Chain-of-Thought Can Undermine Agent Evaluation [76.5533899503582]
Large language models (LLMs) are increasingly used as judges to evaluate agent performance. We show this paradigm implicitly assumes that the agent's chain-of-thought (CoT) reasoning faithfully reflects both its internal reasoning and the underlying environment state. We demonstrate that manipulated reasoning alone can inflate false positive rates of state-of-the-art VLM judges by up to 90% across 800 trajectories spanning diverse web tasks.
arXiv Detail & Related papers (2026-01-21T06:07:43Z)
Making LLMs Reliable When It Matters Most: A Five-Layer Architecture for High-Stakes Decisions [51.56484100374058]
Current large language models (LLMs) excel in verifiable domains where outputs can be checked before action but prove less reliable for high-stakes strategic decisions with uncertain outcomes. This gap, driven by mutual cognitive biases in both humans and artificial intelligence (AI) systems, threatens the defensibility of valuations and sustainability of investments in the sector. This report describes a framework emerging from systematic qualitative assessment across 7 frontier-grade LLMs and 3 market-facing venture vignettes under time pressure.
arXiv Detail & Related papers (2025-11-10T22:24:21Z)
Pinpointing crucial steps: Attribution-based Credit Assignment for Verifiable Reinforcement Learning [5.880405013005892]
ACPO is a phased framework that incorporates a difficulty-aware curriculum. ACPO improves exploration by using trajectory semantic segmentation and an attribution-based representation. It enhances exploitation with a factorized reward system that precisely quantifies the hierarchical contribution of each reasoning step.
arXiv Detail & Related papers (2025-10-10T01:22:55Z)
A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification [1.104960878651584]
This paper introduces a novel framework that addresses the lack of formal methods for verifying the robustness and safety of learned policies. By leveraging tools from dynamical systems theory, we identify and visualize Lagrangian Coherent Structures (LCS) that act as the hidden "skeleton" governing the system's behavior. We show that this framework provides a comprehensive and interpretable assessment of policy behavior, successfully identifying critical flaws in policies that appear successful based on reward alone.
arXiv Detail & Related papers (2025-08-21T14:00:26Z)
Limits of Safe AI Deployment: Differentiating Oversight and Control [0.0]
"Human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This paper undertakes a targeted critical review of literature on supervision outside of AI. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention.
arXiv Detail & Related papers (2025-07-04T12:22:35Z)
Toward a Global Regime for Compute Governance: Building the Pause Button [0.4952055253916912]
We propose a governance system designed to prevent AI systems from being trained by restricting access to computational resources. We identify three key intervention points -- technical, traceability, and regulatory -- and organize them within a Governance--Enforcement--Verification framework. Technical mechanisms include tamper-proof FLOP caps, model locking, and offline licensing.
arXiv Detail & Related papers (2025-06-25T15:18:19Z)
Safety Margins for Reinforcement Learning [53.10194953873209]
We show how to leverage proxy criticality metrics to generate safety margins. We evaluate our approach on learned policies from APE-X and A3C within an Atari environment.
arXiv Detail & Related papers (2023-07-25T16:49:54Z)
Balancing detectability and performance of attacks on the control channel of Markov Decision Processes [77.66954176188426]
We investigate the problem of designing optimal stealthy poisoning attacks on the control channel of Markov decision processes (MDPs). This research is motivated by the research community's recent interest in adversarial and poisoning attacks applied to MDPs and reinforcement learning (RL) methods.
arXiv Detail & Related papers (2021-09-15T09:13:10Z)
Towards Understanding the Adversarial Vulnerability of Skeleton-based Action Recognition [133.35968094967626]
Skeleton-based action recognition has attracted increasing attention due to its strong adaptability to dynamic circumstances. With the help of deep learning techniques, it has also seen substantial progress and currently achieves around 90% accuracy in benign environments. Research on the vulnerability of skeleton-based action recognition under different adversarial settings remains scant.
arXiv Detail & Related papers (2020-05-14T17:12:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.