Limits of Safe AI Deployment: Differentiating Oversight and Control
- URL: http://arxiv.org/abs/2507.03525v2
- Date: Mon, 03 Nov 2025 07:38:49 GMT
- Title: Limits of Safe AI Deployment: Differentiating Oversight and Control
- Authors: David Manheim, Aidan Homewood
- Abstract summary: Requirements for "human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This paper undertakes a targeted critical review of literature on supervision outside of AI. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Oversight and control, which we collectively call supervision, are often discussed as ways to ensure that AI systems are accountable, reliable, and able to fulfill governance and management requirements. However, the requirements for "human oversight" risk codifying vague or inconsistent interpretations of key concepts like oversight and control. This ambiguous terminology could undermine efforts to design or evaluate systems that must operate under meaningful human supervision. This matters because the term is used by regulatory texts such as the EU AI Act. This paper undertakes a targeted critical review of literature on supervision outside of AI, along with a brief summary of past work on the topic related to AI. We next differentiate control as ex-ante or real-time and operational rather than policy or governance, and oversight as performed ex-post, or a policy and governance function. Control aims to prevent failures, while oversight focuses on detection, remediation, or incentives for future prevention. Building on this, we make three contributions. 1) We propose a framework to align regulatory expectations with what is technically and organizationally plausible, articulating the conditions under which each mechanism is possible, where they fall short, and what is required to make them meaningful in practice. 2) We outline how supervision methods should be documented and integrated into risk management, and drawing on the Microsoft Responsible AI Maturity Model, we outline a maturity model for AI supervision. 3) We explicitly highlight boundaries of these mechanisms, including where they apply, where they fail, and where it is clear that no existing methods suffice. This foregrounds the question of whether meaningful supervision is possible in a given deployment context, and can support regulators, auditors, and practitioners in identifying both present and future limitations.
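The abstract's distinction between control (ex-ante or real-time, operational) and oversight (ex-post, or a policy and governance function) can be sketched as a small classifier. This is an illustrative reading of the taxonomy only; the class, enum, and mechanism names below are hypothetical and not artifacts from the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Timing(Enum):
    EX_ANTE = "ex-ante"
    REAL_TIME = "real-time"
    EX_POST = "ex-post"

class Function(Enum):
    OPERATIONAL = "operational"
    GOVERNANCE = "policy/governance"

@dataclass
class SupervisionMechanism:
    """A hypothetical record of one supervision mechanism."""
    name: str
    timing: Timing
    function: Function

    def kind(self) -> str:
        # Control: ex-ante or real-time AND operational;
        # everything else falls under oversight.
        if (self.timing in (Timing.EX_ANTE, Timing.REAL_TIME)
                and self.function is Function.OPERATIONAL):
            return "control"
        return "oversight"

# Illustrative examples of each category.
guardrail = SupervisionMechanism("output filter", Timing.REAL_TIME, Function.OPERATIONAL)
audit = SupervisionMechanism("annual audit", Timing.EX_POST, Function.GOVERNANCE)
print(guardrail.kind())  # control
print(audit.kind())      # oversight
```

A structured encoding like this could also support the documentation and risk-management integration the paper calls for, since each mechanism's timing and function would be recorded explicitly.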
Related papers
- The Controllability Trap: A Governance Framework for Military AI Agents [0.0]
We propose the Agentic Military AI Governance Framework (AMAGF). AMAGF is a measurable architecture structured around three pillars: Preventive Governance, Detective Governance, and Corrective Governance. Its core mechanism, the Control Quality Score (CQS), is a composite real-time metric quantifying human control and enabling graduated responses as control weakens.
arXiv Detail & Related papers (2026-03-03T20:48:01Z) - Administrative Law's Fourth Settlement: AI and the Capability-Accountability Trap [0.0]
Since 1887, administrative law has navigated a "capability-accountability trap." This Article proposes three doctrinal innovations within administrative law to realize this potential.
arXiv Detail & Related papers (2026-02-10T11:36:01Z) - AI Deception: Risks, Dynamics, and Controls [153.71048309527225]
This project provides a comprehensive and up-to-date overview of the AI deception field. We identify a formal definition of AI deception, grounded in signaling theory from studies of animal deception. We organize the landscape of AI deception research as a deception cycle, consisting of two key components: deception emergence and deception treatment.
arXiv Detail & Related papers (2025-11-27T16:56:04Z) - Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - Out of Control -- Why Alignment Needs Formal Control Theory (and an Alignment Control Stack) [0.6526824510982799]
This position paper argues that formal optimal control theory should be central to AI alignment research. It offers a distinct perspective from prevailing AI safety and security approaches.
arXiv Detail & Related papers (2025-06-21T22:45:19Z) - Rigor in AI: Doing Rigorous AI Work Requires a Broader, Responsible AI-Informed Conception of Rigor [83.99510317617694]
We argue that a broader conception of what rigorous AI research and practice should entail is needed. We aim to provide useful language and a framework for much-needed dialogue about the AI community's work.
arXiv Detail & Related papers (2025-06-17T15:44:41Z) - Explainable AI Systems Must Be Contestable: Here's How to Make It Happen [2.5875936082584623]
This paper presents the first rigorous formal definition of contestability in explainable AI. We introduce a modular framework of by-design and post-hoc mechanisms spanning human-centered interfaces, technical processes, and organizational architectures. Our work equips practitioners with the tools to embed genuine recourse and accountability into AI systems.
arXiv Detail & Related papers (2025-06-02T13:32:05Z) - Watermarking Without Standards Is Not AI Governance [46.71493672772134]
We argue that current implementations risk serving as symbolic compliance rather than delivering effective oversight. We propose a three-layer framework encompassing technical standards, audit infrastructure, and enforcement mechanisms.
arXiv Detail & Related papers (2025-05-27T18:10:04Z) - Media and responsible AI governance: a game-theoretic and LLM analysis [61.132523071109354]
This paper investigates the interplay between AI developers, regulators, users, and the media in fostering trustworthy AI systems. Using evolutionary game theory and large language models (LLMs), we model the strategic interactions among these actors under different regulatory regimes.
arXiv Detail & Related papers (2025-03-12T21:39:38Z) - Decoding the Black Box: Integrating Moral Imagination with Technical AI Governance [0.0]
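The evolutionary game-theoretic modeling mentioned above typically rests on replicator dynamics, which can be sketched in a few lines. The two-strategy setup and the payoff matrix below are illustrative assumptions, not values from the paper.

```python
# Minimal replicator-dynamics sketch for a two-strategy population game,
# in the spirit of evolutionary game-theoretic models of AI governance.
# Strategy 0 = "comply", strategy 1 = "cut corners"; payoffs are invented.

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of replicator dynamics for the share x of strategy 0."""
    f_comply = payoff[0][0] * x + payoff[0][1] * (1 - x)  # fitness of strategy 0
    f_defect = payoff[1][0] * x + payoff[1][1] * (1 - x)  # fitness of strategy 1
    f_avg = x * f_comply + (1 - x) * f_defect
    # Shares grow in proportion to fitness above the population average.
    return x + dt * x * (f_comply - f_avg)

# Illustrative payoffs: a regulator penalizes detected violations,
# so compliance strictly dominates.
payoff = [[3.0, 1.0],
          [2.0, 0.5]]

x = 0.5  # initial share of compliant developers
for _ in range(2000):
    x = replicator_step(x, payoff)
print(round(x, 3))  # compliance approaches fixation
```

With these payoffs compliance is strictly dominant, so the compliant share converges toward 1; under weaker enforcement payoffs the same dynamics can instead sustain a mixed population, which is the kind of regime comparison such models are used for.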
We develop a comprehensive framework designed to regulate AI technologies deployed in high-stakes domains such as defense, finance, healthcare, and education. Our approach combines rigorous technical analysis, quantitative risk assessment, and normative evaluation to expose systemic vulnerabilities.
arXiv Detail & Related papers (2025-03-09T03:11:32Z) - Position: Mind the Gap-the Growing Disconnect Between Established Vulnerability Disclosure and AI Security [56.219994752894294]
We argue that adapting existing processes for AI security reporting is doomed to fail due to fundamental mismatches with the distinctive characteristics of AI systems. Based on our proposal to address these shortcomings, we discuss an approach to AI security reporting and how the new AI paradigm, AI agents, will further reinforce the need for specialized advances in AI security incident reporting.
arXiv Detail & Related papers (2024-12-19T13:50:26Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
It uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - The Artificial Intelligence Act: critical overview [0.0]
This article provides a critical overview of the recently approved Artificial Intelligence Act.
It starts by presenting the main structure, objectives, and approach of Regulation (EU) 2024/1689.
The text concludes that even if the overall framework can be deemed adequate and balanced, the approach is so complex that it risks defeating its own purpose.
arXiv Detail & Related papers (2024-08-30T21:38:02Z) - Open Problems in Technical AI Governance [102.19067750759471]
Technical AI governance refers to technical analysis and tools for supporting the effective governance of AI. This paper is intended as a resource for technical researchers or research funders looking to contribute to AI governance.
arXiv Detail & Related papers (2024-07-20T21:13:56Z) - Generative AI Needs Adaptive Governance [0.0]
Generative AI challenges the notions of governance, trust, and human agency.
This paper argues that generative AI calls for adaptive governance.
We outline actors, roles, and both shared and actor-specific policy activities.
arXiv Detail & Related papers (2024-06-06T23:47:14Z) - Value Functions are Control Barrier Functions: Verification of Safe Policies using Control Theory [46.85103495283037]
We propose a new approach to apply verification methods from control theory to learned value functions.
We formalize original theorems that establish links between value functions and control barrier functions.
Our work marks a significant step towards a formal framework for the general, scalable, and verifiable design of RL-based control systems.
arXiv Detail & Related papers (2023-06-06T21:41:31Z)
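The value-function/control-barrier-function link above can be illustrated with a discrete-time barrier-style check: treat the superlevel set of a learned value function V as the safe set and verify that some action keeps the barrier value from decaying too fast. The function names, dynamics, and condition below are an illustrative sketch, not the paper's construction.

```python
# Hypothetical sketch: checking a discrete-time control-barrier-style
# condition on a learned value function V. Safe set: {x : V(x) >= c}.

def satisfies_cbf_condition(V, dynamics, actions, x, c=0.0, alpha=0.5):
    """True if some action keeps h(x) = V(x) - c from decaying
    faster than rate alpha: h(x') >= (1 - alpha) * h(x)."""
    h = V(x) - c
    return any(V(dynamics(x, a)) - c >= (1 - alpha) * h for a in actions)

# Toy 1-D example: value V(x) = 1 - x**2, dynamics x' = x + a.
V = lambda x: 1.0 - x * x
dynamics = lambda x, a: x + a
actions = [-0.1, 0.0, 0.1]
print(satisfies_cbf_condition(V, dynamics, actions, x=0.5))  # True
```

If the condition holds at every state in the safe set, the set is forward invariant under some admissible policy, which is the kind of verification guarantee the entry describes transferring from control theory to learned value functions.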
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.