The Decision Path to Control AI Risks Completely: Fundamental Control Mechanisms for AI Governance
- URL: http://arxiv.org/abs/2512.04489v1
- Date: Thu, 04 Dec 2025 05:53:41 GMT
- Title: The Decision Path to Control AI Risks Completely: Fundamental Control Mechanisms for AI Governance
- Authors: Yong Tao
- Abstract summary: Three of the AIMs must be built inside AI systems and three in society to address major areas of AI risks. We discuss how to strengthen analog physical safeguards to prevent smarter AI/AGI/ASI from circumventing core safety controls.
- Score: 1.1252728925416642
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Artificial intelligence (AI) advances rapidly, but achieving complete human control over AI risks remains an unsolved problem, akin to driving the fast AI "train" without a "brake system." By exploring fundamental control mechanisms at the key elements of AI decisions, this paper develops a systematic solution for thoroughly controlling AI risks: an architecture for AI governance and legislation with five pillars supported by six control mechanisms, illustrated through a minimum set of AI Mandates (AIMs). Three of the AIMs must be built inside AI systems and three in society to address the major areas of AI risk: 1) align AI values with human users; 2) constrain AI decision-actions by societal ethics, laws, and regulations; 3) build in human intervention options for emergencies and shut-off switches for existential threats; 4) limit AI access to resources to reinforce the controls inside AI; 5) mitigate spillover risks such as job loss from AI. We also highlight how AI governance differs for physical AI systems versus generative AI. We discuss how to strengthen analog physical safeguards that prevent smarter AI/AGI/ASI from circumventing core safety controls, safeguards that exploit AI's intrinsic disconnect from the analog physical world: AI is pure software code running on chips controlled by humans, and all AI-driven physical actions must first be digitized. These findings establish a theoretical foundation for AI governance and legislation as the basic structure of a "brake system" for AI decisions. If enacted, these controls can rein in AI dangers as completely as humanly possible, closing large portions of the currently wide-open risk surface and reducing overall AI risk to residual human error.
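The paper proposes governance mechanisms, not an implementation, but a minimal sketch can show where two of the AIMs would sit in software: a human intervention hook for emergencies (AIM 3) and a shut-off switch backed by the analog safeguard that humans control the hardware, exploiting the prerequisite that every AI-driven physical action must pass through a digital gate. All identifiers below (ActionGate, ShutOffSwitch, human_review) are illustrative assumptions, not the authors' design.

```python
# Minimal illustrative sketch (not the paper's implementation): a digital gate
# that enforces two AI Mandates before any AI-driven physical action executes --
# a human intervention option for emergencies and a hard shut-off switch.
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ShutOffSwitch:
    """Stands in for the analog safeguard: humans control the chips AI runs on."""
    engaged: bool = False

    def engage(self) -> None:
        # Once engaged, no further actions can be digitized and executed.
        self.engaged = True

@dataclass
class ActionGate:
    """Digital choke point: every AI-driven physical action must pass through it."""
    switch: ShutOffSwitch
    human_review: Callable[[str], bool]  # emergency intervention hook
    log: List[str] = field(default_factory=list)

    def execute(self, action: str, actuator: Callable[[], None]) -> bool:
        if self.switch.engaged:
            self.log.append(f"BLOCKED (shut-off engaged): {action}")
            return False
        if not self.human_review(action):
            self.log.append(f"BLOCKED (human veto): {action}")
            return False
        actuator()  # only here does the action reach the physical world
        self.log.append(f"EXECUTED: {action}")
        return True

# Usage: a reviewer policy that vetoes anything flagged as irreversible.
gate = ActionGate(ShutOffSwitch(), human_review=lambda a: "irreversible" not in a)
gate.execute("open cooling valve", lambda: print("valve opened"))
gate.execute("irreversible core purge", lambda: print("purged"))  # vetoed
gate.switch.engage()  # existential-threat shut-off
gate.execute("open cooling valve", lambda: print("valve opened"))  # blocked
print(gate.log)
```

The design point this illustrates is that the gate and the switch live outside the AI's optimization loop: the AI can only propose actions, the digitization requirement makes the gate unavoidable, and the switch is ultimately enforced by human control of the hardware.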
Related papers
- The Philosophic Turn for AI Agents: Replacing centralized digital rhetoric with decentralized truth-seeking [0.0]
In the face of AI technology, individuals will increasingly rely on AI agents to navigate life's growing complexities. This paper addresses a fundamental dilemma posed by AI decision-support systems: the risk of either becoming overwhelmed by complex decisions, or having autonomy compromised.
arXiv Detail & Related papers (2025-04-24T19:34:43Z)
- Superintelligence Strategy: Expert Version [64.7113737051525]
Destabilizing AI developments could raise the odds of great-power conflict. Superintelligence -- AI vastly better than humans at nearly all cognitive tasks -- is now anticipated by AI researchers. We introduce the concept of Mutual Assured AI Malfunction.
arXiv Detail & Related papers (2025-03-07T17:53:24Z)
- Alignment, Agency and Autonomy in Frontier AI: A Systems Engineering Perspective [0.0]
Concepts of alignment, agency, and autonomy have become central to AI safety, governance, and control. This paper traces the historical, philosophical, and technical evolution of these concepts, emphasizing how their definitions influence AI development, deployment, and oversight.
arXiv Detail & Related papers (2025-02-20T21:37:20Z)
- Aligning Generalisation Between Humans and Machines [74.120848518198]
AI technology can support humans in scientific discovery and decision-making, but may also disrupt democracies and target individuals. The responsible use of AI and its participation in human-AI teams increasingly shows the need for AI alignment. A crucial yet often overlooked aspect of these interactions is the different ways in which humans and machines generalise.
arXiv Detail & Related papers (2024-11-23T18:36:07Z)
- Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization.
We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
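To make that claim concrete, here is the standard empirical risk minimization objective in textbook notation (not drawn from this paper); each component is a place where a trustworthiness requirement can become a design choice:

\[
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i),\, y_i\big) \;+\; \lambda\, \Omega(f)
\]

Here the hypothesis class \(\mathcal{F}\) can encode interpretability or capacity constraints, the loss \(\ell\) can encode robustness (e.g., a worst-case or adversarial loss), the training pairs \((x_i, y_i)\) raise representativeness and fairness choices, and the regularizer \(\Omega\) with weight \(\lambda\) can encode stability or privacy requirements.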
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
- Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training/learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
arXiv Detail & Related papers (2024-07-03T15:38:57Z)
- Societal Adaptation to Advanced AI [1.2607853680700076]
Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. We urge a complementary approach: increasing societal adaptation to advanced AI. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against, and remedy potentially harmful uses of AI systems.
arXiv Detail & Related papers (2024-05-16T17:52:12Z)
- Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
- Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety [2.3572498744567127]
We argue that alignment to human intent is insufficient for safe AI systems.
We argue that preserving humans' long-term agency may be a more robust standard.
arXiv Detail & Related papers (2023-05-30T17:14:01Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow AI system predictions to be examined and tested, establishing a basis for trust in the systems' decision-making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)