Related papers: Towards AI-$45^{\circ}$ Law: A Roadmap to Trustworthy AGI

URL: http://arxiv.org/abs/2412.14186v2
Date: Sun, 22 Dec 2024 08:52:15 GMT
Title: Towards AI-$45^{\circ}$ Law: A Roadmap to Trustworthy AGI
Authors: Chao Yang, Chaochao Lu, Yingchun Wang, Bowen Zhou
Abstract summary: We propose the AI-$45^{\circ}$ Law as a guiding principle for a balanced roadmap toward trustworthy AGI. This framework provides a systematic taxonomy and hierarchical structure for current AI capability and safety research, inspired by Judea Pearl's "Ladder of Causation".
Score: 24.414787444128947
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Ensuring Artificial General Intelligence (AGI) reliably avoids harmful behaviors is a critical challenge, especially for systems with high autonomy or in safety-critical domains. Despite various safety assurance proposals and extreme risk warnings, comprehensive guidelines balancing AI safety and capability remain lacking. In this position paper, we propose the \textit{AI-\textbf{$45^{\circ}$} Law} as a guiding principle for a balanced roadmap toward trustworthy AGI, and introduce the \textit{Causal Ladder of Trustworthy AGI} as a practical framework. This framework provides a systematic taxonomy and hierarchical structure for current AI capability and safety research, inspired by Judea Pearl's ``Ladder of Causation''. The Causal Ladder comprises three core layers: the Approximate Alignment Layer, the Intervenable Layer, and the Reflectable Layer. These layers address the key challenges of safety and trustworthiness in AGI and contemporary AI systems. Building upon this framework, we define five levels of trustworthy AGI: perception, reasoning, decision-making, autonomy, and collaboration trustworthiness. These levels represent distinct yet progressive aspects of trustworthy AGI. Finally, we present a series of potential governance measures to support the development of trustworthy AGI.

Related papers

Incentive-Aware AI Safety via Strategic Resource Allocation: A Stackelberg Security Games Perspective [31.55000083809067]
We show how game-theoretic deterrence can make AI oversight proactive, risk-aware, and resilient to manipulation. We illustrate how this framework can inform (1) training-time auditing against data/feedback poisoning, (2) pre-deployment evaluation under constrained reviewer resources, and (3) robust multi-model deployment in adversarial environments.
arXiv Detail & Related papers (2026-02-06T23:20:26Z)
Frontier AI Auditing: Toward Rigorous Third-Party Assessment of Safety and Security Practices at Leading AI Companies [57.521647436515785]
We define frontier AI auditing as rigorous third-party verification of frontier AI developers' safety and security claims. We introduce AI Assurance Levels (AAL-1 to AAL-4), ranging from time-bounded system audits to continuous, deception-resilient verification.
arXiv Detail & Related papers (2026-01-16T18:44:09Z)
ANNIE: Be Careful of Your Robots [48.89876809734855]
We present the first systematic study of adversarial safety attacks on embodied AI systems. We show attack success rates exceeding 50% across all safety categories. Results expose a previously underexplored but highly consequential attack surface in embodied AI systems.
arXiv Detail & Related papers (2025-09-03T15:00:28Z)
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics. We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight. Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z)
Generative AI-Empowered Secure Communications in Space-Air-Ground Integrated Networks: A Survey and Tutorial [107.26005706569498]
Space-air-ground integrated networks (SAGINs) face unprecedented security challenges due to their inherent characteristics. Generative AI (GAI) is a transformative approach that can safeguard SAGIN security by synthesizing data, understanding semantics, and making autonomous decisions.
arXiv Detail & Related papers (2025-08-04T01:42:57Z)
Security-First AI: Foundations for Robust and Trustworthy Systems [0.0]
This manuscript posits that AI security must be prioritized as a foundational layer. We argue for a security-first approach to enable trustworthy and resilient AI systems.
arXiv Detail & Related papers (2025-04-17T22:53:01Z)
Do LLMs trust AI regulation? Emerging behaviour of game-theoretic LLM agents [61.132523071109354]
This paper investigates the interplay between AI developers, regulators and users, modelling their strategic choices under different regulatory scenarios. Our research identifies emerging behaviours of strategic AI agents, which tend to adopt more "pessimistic" stances than pure game-theoretic agents.
arXiv Detail & Related papers (2025-04-11T15:41:21Z)
A Framework for the Assurance of AI-Enabled Systems [0.0]
This paper proposes a claims-based framework for risk management and assurance of AI systems. The paper's contributions are a framework process for AI assurance, a set of relevant definitions, and a discussion of important considerations in AI assurance.
arXiv Detail & Related papers (2025-04-03T13:44:01Z)
Responsible Artificial Intelligence Systems: A Roadmap to Society's Trust through Trustworthy AI, Auditability, Accountability, and Governance [37.10526074040908]
This paper explores the concept of a responsible AI system from a holistic perspective. The final goal of the paper is to propose a roadmap in the design of responsible AI systems.
arXiv Detail & Related papers (2025-02-04T14:47:30Z)
Meta-Sealing: A Revolutionizing Integrity Assurance Protocol for Transparent, Tamper-Proof, and Trustworthy AI System [0.0]
This research introduces Meta-Sealing, a cryptographic framework that fundamentally changes integrity verification in AI systems. The framework combines advanced cryptography with distributed verification, delivering tamper-evident guarantees that achieve both mathematical rigor and computational efficiency.
arXiv Detail & Related papers (2024-10-31T15:31:22Z)
Engineering Trustworthy AI: A Developer Guide for Empirical Risk Minimization [53.80919781981027]
Key requirements for trustworthy AI can be translated into design choices for the components of empirical risk minimization. We hope to provide actionable guidance for building AI systems that meet emerging standards for trustworthiness of AI.
arXiv Detail & Related papers (2024-10-25T07:53:32Z)
Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act). It uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence. As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z)
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z)
Quantifying AI Vulnerabilities: A Synthesis of Complexity, Dynamical Systems, and Game Theory [0.0]
We propose a novel approach that introduces three metrics: System Complexity Index (SCI), Lyapunov Exponent for AI Stability (LEAIS), and Nash Equilibrium Robustness (NER). SCI quantifies the inherent complexity of an AI system, LEAIS captures its stability and sensitivity to perturbations, and NER evaluates its strategic robustness against adversarial manipulation.
arXiv Detail & Related papers (2024-04-07T07:05:59Z)
Levels of AGI for Operationalizing Progress on the Path to AGI [64.59151650272477]
We propose a framework for classifying the capabilities and behavior of Artificial General Intelligence (AGI) models and their precursors. This framework introduces levels of AGI performance, generality, and autonomy, providing a common language to compare models, assess risks, and measure progress along the path to AGI.
arXiv Detail & Related papers (2023-11-04T17:44:58Z)
Who to Trust, How and Why: Untangling AI Ethics Principles, Trustworthiness and Trust [0.0]
We argue for the need to distinguish these concepts more clearly. We discuss that trust in AI involves not only reliance on the system itself, but also trust in the developers of the AI system.
arXiv Detail & Related papers (2023-09-19T05:00:34Z)
Designing for Responsible Trust in AI Systems: A Communication Perspective [56.80107647520364]
We draw from communication theories and literature on trust in technologies to develop a conceptual model called MATCH. We highlight transparency and interaction as AI systems' affordances that present a wide range of trustworthiness cues to users. We propose a checklist of requirements to help technology creators identify appropriate cues to use.
arXiv Detail & Related papers (2022-04-29T00:14:33Z)
Trustworthy AI [75.99046162669997]
Brittleness to minor adversarial changes in the input data, limited ability to explain decisions, and bias in training data are some of the most prominent limitations. We propose a tutorial on Trustworthy AI to address six critical issues in enhancing user and public trust in AI systems.
arXiv Detail & Related papers (2020-11-02T20:04:18Z)
Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people). We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold. We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)

This list is automatically generated from the titles and abstracts of the papers on this site.