Limitations on Safe, Trusted, Artificial General Intelligence
- URL: http://arxiv.org/abs/2509.21654v1
- Date: Thu, 25 Sep 2025 22:16:38 GMT
- Title: Limitations on Safe, Trusted, Artificial General Intelligence
- Authors: Rina Panigrahy, Vatsal Sharan,
- Abstract summary: Safety, trust and Artificial General Intelligence (AGI) are aspirational goals in artificial intelligence (AI) systems.<n>We propose strict, mathematical definitions of safety, trust, and AGI.<n>We show our results for program verification, planning, and graph reachability.
- Score: 14.425238904385074
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Safety, trust and Artificial General Intelligence (AGI) are aspirational goals in artificial intelligence (AI) systems, and there are several informal interpretations of these notions. In this paper, we propose strict, mathematical definitions of safety, trust, and AGI, and demonstrate a fundamental incompatibility between them. We define safety of a system as the property that it never makes any false claims, trust as the assumption that the system is safe, and AGI as the property of an AI system always matching or exceeding human capability. Our core finding is that -- for our formal definitions of these notions -- a safe and trusted AI system cannot be an AGI system: for such a safe, trusted system there are task instances which are easily and provably solvable by a human but not by the system. We note that we consider strict mathematical definitions of safety and trust, and it is possible for real-world deployments to instead rely on alternate, practical interpretations of these notions. We show our results for program verification, planning, and graph reachability. Our proofs draw parallels to G\"odel's incompleteness theorems and Turing's proof of the undecidability of the halting problem, and can be regarded as interpretations of G\"odel's and Turing's results.
Related papers
- Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance [211.5823259429128]
We propose a comprehensive framework integrating technical and societal dimensions, structured around three interconnected pillars: Intrinsic Security, Derivative Security, and Social Ethics.<n>We identify three core challenges: (1) the generalization gap, where defenses fail against evolving threats; (2) inadequate evaluation protocols that overlook real-world risks; and (3) fragmented regulations leading to inconsistent oversight.<n>Our framework offers actionable guidance for researchers, engineers, and policymakers to develop AI systems that are not only robust and secure but also ethically aligned and publicly trustworthy.
arXiv Detail & Related papers (2025-08-12T09:42:56Z) - The Alignment Trap: Complexity Barriers [0.0]
This paper argues that AI alignment is not merely difficult, but is founded on a fundamental logical contradiction.<n>We first establish Theion Paradox: we use machine learning precisely because we cannot enumerate all necessary safety rules.<n>This paradox is then confirmed by a set of five independent mathematical proofs.
arXiv Detail & Related papers (2025-06-12T02:30:30Z) - Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem [1.3905735045377272]
The AI alignment problem, which focusses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges.<n>With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated.<n>Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
arXiv Detail & Related papers (2025-05-05T11:33:18Z) - Towards A Litmus Test for Common Sense [5.280511830552275]
This paper is the second in a planned series aimed at envisioning a path to safe and beneficial artificial intelligence.<n>We propose a more formal litmus test for common sense, adopting an axiomatic approach that combines minimal prior knowledge constraints with diagonal or Godel-style arguments.
arXiv Detail & Related papers (2025-01-17T02:02:12Z) - Towards AI-$45^{\circ}$ Law: A Roadmap to Trustworthy AGI [24.414787444128947]
We propose the textitAI-textbf$45circ$ Law as a guiding principle for a balanced roadmap toward trustworthy AGI.<n>This framework provides a systematic taxonomy and hierarchical structure for current AI capability and safety research, inspired by Judea Pearl's Ladder of Causation''
arXiv Detail & Related papers (2024-12-08T14:14:16Z) - Towards evaluations-based safety cases for AI scheming [37.399946932069746]
We propose three arguments that safety cases could use in relation to scheming.
First, developers of frontier AI systems could argue that AI systems are not capable of scheming.
Second, one could argue that AI systems are not capable of posing harm through scheming.
Third, one could argue that control measures around the AI systems would prevent unacceptable outcomes even if the AI systems intentionally attempted to subvert them.
arXiv Detail & Related papers (2024-10-29T17:55:29Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act)
Uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - Provably safe systems: the only path to controllable AGI [0.0]
We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements.
We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability.
arXiv Detail & Related papers (2023-09-05T03:42:46Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human
Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Never trust, always verify : a roadmap for Trustworthy AI? [12.031113181911627]
We examine trust in the context of AI-based systems to understand what it means for an AI system to be trustworthy.
We suggest a trust (resp. zero-trust) model for AI and suggest a set of properties that should be satisfied to ensure the trustworthiness of AI systems.
arXiv Detail & Related papers (2022-06-23T21:13:10Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - Formalizing Trust in Artificial Intelligence: Prerequisites, Causes and
Goals of Human Trust in AI [55.4046755826066]
We discuss a model of trust inspired by, but not identical to, sociology's interpersonal trust (i.e., trust between people)
We incorporate a formalization of 'contractual trust', such that trust between a user and an AI is trust that some implicit or explicit contract will hold.
We discuss how to design trustworthy AI, how to evaluate whether trust has manifested, and whether it is warranted.
arXiv Detail & Related papers (2020-10-15T03:07:23Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.