Provably safe systems: the only path to controllable AGI
- URL: http://arxiv.org/abs/2309.01933v1
- Date: Tue, 5 Sep 2023 03:42:46 GMT
- Title: Provably safe systems: the only path to controllable AGI
- Authors: Max Tegmark (MIT), Steve Omohundro (Beneficial AI Research)
- Abstract summary: We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements.
We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We describe a path to humanity safely thriving with powerful Artificial General Intelligences (AGIs) by building them to provably satisfy human-specified requirements. We argue that this will soon be technically feasible using advanced AI for formal verification and mechanistic interpretability. We further argue that it is the only path which guarantees safe controlled AGI. We end with a list of challenge problems whose solution would contribute to this positive outcome and invite readers to join in this work.
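To make "provably satisfy human-specified requirements" concrete, here is a minimal illustrative sketch in Lean 4. It is not taken from the paper: the toy controller, the requirement, and all names (`spec`, `controller`, `controller_safe`) are assumptions chosen for illustration. The point is only the shape of the guarantee the authors argue for: the safety property is written as a formal specification and discharged as a machine-checked theorem over all inputs, rather than supported by testing.

```lean
-- Illustrative sketch only (not from the paper): a toy system with a
-- machine-checked proof that it always meets a human-specified requirement.

-- Human-specified requirement: the output never exceeds 100.
def spec (y : Nat) : Prop := y ≤ 100

-- The (toy) system: clamp the requested value to the allowed range.
def controller (x : Nat) : Nat := if x ≤ 100 then x else 100

-- The safety certificate: for every possible input, the requirement holds.
theorem controller_safe (x : Nat) : spec (controller x) := by
  unfold spec controller
  split
  · assumption              -- branch x ≤ 100: the output is x itself
  · exact Nat.le_refl 100   -- branch x > 100: the output is the clamp value 100
```

Scaling this pattern from a toy clamp to learned, highly capable systems is the open problem the paper's formal-verification and mechanistic-interpretability agenda is aimed at.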
Related papers
- Imagining and building wise machines: The centrality of AI metacognition (arXiv, 2024-11-04)
  We argue that these shortcomings stem from one overarching failure: AI systems lack wisdom.
  While AI research has focused on task-level strategies, metacognition remains underdeveloped in AI systems.
  We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
- Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks (arXiv, 2024-10-10)
  This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
  It uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
  Applying these concepts to the EU AI Act uncovers potential vulnerabilities and areas for improvement in the regulation.
- Monitoring Human Dependence On AI Systems With Reliance Drills (arXiv, 2024-09-21)
  Humans may become over-reliant on AI systems if they accept AI-generated advice even when they would have made a better decision on their own.
  This paper proposes the reliance drill: an exercise that tests whether a human can recognise mistakes in AI-generated advice.
  We argue that reliance drills could become a key tool for ensuring humans remain appropriately involved in AI-assisted decisions.
- Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems (arXiv, 2024-05-10)
  We introduce and define a family of approaches to AI safety, which we refer to as guaranteed safe (GS) AI.
  The core feature of these approaches is that they aim to produce AI systems equipped with high-assurance quantitative safety guarantees.
  We outline a number of approaches for creating each of the framework's three core components, describe the main technical challenges, and suggest a number of potential solutions to them. (A minimal illustrative sketch of this three-component decomposition follows the list below.)
- AI Safety: Necessary, but insufficient and possibly problematic (arXiv, 2024-03-26)
  This article critically examines the recent hype around AI safety.
  We consider what 'AI safety' actually means and outline the dominant concepts with which the digital footprint of AI safety aligns.
  We share our concerns about how AI safety may normalize AI that advances structural harm by providing exploitative and harmful AI with a veneer of safety.
- Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review (arXiv, 2024-02-08)
  We present the first systematic literature review of explainable methods for safe and trustworthy autonomous driving.
  We identify five key contributions of XAI to safe and trustworthy AI in autonomous driving: interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation.
  We propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.
- When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment (arXiv, 2022-10-04)
  AI systems need to be able to understand, interpret, and predict human moral judgments and decisions.
  A central challenge for AI safety is capturing the flexibility of the human moral mind.
  We present a novel challenge set consisting of rule-breaking question answering.
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) (arXiv, 2022-01-26)
  Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
  It will allow the examination and testing of AI system predictions to establish a basis for trust in the systems' decision making.
- Trustworthy AI: A Computational Perspective (arXiv, 2021-07-12)
  We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
  For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
- On Controllability of AI (arXiv, 2020-07-19)
  We present arguments, as well as supporting evidence, indicating that advanced AI cannot be fully controlled.
  Consequences of the uncontrollability of AI are discussed with respect to the future of humanity, research on AI, and AI safety and security.
- AI Failures: A Review of Underlying Issues (arXiv, 2020-07-18)
  We focus on AI failures arising from flaws in conceptualization, design, and deployment.
  We find that AI systems fail on account of errors of omission and commission in the design of the AI system.
  An AI system is quite likely to fail in situations where, in effect, it is called upon to deliver moral judgments.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information above and is not responsible for any consequences of its use.