Arguments about Highly Reliable Agent Designs as a Useful Path to
Artificial Intelligence Safety
- URL: http://arxiv.org/abs/2201.02950v1
- Date: Sun, 9 Jan 2022 07:42:37 GMT
- Title: Arguments about Highly Reliable Agent Designs as a Useful Path to
Artificial Intelligence Safety
- Authors: Issa Rice, David Manheim
- Abstract summary: Highly Reliable Agent Designs (HRAD) is one of the most controversial and ambitious approaches to AI safety.
We have titled the arguments (1) incidental utility, (2) deconfusion, (3) precise specification, and (4) prediction.
We have explained the assumptions and claims based on a review of published and informal literature, along with consultation with experts who have stated positions on the topic.
- Score: 0.0
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Several different approaches exist for ensuring the safety of future
Transformative Artificial Intelligence (TAI) or Artificial Superintelligence
(ASI) systems, and proponents of different approaches have made different and
debated claims about the importance or usefulness of their work in the near
term, and for future systems. Highly Reliable Agent Designs (HRAD) is one of
the most controversial and ambitious approaches, championed by the Machine
Intelligence Research Institute, among others, and various arguments have been
made about whether and how it reduces risks from future AI systems. In order to
reduce confusion in the debate about AI safety, here we build on a previous
discussion by Rice which collects and presents four central arguments which are
used to justify HRAD as a path towards safety of AI systems.
We have titled the arguments (1) incidental utility, (2) deconfusion, (3)
precise specification, and (4) prediction. Each of these makes different,
partly conflicting claims about how future AI systems can be risky. We have
explained the assumptions and claims based on a review of published and
informal literature, along with consultation with experts who have stated
positions on the topic. Finally, we have briefly outlined arguments against
each approach and against the agenda overall.
Related papers
- Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that these shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - Towards evaluations-based safety cases for AI scheming [37.399946932069746]
We propose three arguments that safety cases could use in relation to scheming.
First, developers of frontier AI systems could argue that AI systems are not capable of scheming.
Second, one could argue that AI systems are not capable of causing harm through scheming.
Third, one could argue that control measures around the AI systems would prevent unacceptable outcomes even if the AI systems intentionally attempted to subvert them.
arXiv Detail & Related papers (2024-10-29T17:55:29Z) - Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? [59.96471873997733]
We propose an empirical foundation for developing more meaningful safety metrics and define AI safety in a machine learning research context.
We aim to provide a more rigorous framework for AI safety research, advancing the science of safety evaluations and clarifying the path towards measurable progress.
arXiv Detail & Related papers (2024-07-31T17:59:24Z) - Combining AI Control Systems and Human Decision Support via Robustness and Criticality [53.10194953873209]
We extend a methodology for adversarial explanations (AE) to state-of-the-art reinforcement learning frameworks.
We show that the learned AI control system demonstrates robustness against adversarial tampering.
In a training/learning framework, this technology can improve both the AI's decisions and explanations through human interaction.
arXiv Detail & Related papers (2024-07-03T15:38:57Z) - Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems [88.80306881112313]
We will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI.
The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees.
We outline a number of approaches for creating each of the three core components of such systems (a world model, a safety specification, and a verifier), describe the main technical challenges, and suggest a number of potential solutions to them.
arXiv Detail & Related papers (2024-05-10T17:38:32Z) - Artificial Intelligence: Arguments for Catastrophic Risk [0.0]
We review two influential arguments purporting to show how AI could pose catastrophic risks.
The first argument -- the Problem of Power-Seeking -- claims that advanced AI systems are likely to engage in dangerous power-seeking behavior.
The second argument claims that the development of human-level AI will unlock rapid further progress.
arXiv Detail & Related papers (2024-01-27T19:34:13Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Predictable Artificial Intelligence [77.1127726638209]
This paper introduces the ideas and challenges of Predictable AI.
It explores the ways in which we can anticipate key validity indicators of present and future AI ecosystems.
We argue that achieving predictability is crucial for fostering trust, liability, control, alignment and safety of AI ecosystems.
arXiv Detail & Related papers (2023-10-09T21:36:21Z) - Modeling Transformative AI Risks (MTAIR) Project -- Summary Report [0.0]
This report builds on an earlier diagram by Cottier and Shah which laid out some of the crucial disagreements ("cruxes") visually, with some explanation.
The model starts with a discussion of reasoning via analogies and general prior beliefs about artificial intelligence.
It lays out a model of different paths and enabling technologies for high-level machine intelligence, and a model of how advances in the capabilities of these systems might proceed.
The model also looks specifically at the question of learned optimization, and whether machine learning systems will create mesa-optimizers.
arXiv Detail & Related papers (2022-06-19T09:11:23Z) - X-Risk Analysis for AI Research [24.78742908726579]
We provide a guide for how to analyze AI x-risk.
First, we review how systems can be made safer today.
Next, we discuss strategies for having long-term impacts on the safety of future systems.
arXiv Detail & Related papers (2022-06-13T00:22:50Z) - Transdisciplinary AI Observatory -- Retrospective Analyses and
Future-Oriented Contradistinctions [22.968817032490996]
This paper motivates the need for an inherently transdisciplinary AI observatory approach.
Building on these AI observatory tools, we present near-term transdisciplinary guidelines for AI safety.
arXiv Detail & Related papers (2020-11-26T16:01:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.