Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
- URL: http://arxiv.org/abs/2010.05418v9
- Date: Sun, 2 Apr 2023 03:20:17 GMT
- Title: Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
- Authors: Stephen Casper
- Abstract summary: It is important to know how advanced systems will make choices and in what ways they may fail.
One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems will be ones that humans cannot reliably outsmart.
This paper presents the Achilles Heel hypothesis, which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions.
- Score: 0.9790236766474201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As progress in AI continues to advance, it is important to know how advanced
systems will make choices and in what ways they may fail. Machines can already
outsmart humans in some domains, and understanding how to safely build ones
which may have capabilities at or above the human level is of particular
concern. One might suspect that artificially generally intelligent (AGI) and
artificially superintelligent (ASI) systems will be ones that humans cannot
reliably outsmart. As a challenge to this assumption, this paper presents the
Achilles Heel hypothesis, which states that even a potentially superintelligent
system may nonetheless have stable decision-theoretic delusions which cause it
to make irrational decisions in adversarial settings. In a survey of key
dilemmas and paradoxes from the decision theory literature, a number of these
potential Achilles Heels are discussed in the context of this hypothesis. Several novel
contributions are made toward understanding the ways in which these weaknesses
might be implanted into a system.
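The abstract does not commit to a concrete mechanism, so the following is only a loose, hypothetical illustration of the general idea that a stable decision-theoretic flaw can be exploited by an adversary. It uses the textbook "money pump" against an agent with cyclic preferences; this example is not taken from the paper, and the names `CYCLIC`, `TRANSITIVE`, and `money_pump` are invented here for illustration.

```python
from typing import Set, Tuple

# Hypothetical preference relation: (x, y) in prefs means "the agent strictly prefers x to y".
Pref = Set[Tuple[str, str]]

# A stable "delusion": cyclic preferences A > B > C > A that the agent never revises.
CYCLIC: Pref = {("A", "B"), ("B", "C"), ("C", "A")}
# A well-behaved agent for comparison: a transitive ranking A > B > C.
TRANSITIVE: Pref = {("A", "B"), ("B", "C"), ("A", "C")}


def money_pump(prefs: Pref, start: str, rounds: int, fee: float = 1.0) -> float:
    """Return the total fees an adversary extracts by repeatedly offering trades.

    Assumes the fee is small enough that the agent still prefers the offered
    item after paying it, so it accepts any trade toward a strictly preferred item.
    """
    items = {x for pair in prefs for x in pair}
    holding, extracted = start, 0.0
    for _ in range(rounds):
        # The adversary looks for any item the agent strictly prefers to what it holds.
        offers = [x for x in sorted(items) if (x, holding) in prefs]
        if not offers:
            break  # nothing the agent would pay for, so the pump stops
        holding = offers[0]
        extracted += fee
    return extracted


if __name__ == "__main__":
    print(money_pump(CYCLIC, "A", rounds=30))      # 30.0
    print(money_pump(TRANSITIVE, "C", rounds=30))  # 1.0
```

With cyclic preferences the adversary collects one fee per round for as long as it keeps offering trades, whereas the transitive agent stops paying after a bounded number of trades, because each accepted trade moves it strictly up an acyclic ranking.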
Related papers
- Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
(2024-11-04T18:10:10Z)
- On the consistent reasoning paradox of intelligence and optimal trust in AI: The power of 'I don't know' [79.69412622010249]
Consistent reasoning, which lies at the core of human intelligence, is the ability to handle tasks that are equivalent.
The consistent reasoning paradox (CRP) asserts that consistent reasoning implies fallibility; in particular, human-like intelligence in AI necessarily comes with human-like fallibility.
(2024-08-05T10:06:53Z)
- Artificial Intelligence: Arguments for Catastrophic Risk [0.0]
We review two influential arguments purporting to show how AI could pose catastrophic risks.
The first argument -- the Problem of Power-Seeking -- claims that advanced AI systems are likely to engage in dangerous power-seeking behavior.
The second argument claims that the development of human-level AI will unlock rapid further progress.
(2024-01-27T19:34:13Z)
- Brain-Inspired Computational Intelligence via Predictive Coding [89.6335791546526]
Predictive coding (PC) has shown promising performance in machine intelligence tasks.
PC can model information processing in different brain areas and can be used in cognitive control and robotics.
(2023-08-15T16:37:16Z)
- Understanding Natural Language Understanding Systems. A Critical Analysis [91.81211519327161]
The development of machines that «talk like us», also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI).
But never has the trust that we can build «talking machines» been stronger than the one engendered by the last generation of NLU systems.
Are we at the dawn of a new era, in which the Grail is finally closer to us?
(2023-03-01T08:32:55Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
(2022-01-26T18:53:09Z)
- Scope and Sense of Explainability for AI-Systems [0.0]
Emphasis will be given to difficulties related to the explainability of highly complex and efficient AI systems.
The paper elaborates on arguments supporting the notion that discarding AI solutions in advance because they are not thoroughly comprehensible would waste a great deal of the potential of intelligent systems.
(2021-12-20T14:25:05Z)
- An argument for the impossibility of machine intelligence [0.0]
We define what it is to be an agent (device) that could be the bearer of AI.
We show that the mainstream definitions of 'intelligence' are too weak even to capture what is involved when we ascribe intelligence to an insect.
We identify the properties that an AI agent would need to possess in order to be the bearer of intelligence by this definition.
(2021-10-20T08:54:48Z)
- Inductive Biases for Deep Learning of Higher-Level Cognition [108.89281493851358]
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles.
This work considers a larger list, focusing on those principles that mostly concern higher-level and sequential conscious processing.
The aim of clarifying these particular principles is that they could help us build AI systems that benefit from humans' abilities.
(2020-11-30T18:29:25Z)
- Dynamic Cognition Applied to Value Learning in Artificial Intelligence [0.0]
Several researchers in the area are trying to develop a robust, beneficial, and safe concept of artificial intelligence.
It is of utmost importance that artificially intelligent agents have their values aligned with human values.
A possible approach to this problem would be to use theoretical models such as SED.
(2020-05-12T03:58:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.