Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
- URL: http://arxiv.org/abs/2010.05418v9
- Date: Sun, 2 Apr 2023 03:20:17 GMT
- Title: Achilles Heels for AGI/ASI via Decision Theoretic Adversaries
- Authors: Stephen Casper
- Abstract summary: It is important to know how advanced systems will make choices and in what ways they may fail.
One might suspect that artificially generally intelligent (AGI) and artificially superintelligent (ASI) systems will be ones that humans cannot reliably outsmart.
This paper presents the Achilles Heel hypothesis which states that even a potentially superintelligent system may nonetheless have stable decision-theoretic delusions.
- Score: 0.9790236766474201
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: As progress in AI continues to advance, it is important to know how advanced
systems will make choices and in what ways they may fail. Machines can already
outsmart humans in some domains, and understanding how to safely build ones
which may have capabilities at or above the human level is of particular
concern. One might suspect that artificially generally intelligent (AGI) and
artificially superintelligent (ASI) systems will be ones that humans cannot
reliably outsmart. As a challenge to this assumption, this paper presents the
Achilles Heel hypothesis, which states that even a potentially superintelligent
system may nonetheless have stable decision-theoretic delusions which cause it
to make irrational decisions in adversarial settings. In a survey of key dilemmas
and paradoxes from the decision theory literature, a number of these potential
Achilles Heels are discussed in context of this hypothesis. Several novel
contributions are made toward understanding the ways in which these weaknesses
might be implanted into a system.
Related papers
- Artificial Intelligence: Arguments for Catastrophic Risk [0.0]
We review two influential arguments purporting to show how AI could pose catastrophic risks.
The first argument -- the Problem of Power-Seeking -- claims that advanced AI systems are likely to engage in dangerous power-seeking behavior.
The second argument claims that the development of human-level AI will unlock rapid further progress.
arXiv Detail & Related papers (2024-01-27T19:34:13Z) - Enabling High-Level Machine Reasoning with Cognitive Neuro-Symbolic Systems [67.01132165581667]
We propose to enable high-level reasoning in AI systems by integrating cognitive architectures with external neuro-symbolic components.
We illustrate a hybrid framework centered on ACT-R and we discuss the role of generative models in recent and future applications.
arXiv Detail & Related papers (2023-11-13T21:20:17Z) - Brain-Inspired Computational Intelligence via Predictive Coding [89.6335791546526]
Predictive coding (PC) has shown promising performance in machine intelligence tasks.
PC can model information processing in different brain areas and can be used in cognitive control and robotics.
arXiv Detail & Related papers (2023-08-15T16:37:16Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If these issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Understanding Natural Language Understanding Systems. A Critical Analysis [91.81211519327161]
The development of machines that «talk like us», also known as Natural Language Understanding (NLU) systems, is the Holy Grail of Artificial Intelligence (AI).
But never has the trust that we can build «talking machines» been stronger than the one engendered by the last generation of NLU systems.
Are we at the dawn of a new era, in which the Grail is finally closer to us?
arXiv Detail & Related papers (2023-03-01T08:32:55Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - Scope and Sense of Explainability for AI-Systems [0.0]
Emphasis will be given to difficulties related to the explainability of highly complex and efficient AI systems.
Arguments are elaborated supporting the notion that if AI solutions were discarded in advance because they are not thoroughly comprehensible, much of the potential of intelligent systems would be wasted.
arXiv Detail & Related papers (2021-12-20T14:25:05Z) - An argument for the impossibility of machine intelligence [0.0]
We define what it is to be an agent (device) that could be the bearer of AI.
We show that the mainstream definitions of 'intelligence' are too weak even to capture what is involved when we ascribe intelligence to an insect.
We identify the properties that an AI agent would need to possess in order to be the bearer of intelligence by this definition.
arXiv Detail & Related papers (2021-10-20T08:54:48Z) - Inductive Biases for Deep Learning of Higher-Level Cognition [108.89281493851358]
A fascinating hypothesis is that human and animal intelligence could be explained by a few principles.
This work considers a larger list, focusing on those which concern mostly higher-level and sequential conscious processing.
The objective of clarifying these particular principles is that they could potentially help us build AI systems benefiting from humans' abilities.
arXiv Detail & Related papers (2020-11-30T18:29:25Z) - Dynamic Cognition Applied to Value Learning in Artificial Intelligence [0.0]
Several researchers in the area are trying to develop a robust, beneficial, and safe concept of artificial intelligence.
It is of utmost importance that artificial intelligent agents have their values aligned with human values.
A possible approach to this problem would be to use theoretical models such as SED.
arXiv Detail & Related papers (2020-05-12T03:58:52Z)