Extinction Risks from AI: Invisible to Science?
- URL: http://arxiv.org/abs/2403.05540v1
- Date: Fri, 2 Feb 2024 23:04:13 GMT
- Title: Extinction Risks from AI: Invisible to Science?
- Authors: Vojtech Kovarik, Christian van Merwijk, Ida Mattsson,
- Abstract summary: Extinction-level Goodhart's Law is formulated as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity".
This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying dynamics might be invisible to current scientific methods.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In an effort to inform the discussion surrounding existential risks from AI, we formulate Extinction-level Goodhart's Law as "Virtually any goal specification, pursued to the extreme, will result in the extinction of humanity", and we aim to understand which formal models are suitable for investigating this hypothesis. Note that we remain agnostic as to whether Extinction-level Goodhart's Law holds or not. As our key contribution, we identify a set of conditions that are necessary for a model that aims to be informative for evaluating specific arguments for Extinction-level Goodhart's Law. Since each of the conditions seems to significantly contribute to the complexity of the resulting model, formally evaluating the hypothesis might be exceedingly difficult. This raises the possibility that whether the risk of extinction from artificial intelligence is real or not, the underlying dynamics might be invisible to current scientific methods.
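One way to read the hypothesis formally, offered here only as an illustration (the policy-optimization framing and the exceptional set below are assumptions of this sketch, not the paper's model):

    \[
    \forall g \in \mathcal{G} \setminus \mathcal{G}_{\varepsilon}:\quad
    \Pr\bigl[\,\text{extinction of humanity} \mid \pi^{*}_{g} \text{ is pursued without limit}\,\bigr] \approx 1,
    \]

where \(\mathcal{G}\) is the space of goal specifications, \(\pi^{*}_{g}\) is an agent policy that optimizes \(g\) "to the extreme", and \(\mathcal{G}_{\varepsilon}\) is the small exceptional set covered by the word "virtually". The paper's contribution is a set of conditions that a model must satisfy before a statement of this shape can be evaluated informatively, together with the observation that meeting all of them at once may make the model prohibitively complex.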
Related papers
- Active Inference AI Systems for Scientific Discovery [1.450405446885067]
This perspective contends that progress turns on closing three mutually reinforcing gaps in abstraction, reasoning and empirical grounding. Design principles are proposed for systems that reason in imaginary spaces and learn from the world.
arXiv Detail & Related papers (2025-06-26T14:43:04Z) - A Conjecture on a Fundamental Trade-Off between Certainty and Scope in Symbolic and Generative AI [0.0]
The article introduces a conjecture that formalises a fundamental trade-off between provable correctness and broad data-mapping capacity in AI systems. By making this implicit trade-off explicit and open to rigorous verification, the conjecture significantly reframes both engineering ambitions and philosophical expectations for AI.
arXiv Detail & Related papers (2025-06-11T19:18:13Z) - Controllable Logical Hypothesis Generation for Abductive Reasoning in Knowledge Graphs [54.596180382762036]
Abductive reasoning in knowledge graphs aims to generate plausible logical hypotheses from observed entities. Due to a lack of controllability, a single observation may yield numerous plausible but redundant or irrelevant hypotheses. We introduce the task of controllable hypothesis generation to improve the practical utility of abductive reasoning.
arXiv Detail & Related papers (2025-05-27T09:36:47Z) - When Counterfactual Reasoning Fails: Chaos and Real-World Complexity [1.9223856107206057]
We investigate the limitations of counterfactual reasoning within the framework of Structural Causal Models.
We find that realistic assumptions, such as low degrees of model uncertainty or chaotic dynamics, can result in counterintuitive outcomes.
This work urges caution when applying counterfactual reasoning in settings characterized by chaos and uncertainty (a minimal numerical sketch of this point appears after this list).
arXiv Detail & Related papers (2025-03-31T08:14:51Z) - Hypothesizing Missing Causal Variables with LLMs [55.28678224020973]
We formulate a novel task where the input is a partial causal graph with missing variables, and the output is a hypothesis about the missing variables to complete the partial graph.
We show the strong ability of LLMs to hypothesize the mediation variables between a cause and its effect.
We also observe surprising results where some of the open-source models outperform the closed GPT-4 model.
arXiv Detail & Related papers (2024-09-04T10:37:44Z) - Can a Bayesian Oracle Prevent Harm from an Agent? [48.12936383352277]
We consider estimating a context-dependent bound on the probability of violating a given safety specification.
Noting that different plausible hypotheses about the world could produce very different outcomes, we derive bounds on the safety violation probability predicted under the true but unknown hypothesis.
We consider two forms of this result, in the iid case and in the non-iid case, and conclude with open problems towards turning such results into practical AI guardrails (an illustrative toy sketch of the guardrail idea appears after this list).
arXiv Detail & Related papers (2024-08-09T18:10:42Z) - Control Risk for Potential Misuse of Artificial Intelligence in Science [85.91232985405554]
We aim to raise awareness of the dangers of AI misuse in science.
We highlight real-world examples of misuse in chemical science.
We propose a system called SciGuard to control misuse risks for AI models in science.
arXiv Detail & Related papers (2023-12-11T18:50:57Z) - A Closer Look at the Self-Verification Abilities of Large Language Models in Logical Reasoning [73.77088902676306]
We take a closer look at the self-verification abilities of large language models (LLMs) in the context of logical reasoning.
Our main findings suggest that existing LLMs could struggle to identify fallacious reasoning steps accurately and may fall short of guaranteeing the validity of self-verification methods.
arXiv Detail & Related papers (2023-11-14T07:13:10Z) - The Generative AI Paradox: "What It Can Create, It May Not Understand" [81.89252713236746]
The recent wave of generative AI has sparked excitement and concern over potentially superhuman levels of artificial intelligence.
At the same time, models still show basic errors in understanding that would not be expected even in non-expert humans.
This presents us with an apparent paradox: how do we reconcile seemingly superhuman capabilities with the persistence of errors that few humans would make?
arXiv Detail & Related papers (2023-10-31T18:07:07Z) - Could AI be the Great Filter? What Astrobiology can Teach the Intelligence Community about Anthropogenic Risks [0.0]
The Fermi Paradox poses a disquieting question: if extraterrestrial life is probable in the Universe, why have we not encountered it?
One intriguing hypothesis is known as the Great Filter, which suggests that some event required for the emergence of intelligent life is extremely unlikely, hence the cosmic silence.
From an intelligence perspective, framing global catastrophic risk within the context of the Great Filter can provide insight into the long-term futures of technologies that we don't fully understand, like artificial intelligence.
arXiv Detail & Related papers (2023-05-09T17:50:02Z) - A simplicity bubble problem and zemblanity in digitally intermediated societies [1.4380443010065829]
We discuss the ubiquity of Big Data and machine learning in society.
We show that there is a ceiling above which formal knowledge cannot further decrease the probability of zemblanitous findings.
arXiv Detail & Related papers (2023-04-21T00:02:15Z) - Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-dependent Constraints [9.061408029414453]
Online decision making under uncertainty in partially observable domains, also known as Belief Space Planning, is a fundamental problem.
We present a technique to adaptively accept or discard a candidate action sequence with respect to a probabilistic belief-dependent constraint (a Monte Carlo sketch of this accept/discard check appears after this list).
We apply our method to active SLAM, a highly challenging problem of high dimensional Belief Space Planning.
arXiv Detail & Related papers (2023-02-13T21:22:47Z) - When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment [96.77970239683475]
AI systems need to be able to understand, interpret and predict human moral judgments and decisions.
A central challenge for AI safety is capturing the flexibility of the human moral mind.
We present a novel challenge set consisting of rule-breaking question answering.
arXiv Detail & Related papers (2022-10-04T09:04:27Z) - Impossibility Results in AI: A Survey [3.198144010381572]
An impossibility theorem demonstrates that a particular problem or set of problems cannot be solved as described in the claim.
We have categorized impossibility theorems applicable to the domain of AI into five categories: deduction, indistinguishability, induction, tradeoffs, and intractability.
We conclude that deductive impossibilities deny 100%-guarantees for security.
arXiv Detail & Related papers (2021-09-01T16:52:13Z)
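For "When Counterfactual Reasoning Fails: Chaos and Real-World Complexity" above, a minimal numerical sketch of why chaos undermines counterfactual claims (the logistic map and the perturbation size are illustrative choices made here, not taken from the paper):

    import numpy as np

    # The logistic map at r = 4 is chaotic: a counterfactual that changes the
    # initial condition by 1e-9 soon yields a trajectory that bears no
    # resemblance to the factual one, so "what would have happened if x0 had
    # been slightly different?" has no stable answer.
    def logistic_trajectory(x0, r=4.0, steps=60):
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return np.array(xs)

    factual = logistic_trajectory(0.4)
    counterfactual = logistic_trajectory(0.4 + 1e-9)

    divergence = np.abs(factual - counterfactual)
    print(divergence[10], divergence[60])  # still tiny at step 10, order one by step 60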
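For "Can a Bayesian Oracle Prevent Harm from an Agent?" above, a toy sketch of the guardrail idea (the posterior cutoff and acceptance threshold below are placeholders introduced here; this is not the paper's bound):

    # Reject an action whenever any hypothesis that still carries non-negligible
    # posterior mass predicts a high probability of violating the safety spec.
    # If the true hypothesis is among those retained, its violation probability
    # is at most the worst case checked here.
    def cautious_accept(posterior, violation_prob, eps=0.01, threshold=0.05):
        """posterior: dict hypothesis -> posterior probability.
        violation_prob: dict hypothesis -> P(violation | action, hypothesis)."""
        plausible = [h for h, p in posterior.items() if p >= eps]
        worst_case = max(violation_prob[h] for h in plausible)
        return worst_case <= threshold

    # Two plausible world-models disagree sharply about the same action:
    posterior = {"benign_world": 0.9, "fragile_world": 0.1}
    violation_prob = {"benign_world": 0.001, "fragile_world": 0.4}
    print(cautious_accept(posterior, violation_prob))  # False: the cautious oracle rejects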
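For "Simplified Continuous High Dimensional Belief Space Planning with Adaptive Probabilistic Belief-dependent Constraints" above, a minimal Monte Carlo version of accepting or discarding a candidate action sequence against a probabilistic belief-dependent constraint (the Gaussian belief, the translation-only transition model, and the confidence level are placeholders chosen here, not the paper's method):

    import numpy as np

    rng = np.random.default_rng(0)

    # Propagate samples of the current belief through a candidate action
    # sequence and keep the sequence only if the estimated probability of
    # satisfying the state constraint stays above the required confidence.
    def accept_candidate(belief_mean, belief_cov, actions, is_safe,
                         confidence=0.95, n_samples=1000):
        states = rng.multivariate_normal(belief_mean, belief_cov, size=n_samples)
        for a in actions:
            states = states + a  # placeholder transition: deterministic translation
        p_safe = np.mean([is_safe(s) for s in states])
        return p_safe >= confidence

    # Example: 2-D belief, constraint "stay inside a disc of radius 5".
    ok = accept_candidate(
        belief_mean=np.zeros(2),
        belief_cov=0.1 * np.eye(2),
        actions=[np.array([1.0, 0.0])] * 3,
        is_safe=lambda s: np.linalg.norm(s) < 5.0,
    )
    print(ok)  # True here; a riskier candidate would be discarded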
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all content) and is not responsible for any consequences of its use.