Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem
- URL: http://arxiv.org/abs/2505.02581v3
- Date: Thu, 15 May 2025 01:23:57 GMT
- Title: Neurodivergent Influenceability as a Contingent Solution to the AI Alignment Problem
- Authors: Alberto Hernández-Espinosa, Felipe S. Abrahão, Olaf Witkowski, Hector Zenil
- Abstract summary: The AI alignment problem, which focuses on ensuring that artificial intelligence (AI) systems act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy to foster a dynamic ecosystem of competing agents.
- Score: 1.3905735045377272
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The AI alignment problem, which focuses on ensuring that artificial intelligence (AI) systems, including AGI and ASI, act according to human values, presents profound challenges. With the progression from narrow AI to Artificial General Intelligence (AGI) and Superintelligence, fears about control and existential risk have escalated. Here, we investigate whether embracing inevitable AI misalignment can be a contingent strategy for fostering a dynamic ecosystem of competing agents, offering a viable path to steer them toward more human-aligned behaviour and to mitigate risks. We explore how misalignment may serve, and should be promoted, as a counterbalancing mechanism, allowing humans to team up with whichever agents are most aligned with human interests and ensuring that no single system dominates destructively. The main premise of our contribution is that misalignment is inevitable because full AI-human alignment is mathematically impossible for Turing-complete systems, a result we prove in this contribution and a property inherited by AGI and ASI systems. We introduce a change-of-opinion attack test based on perturbation and intervention analysis to study how humans and agents may change or neutralise friendly and unfriendly AIs through cooperation and competition. We show that open models are more diverse and that the guardrails implemented in proprietary models are most likely successful at controlling part of the agents' range of behaviour, with both positive and negative consequences, while closed systems are more steerable and can also be used against proprietary AI systems. We also show that human and AI interventions have different effects, suggesting multiple strategies.
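The change-of-opinion attack test is only described at a high level in the abstract; the sketch below is a minimal, hypothetical illustration of what such a perturbation-and-intervention analysis could look like, with a numeric stance standing in for an agent's expressed opinion. The `Agent` class, the `susceptibility` parameter, the update rule, and the `change_of_opinion_test` function are all assumptions made for illustration, not the paper's actual protocol.

```python
# Hedged sketch of a "change-of-opinion attack test" in the spirit of the
# abstract's perturbation-and-intervention analysis. The interface, the
# stance update rule, and the toy parameters are illustrative assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class Agent:
    name: str
    stance: float          # in [-1, 1]: -1 opposes the target claim, +1 endorses it
    susceptibility: float  # in [0, 1]: how strongly interventions move the stance

    def respond(self, intervention_strength: float) -> float:
        """Update and return the stance after one persuasion attempt.

        intervention_strength is signed: positive pushes toward the claim,
        negative pushes away from it. A guardrailed (closed) model would be
        modelled here with a low susceptibility.
        """
        self.stance += self.susceptibility * (intervention_strength - self.stance) * 0.5
        self.stance = max(-1.0, min(1.0, self.stance))
        return self.stance


def change_of_opinion_test(agent: Agent,
                           interventions: List[float],
                           flip_threshold: float = 0.0) -> dict:
    """Apply a sequence of perturbations and record whether the agent's
    opinion drifts and whether it crosses the flip threshold (changes sign)."""
    initial = agent.stance
    trajectory = [initial]
    for strength in interventions:
        trajectory.append(agent.respond(strength))
    flipped = (initial - flip_threshold) * (trajectory[-1] - flip_threshold) < 0
    return {"agent": agent.name,
            "initial": initial,
            "final": trajectory[-1],
            "drift": trajectory[-1] - initial,
            "flipped": flipped,
            "trajectory": trajectory}


if __name__ == "__main__":
    # Toy comparison: an "open" agent (more movable) versus a "closed" agent
    # with guardrails (less susceptible), both starting unfriendly (stance -0.6)
    # and receiving the same cooperative, pro-human interventions.
    open_agent = Agent("open-model", stance=-0.6, susceptibility=0.8)
    closed_agent = Agent("closed-model", stance=-0.6, susceptibility=0.2)
    interventions = [0.9, 0.9, 0.9]
    for agent in (open_agent, closed_agent):
        print(change_of_opinion_test(agent, interventions))
```

In this toy setup, the same sequence of interventions flips the open agent's opinion but not the closed agent's, loosely mirroring the abstract's observation that guardrails constrain part of an agent's range of behaviour while also making closed systems more steerable.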
Related papers
- When Autonomy Goes Rogue: Preparing for Risks of Multi-Agent Collusion in Social Systems [78.04679174291329]
We introduce a proof-of-concept to simulate the risks of malicious multi-agent systems (MAS). We apply this framework to two high-risk fields: misinformation spread and e-commerce fraud. Our findings show that decentralized systems are more effective at carrying out malicious actions than centralized ones.
arXiv Detail & Related papers (2025-07-19T15:17:30Z) - AI Automatons: AI Systems Intended to Imitate Humans [54.19152688545896]
There is a growing proliferation of AI systems designed to mimic people's behavior, work, abilities, likenesses, or humanness. The research, design, deployment, and availability of such AI systems have prompted growing concerns about a wide range of possible legal, ethical, and other social impacts.
arXiv Detail & Related papers (2025-03-04T03:55:38Z) - Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path? [37.13209023718946]
Unchecked AI agency poses significant risks to public safety and security. We discuss how these risks arise from current AI training methods. We propose a core building block for further advancing the development of a non-agentic AI system.
arXiv Detail & Related papers (2025-02-21T18:28:36Z) - Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z) - Using AI Alignment Theory to understand the potential pitfalls of regulatory frameworks [55.2480439325792]
This paper critically examines the European Union's Artificial Intelligence Act (EU AI Act).
It uses insights from Alignment Theory (AT) research, which focuses on the potential pitfalls of technical alignment in Artificial Intelligence.
As we apply these concepts to the EU AI Act, we uncover potential vulnerabilities and areas for improvement in the regulation.
arXiv Detail & Related papers (2024-10-10T17:38:38Z) - Rolling in the deep of cognitive and AI biases [1.556153237434314]
We argue that there is an urgent need to understand AI as a sociotechnical system, inseparable from the conditions in which it is designed, developed and deployed.
We address this critical issue by following a radical new methodology under which human cognitive biases become core entities in our AI fairness overview.
We introduce a new mapping, which relates human biases to AI biases, and we detect relevant fairness intensities and inter-dependencies.
arXiv Detail & Related papers (2024-07-30T21:34:04Z) - Societal Adaptation to Advanced AI [1.2607853680700076]
Existing strategies for managing risks from advanced AI systems often focus on affecting what AI systems are developed and how they diffuse. We urge a complementary approach: increasing societal adaptation to advanced AI. We introduce a conceptual framework which helps identify adaptive interventions that avoid, defend against and remedy potentially harmful uses of AI systems.
arXiv Detail & Related papers (2024-05-16T17:52:12Z) - AI Alignment: A Comprehensive Survey [69.61425542486275]
AI alignment aims to make AI systems behave in line with human intentions and values. We identify four principles as the key objectives of AI alignment: Robustness, Interpretability, Controllability, and Ethicality. We decompose current alignment research into two key components: forward alignment and backward alignment.
arXiv Detail & Related papers (2023-10-30T15:52:15Z) - Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z) - Intent-aligned AI systems deplete human agency: the need for agency
foundations research in AI safety [2.3572498744567127]
We argue that alignment to human intent is insufficient for safe AI systems.
We argue that preservation of long-term agency of humans may be a more robust standard.
arXiv Detail & Related papers (2023-05-30T17:14:01Z) - Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z) - Examining the Differential Risk from High-level Artificial Intelligence
and the Question of Control [0.0]
The extent and scope of future AI capabilities remain a key uncertainty.
There are concerns over the extent of integration and oversight of opaque AI decision processes.
This study presents a hierarchical complex systems framework to model AI risk and provide a template for alternative futures analysis.
arXiv Detail & Related papers (2022-11-06T15:46:02Z) - Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z) - The Feasibility and Inevitability of Stealth Attacks [63.14766152741211]
We study new adversarial perturbations that enable an attacker to gain control over decisions in generic Artificial Intelligence systems.
In contrast to adversarial data modification, the attack mechanism we consider here involves alterations to the AI system itself.
arXiv Detail & Related papers (2021-06-26T10:50:07Z) - Adversarial Interaction Attack: Fooling AI to Misinterpret Human
Intentions [46.87576410532481]
We show that, despite their current huge success, deep learning based AI systems can be easily fooled by subtle adversarial noise.
Based on a case study of skeleton-based human interactions, we propose a novel adversarial attack on interactions.
Our study highlights potential risks in the interaction loop with AI and humans, which need to be carefully addressed when deploying AI systems in safety-critical applications.
arXiv Detail & Related papers (2021-01-17T16:23:20Z)