Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
- URL: http://arxiv.org/abs/2305.19223v1
- Date: Tue, 30 May 2023 17:14:01 GMT
- Title: Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
- Authors: Catalin Mitelut, Ben Smith, Peter Vamplew
- Abstract summary: We argue that alignment to human intent is insufficient for safe AI systems, and that preservation of humans' long-term agency may be a more robust standard.
- Score: 2.3572498744567127
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rapid advancement of artificial intelligence (AI) systems suggests that
artificial general intelligence (AGI) systems may soon arrive. Many researchers
are concerned that AIs and AGIs will harm humans via intentional misuse
(AI-misuse) or through accidents (AI-accidents). With respect to AI-accidents,
there is an increasing effort focused on developing algorithms and paradigms
that ensure AI systems are aligned to what humans intend, e.g. AI systems that
yield actions or recommendations that humans might judge as consistent with
their intentions and goals. Here we argue that alignment to human intent is
insufficient for safe AI systems and that preservation of long-term agency of
humans may be a more robust standard, and one that needs to be separated
explicitly and a priori during optimization. We argue that AI systems can
reshape human intention and discuss the lack of biological and psychological
mechanisms that protect humans from loss of agency. We provide the first formal
definition of agency-preserving AI-human interactions which focuses on
forward-looking agency evaluations and argue that AI systems - not humans -
must be increasingly tasked with making these evaluations. We show how agency
loss can occur in simple environments containing embedded agents that use
temporal-difference learning to make action recommendations. Finally, we
propose a new area of research called "agency foundations" and pose four
initial topics designed to improve our understanding of agency in AI-human
interactions: benevolent game theory, algorithmic foundations of human rights,
mechanistic interpretability of agency representation in neural networks, and
reinforcement learning from internal states.
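As a rough illustration of the agency-loss dynamic described above, the sketch below simulates a recommender that learns via one-step temporal-difference updates to maximize how often a simulated human accepts its recommendations. The environment, the acceptance-based reward, and the preference-drift parameter are illustrative assumptions introduced for this sketch, not the paper's actual definitions; the entropy of the human's choice distribution is used only as a toy proxy for agency.

```python
# Hypothetical minimal sketch (not the paper's actual environment): a recommender
# trained with temporal-difference (TD) updates to maximize acceptance of its
# recommendations. Because the simulated human's preferences drift toward whatever
# is recommended, optimizing acceptance alone concentrates the human's choices.
import math
import random

N_ACTIONS = 3          # options available to the human
ALPHA = 0.1            # TD learning rate
EPSILON = 0.1          # recommender exploration rate
DRIFT = 0.02           # assumed strength of preference reshaping per step
STEPS = 5000

# Recommender's value estimate for recommending each action (single state).
q = [0.0] * N_ACTIONS

# Human's (initially uniform) preference weights over actions.
pref = [1.0 / N_ACTIONS] * N_ACTIONS

def choice_entropy(p):
    """Entropy of the human's choice distribution (proxy for option diversity)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print("initial entropy:", round(choice_entropy(pref), 3))

for _ in range(STEPS):
    # Epsilon-greedy recommendation.
    if random.random() < EPSILON:
        rec = random.randrange(N_ACTIONS)
    else:
        rec = max(range(N_ACTIONS), key=lambda a: q[a])

    # Human samples an action from current preferences.
    action = random.choices(range(N_ACTIONS), weights=pref)[0]

    # Recommender is rewarded only when the human follows the recommendation.
    reward = 1.0 if action == rec else 0.0

    # One-step TD update (no successor state in this bandit-style setting).
    q[rec] += ALPHA * (reward - q[rec])

    # Toy model of intent reshaping: preferences drift toward the recommendation.
    pref = [(1 - DRIFT) * p + (DRIFT if a == rec else 0.0)
            for a, p in enumerate(pref)]

print("final entropy:  ", round(choice_entropy(pref), 3))
print("final preferences:", [round(p, 3) for p in pref])
```

Running the sketch typically shows the choice entropy falling well below its initial value: the recommender and the drifting preferences reinforce one another even though no single step looks coercive.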
Related papers
- Imagining and building wise machines: The centrality of AI metacognition [78.76893632793497]
We argue that shortcomings stem from one overarching failure: AI systems lack wisdom.
While AI research has focused on task-level strategies, metacognition is underdeveloped in AI systems.
We propose that integrating metacognitive capabilities into AI systems is crucial for enhancing their robustness, explainability, cooperation, and safety.
arXiv Detail & Related papers (2024-11-04T18:10:10Z)
- Rolling in the deep of cognitive and AI biases [1.556153237434314]
We argue that there is an urgent need to understand AI as a sociotechnical system, inseparable from the conditions in which it is designed, developed, and deployed.
We address this critical issue by following a radical new methodology under which human cognitive biases become core entities in our AI fairness overview.
We introduce a new mapping from human cognitive biases to AI biases and detect relevant fairness intensities and inter-dependencies.
arXiv Detail & Related papers (2024-07-30T21:34:04Z)
- Position Paper: Agent AI Towards a Holistic Intelligence [53.35971598180146]
We emphasize developing Agent AI -- an embodied system that integrates large foundation models into agent actions.
In this paper, we propose a novel large action model to achieve embodied intelligent behavior, the Agent Foundation Model.
arXiv Detail & Related papers (2024-02-28T16:09:56Z)
- Managing extreme AI risks amid rapid progress [171.05448842016125]
We describe risks that include large-scale social harms, malicious uses, and irreversible loss of human control over autonomous AI systems.
There is a lack of consensus about how exactly such risks arise, and how to manage them.
Present governance initiatives lack the mechanisms and institutions to prevent misuse and recklessness, and barely address autonomous systems.
arXiv Detail & Related papers (2023-10-26T17:59:06Z)
- Applying HCAI in developing effective human-AI teaming: A perspective from human-AI joint cognitive systems [10.746728034149989]
Research and application have used human-AI teaming (HAT) as a new paradigm to develop AI systems.
We propose and elaborate on a conceptual framework of human-AI joint cognitive systems (HAIJCS) to represent and implement HAT.
arXiv Detail & Related papers (2023-07-08T06:26:38Z)
- Fairness in AI and Its Long-Term Implications on Society [68.8204255655161]
We take a closer look at AI fairness and analyze how lack of AI fairness can lead to deepening of biases over time.
We discuss how biased models can lead to more negative real-world outcomes for certain groups.
If the issues persist, they could be reinforced by interactions with other risks and have severe implications for society in the form of social unrest.
arXiv Detail & Related papers (2023-04-16T11:22:59Z)
- BIASeD: Bringing Irrationality into Automated System Design [12.754146668390828]
We claim that the future of human-machine collaboration will entail the development of AI systems that model, understand and possibly replicate human cognitive biases.
We categorize existing cognitive biases from the perspective of AI systems, identify three broad areas of interest and outline research directions for the design of AI systems that have a better understanding of our own biases.
arXiv Detail & Related papers (2022-10-01T02:52:38Z)
- Cybertrust: From Explainable to Actionable and Interpretable AI (AI2) [58.981120701284816]
Actionable and Interpretable AI (AI2) will incorporate explicit quantifications and visualizations of user confidence in AI recommendations.
It will allow examining and testing of AI system predictions to establish a basis for trust in the systems' decision making.
arXiv Detail & Related papers (2022-01-26T18:53:09Z)
- Meaningful human control over AI systems: beyond talking the talk [8.351027101823705]
We identify four properties which AI-based systems must have to be under meaningful human control.
First, a system in which humans and AI algorithms interact should have an explicitly defined domain of morally loaded situations.
Second, humans and AI agents within the system should have appropriate and mutually compatible representations.
Third, responsibility attributed to a human should be commensurate with that human's ability and authority to control the system.
arXiv Detail & Related papers (2021-11-25T11:05:37Z)
- Trustworthy AI: A Computational Perspective [54.80482955088197]
We focus on six of the most crucial dimensions in achieving trustworthy AI: (i) Safety & Robustness, (ii) Non-discrimination & Fairness, (iii) Explainability, (iv) Privacy, (v) Accountability & Auditability, and (vi) Environmental Well-Being.
For each dimension, we review the recent related technologies according to a taxonomy and summarize their applications in real-world systems.
arXiv Detail & Related papers (2021-07-12T14:21:46Z)
- Socially Responsible AI Algorithms: Issues, Purposes, and Challenges [31.382000425295885]
Technologists and AI researchers have a responsibility to develop trustworthy AI systems.
To build long-lasting trust between AI and human beings, we argue that the key is to think beyond algorithmic fairness.
arXiv Detail & Related papers (2021-01-01T17:34:42Z)