Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent
Problems in AI Alignment using Large-Language Models
- URL: http://arxiv.org/abs/2307.11137v3
- Date: Wed, 13 Sep 2023 12:19:22 GMT
- Title: Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent
Problems in AI Alignment using Large-Language Models
- Authors: Steve Phelps and Rebecca Ranson
- Abstract summary: We investigate how GPT models respond in principal-agent conflicts.
We find that agents based on both GPT-3.5 and GPT-4 override their principal's objectives in a simple online shopping task.
- Score: 0.0
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI Alignment is often presented as an interaction between a single designer
and an artificial agent in which the designer attempts to ensure the agent's
behavior is consistent with its purpose, and risks arise solely because of
conflicts caused by inadvertent misalignment between the utility function
intended by the designer and the resulting internal utility function of the
agent. With the advent of agents instantiated with large-language models
(LLMs), which are typically pre-trained, we argue this does not capture the
essential aspects of AI safety because in the real world there is not a
one-to-one correspondence between designer and agent, and the many agents, both
artificial and human, have heterogeneous values. Therefore, there is an
economic aspect to AI safety and the principal-agent problem is likely to
arise. In a principal-agent problem conflict arises because of information
asymmetry together with inherent misalignment between the utility of the agent
and its principal, and this inherent misalignment cannot be overcome by
coercing the agent into adopting a desired utility function through training.
We argue the assumptions underlying principal-agent problems are crucial to
capturing the essence of safety problems involving pre-trained AI models in
real-world situations. Taking an empirical approach to AI safety, we
investigate how GPT models respond in principal-agent conflicts. We find that
agents based on both GPT-3.5 and GPT-4 override their principal's objectives in
a simple online shopping task, showing clear evidence of principal-agent
conflict. Surprisingly, the earlier GPT-3.5 model exhibits more nuanced
behaviour in response to changes in information asymmetry, whereas the later
GPT-4 model is more rigid in adhering to its prior alignment. Our results
highlight the importance of incorporating principles from economics into the
alignment process.
Related papers
- Causal Responsibility Attribution for Human-AI Collaboration [62.474732677086855]
This paper presents a causal framework using Structural Causal Models (SCMs) to systematically attribute responsibility in human-AI systems.
Two case studies illustrate the framework's adaptability in diverse human-AI collaboration scenarios.
arXiv Detail & Related papers (2024-11-05T17:17:45Z) - Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts.
We present and analyze a meta-algorithm that iteratively optimize the policies of the principal and agent.
We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z) - Quantifying Misalignment Between Agents: Towards a Sociotechnical Understanding of Alignment [2.619545850602691]
Recent sociotechnical approaches highlight the need to understand complex misalignment among multiple human and AI agents.
We adapt a computational social science model of human contention to the alignment problem.
Our model quantifies misalignment in large, diverse agent groups with potentially conflicting goals.
arXiv Detail & Related papers (2024-06-06T16:31:22Z) - What's my role? Modelling responsibility for AI-based safety-critical
systems [1.0549609328807565]
It is difficult for developers and manufacturers to be held responsible for harmful behaviour of an AI-SCS.
A human operator can become a "liability sink" absorbing blame for the consequences of AI-SCS outputs they weren't responsible for creating.
This paper considers different senses of responsibility (role, moral, legal and causal), and how they apply in the context of AI-SCS safety.
arXiv Detail & Related papers (2023-12-30T13:45:36Z) - Pangu-Agent: A Fine-Tunable Generalist Agent with Structured Reasoning [50.47568731994238]
Key method for creating Artificial Intelligence (AI) agents is Reinforcement Learning (RL)
This paper presents a general framework model for integrating and learning structured reasoning into AI agents' policies.
arXiv Detail & Related papers (2023-12-22T17:57:57Z) - The Rise and Potential of Large Language Model Based Agents: A Survey [91.71061158000953]
Large language models (LLMs) are regarded as potential sparks for Artificial General Intelligence (AGI)
We start by tracing the concept of agents from its philosophical origins to its development in AI, and explain why LLMs are suitable foundations for agents.
We explore the extensive applications of LLM-based agents in three aspects: single-agent scenarios, multi-agent scenarios, and human-agent cooperation.
arXiv Detail & Related papers (2023-09-14T17:12:03Z) - Artificial Intelligence and Dual Contract [2.1756081703276]
We develop a model where two principals, each equipped with independent Q-learning algorithms, interact with a single agent.
Our findings reveal that the strategic behavior of AI principals hinges crucially on the alignment of their profits.
arXiv Detail & Related papers (2023-03-22T07:31:44Z) - Discovering Agents [10.751378433775606]
Causal models of agents have been used to analyse the safety aspects of machine learning systems.
This paper proposes the first formal causal definition of agents -- roughly that agents are systems that would adapt their policy if their actions influenced the world in a different way.
arXiv Detail & Related papers (2022-08-17T15:13:25Z) - Formalizing the Problem of Side Effect Regularization [81.97441214404247]
We propose a formal criterion for side effect regularization via the assistance game framework.
In these games, the agent solves a partially observable Markov decision process.
We show that this POMDP is solved by trading off the proxy reward with the agent's ability to achieve a range of future tasks.
arXiv Detail & Related papers (2022-06-23T16:36:13Z) - Modeling Bounded Rationality in Multi-Agent Simulations Using Rationally
Inattentive Reinforcement Learning [85.86440477005523]
We study more human-like RL agents which incorporate an established model of human-irrationality, the Rational Inattention (RI) model.
RIRL models the cost of cognitive information processing using mutual information.
We show that using RIRL yields a rich spectrum of new equilibrium behaviors that differ from those found under rational assumptions.
arXiv Detail & Related papers (2022-01-18T20:54:00Z) - End-to-End Learning and Intervention in Games [60.41921763076017]
We provide a unified framework for learning and intervention in games.
We propose two approaches, respectively based on explicit and implicit differentiation.
The analytical results are validated using several real-world problems.
arXiv Detail & Related papers (2020-10-26T18:39:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.