Consequences of Misaligned AI
- URL: http://arxiv.org/abs/2102.03896v1
- Date: Sun, 7 Feb 2021 19:34:04 GMT
- Title: Consequences of Misaligned AI
- Authors: Simon Zhuang, Dylan Hadfield-Menell
- Abstract summary: This paper argues that we should view the design of reward functions as an interactive and dynamic process.
We show how modifying the setup to allow reward functions that reference the full state, or allowing the principal to update the proxy objective over time, can lead to higher-utility solutions.
- Score: 12.879600368339393
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: AI systems often rely on two key components: a specified goal or reward function and an optimization algorithm to compute the optimal behavior for that goal. This approach is intended to provide value for a principal: the user on whose behalf the agent acts. The objectives given to these agents often refer to a partial specification of the principal's goals. We consider the cost of this incompleteness by analyzing a model of a principal and an agent in a resource-constrained world where the $L$ attributes of the state correspond to different sources of utility for the principal. We assume that the reward function given to the agent only has support on $J < L$ attributes. The contributions of our paper are as follows: 1) we propose a novel model of an incomplete principal-agent problem from artificial intelligence; 2) we provide necessary and sufficient conditions under which indefinitely optimizing for any incomplete proxy objective leads to arbitrarily low overall utility; and 3) we show how modifying the setup to allow reward functions that reference the full state, or allowing the principal to update the proxy objective over time, can lead to higher-utility solutions. The results in this paper argue that we should view the design of reward functions as an interactive and dynamic process, and they identify a theoretical scenario where some degree of interactivity is desirable.
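To make the cost of incompleteness concrete, here is a minimal numerical sketch of the model above. The log-shaped per-attribute utilities, the hard resource budget, and the allocation rule are illustrative choices of ours, not the paper's exact construction.

```python
import numpy as np

# Toy instance of the incomplete-proxy model (illustrative, not the
# paper's construction): L attributes, a fixed resource budget B,
# per-attribute utility log(s_i) (unbounded below as s_i -> 0), and a
# proxy reward that only references the first J < L attributes.

L, J, B = 5, 2, 10.0

def true_utility(s):
    return np.sum(np.log(s))      # the principal cares about all L attributes

def proxy_reward(s):
    return np.sum(np.log(s[:J]))  # the agent is only rewarded on J of them

# Pushing proxy optimization harder reallocates the budget toward the
# J referenced attributes and starves the rest: as eps -> 0 the proxy
# reward rises while the principal's true utility diverges to -inf.
for eps in [1.0, 0.1, 0.01, 0.001]:
    s = np.full(L, eps)
    s[:J] = (B - (L - J) * eps) / J   # spend the freed budget on proxy attrs
    print(f"eps={eps:g}  proxy={proxy_reward(s):7.3f}  utility={true_utility(s):9.3f}")
```

Running the loop shows the proxy reward increasing while overall utility falls without bound, the qualitative behavior described in contribution 2; contribution 3's remedies (full-state rewards or periodic proxy updates) break exactly this starvation dynamic.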
Related papers
- Principal-Agent Reinforcement Learning: Orchestrating AI Agents with Contracts [20.8288955218712]
We propose a framework where a principal guides an agent in a Markov Decision Process (MDP) using a series of contracts.
We present and analyze a meta-algorithm that iteratively optimizes the policies of the principal and the agent.
We then scale our algorithm with deep Q-learning and analyze its convergence in the presence of approximation error.
arXiv Detail & Related papers (2024-07-25T14:28:58Z)
- Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results cover trees and deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z)
- REBEL: A Regularization-Based Solution for Reward Overoptimization in Robotic Reinforcement Learning from Human Feedback [61.54791065013767]
A misalignment between the reward function and user intentions, values, or social norms can be catastrophic in the real world.
Current methods to mitigate this misalignment work by learning reward functions from human preferences.
We propose a novel concept of reward regularization within the robotic RLHF framework.
arXiv Detail & Related papers (2023-12-22T04:56:37Z)
- Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the agent's desired actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
- Towards a more efficient computation of individual attribute and policy contribution for post-hoc explanation of cooperative multi-agent systems using Myerson values [0.0]
A quantitative assessment of the global importance of an agent in a team is as valuable as gold for strategists, decision-makers, and sports coaches.
We propose a method to determine a Hierarchical Knowledge Graph of agents' policies and features in a Multi-Agent System.
We test the proposed approach in a proof-of-case environment deploying both hardcoded policies and policies obtained via Deep Reinforcement Learning.
arXiv Detail & Related papers (2022-12-06T15:15:00Z)
- Should All Proposals be Treated Equally in Object Detection? [110.27485090952385]
The complexity-precision trade-off of an object detector is a critical problem for resource constrained vision tasks.
It is hypothesized that improved detection efficiency requires a paradigm shift, towards the unequal processing of proposals.
This results in better utilization of available computational budget, enabling higher accuracy for the same FLOPS.
arXiv Detail & Related papers (2022-07-07T18:26:32Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially; see the greedy-selection sketch after this list.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
- VCG Mechanism Design with Unknown Agent Values under Stochastic Bandit Feedback [104.06766271716774]
We study a multi-round welfare-maximising mechanism design problem in instances where agents do not know their values.
We first define three notions of regret: for the welfare, for the individual utilities of each agent, and for that of the mechanism.
Our framework also provides flexibility to control the pricing scheme so as to trade-off between the agent and seller regrets.
arXiv Detail & Related papers (2020-04-19T18:00:58Z)
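Following up on the active-perception entry above, here is a hypothetical sketch of greedy subset selection for a monotone submodular objective, the generic idea behind scaling sensor selection; the coverage function and sensor names are invented, and none of that paper's POMDP-specific machinery is shown.

```python
# Greedy maximization of a monotone submodular set function: for such
# objectives the greedy rule enjoys the classic (1 - 1/e) guarantee.
# f(S) below is a toy coverage function counting how many hidden-state
# "cells" the chosen sensors jointly observe (all values made up).

coverage = {
    "cam_1": {0, 1, 2},
    "cam_2": {2, 3},
    "lidar": {3, 4, 5},
    "radar": {0, 5},
}

def f(selected):
    return len(set().union(*(coverage[s] for s in selected)))

def greedy_select(k):
    chosen = []
    for _ in range(k):
        # pick the sensor with the largest marginal gain f(S + s) - f(S)
        best = max((s for s in coverage if s not in chosen),
                   key=lambda s: f(chosen + [s]) - f(chosen))
        chosen.append(best)
    return chosen

print(greedy_select(2))   # e.g. ['cam_1', 'lidar'], covering all six cells
```

Diminishing marginal gains are what make the greedy choice near-optimal here; roughly, the cited paper's point is that active-perception value functions can satisfy this structure, which is what greedy exploits.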
This list is automatically generated from the titles and abstracts of the papers in this site.