Principal-Agent Reward Shaping in MDPs
- URL: http://arxiv.org/abs/2401.00298v1
- Date: Sat, 30 Dec 2023 18:30:44 GMT
- Title: Principal-Agent Reward Shaping in MDPs
- Authors: Omer Ben-Porat, Yishay Mansour, Michal Moshkovitz, Boaz Taitler
- Abstract summary: Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results establish the NP-hardness of the problem and offer polynomial approximation algorithms for two classes of instances: stochastic trees and deterministic decision processes with a finite horizon.
- Score: 50.914110302917756
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Principal-agent problems arise when one party acts on behalf of another,
leading to conflicts of interest. The economic literature has extensively
studied principal-agent problems, and recent work has extended this to more
complex scenarios such as Markov Decision Processes (MDPs). In this paper, we
further explore this line of research by investigating how reward shaping under
budget constraints can improve the principal's utility. We study a two-player
Stackelberg game where the principal and the agent have different reward
functions, and the agent chooses an MDP policy for both players. The principal
offers an additional reward to the agent, and the agent picks their policy
selfishly to maximize their reward, which is the sum of the original and the
offered reward. Our results establish the NP-hardness of the problem and offer
polynomial approximation algorithms for two classes of instances: stochastic
trees and deterministic decision processes with a finite horizon.
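The Stackelberg interaction described in the abstract can be made concrete with a toy sketch. The following is a minimal illustration, not the paper's algorithm: it assumes a deterministic finite-horizon decision process, and all state names, rewards, and the bonus value are invented for the example. The principal fixes a bonus on state-action pairs, the agent best-responds by backward induction on the sum of its own reward and the bonus, and the principal's utility is the reward collected along the induced trajectory minus the bonuses paid out.

```python
# Toy sketch of principal-agent reward shaping (illustrative, not the
# paper's construction): deterministic transitions, finite horizon.

def agent_best_response(horizon, states, actions, step, r_agent, bonus):
    """Backward induction: agent maximizes original + offered reward."""
    value = {s: 0.0 for s in states}   # value-to-go at the next stage
    policy = {}                        # chosen action per (stage, state)
    for t in reversed(range(horizon)):
        new_value = {}
        for s in states:
            best_a, best_v = None, float("-inf")
            for a in actions:
                s2 = step(s, a)        # deterministic transition
                v = r_agent[(s, a)] + bonus.get((s, a), 0.0) + value[s2]
                if v > best_v:
                    best_a, best_v = a, v
            new_value[s] = best_v
            policy[(t, s)] = best_a
        value = new_value
    return policy

def principal_utility(horizon, step, r_principal, bonus, policy, s0):
    """Principal's payoff along the agent's trajectory, net of bonuses paid."""
    s, total = s0, 0.0
    for t in range(horizon):
        a = policy[(t, s)]
        total += r_principal[(s, a)] - bonus.get((s, a), 0.0)
        s = step(s, a)
    return total

# Toy instance: 2 states, 2 actions, horizon 2. Action 1 moves to "v".
states, actions = ["u", "v"], [0, 1]
step = lambda s, a: "v" if a == 1 else "u"
r_agent     = {("u", 0): 1.0, ("u", 1): 0.0, ("v", 0): 0.0, ("v", 1): 0.5}
r_principal = {("u", 0): 0.0, ("u", 1): 2.0, ("v", 0): 0.0, ("v", 1): 2.0}

# Without a bonus the agent stays at "u"; a bonus of 1.1 on ("u", 1) makes
# moving to "v" worthwhile and raises the principal's net utility.
bonus = {("u", 1): 1.1}
pi = agent_best_response(2, states, actions, step, r_agent, bonus)
print(principal_utility(2, step, r_principal, bonus, pi, "u"))
```

The paper's hardness result concerns finding the best such bonus under a budget; the sketch only shows the inner loop of evaluating one candidate offer.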
Related papers
- Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that are added to the agent's rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
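The single-round version of this incentive mechanism can be sketched in a few lines. This is a hypothetical toy model (all means and the incentive value are illustrative, not from the paper): the agent picks the arm maximizing its own mean reward plus the principal's offered incentive, and the principal's net payoff is her reward on that arm minus the incentive paid.

```python
# Toy one-shot principal-agent bandit interaction (illustrative only).

def agent_choice(agent_means, incentives):
    # Agent best-responds to its own reward plus the offered incentive.
    return max(range(len(agent_means)),
               key=lambda i: agent_means[i] + incentives[i])

def principal_net(principal_means, agent_means, incentives):
    # Principal's reward on the chosen arm, net of the incentive paid.
    i = agent_choice(agent_means, incentives)
    return principal_means[i] - incentives[i]

agent_means = [0.9, 0.2]        # agent prefers arm 0 on its own
principal_means = [0.1, 1.0]    # principal prefers arm 1
# Offering 0.75 on arm 1 flips the agent's choice: 0.2 + 0.75 > 0.9.
incentives = [0.0, 0.75]
print(agent_choice(agent_means, incentives),
      principal_net(principal_means, agent_means, incentives))
```

The learning problem studied in the paper is harder: the principal does not know `agent_means` and must estimate the smallest sufficient incentives from repeated interaction.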
arXiv Detail & Related papers (2024-03-06T16:00:46Z)
- Approximate Linear Programming for Decentralized Policy Iteration in Cooperative Multi-agent Markov Decision Processes [5.842054972839244]
We consider a cooperative multi-agent Markov decision process involving m agents.
In the policy iteration process of multi-agent setup, the number of actions grows exponentially with the number of agents.
We propose approximate decentralized policy iteration algorithms using approximate linear programming with function approximation.
arXiv Detail & Related papers (2023-11-20T14:14:13Z)
- Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of principal's incentives and agent's choices.
arXiv Detail & Related papers (2023-08-13T08:12:01Z)
- On the Complexity of Multi-Agent Decision Making: From Learning in Games to Partial Monitoring [105.13668993076801]
A central problem in the theory of multi-agent reinforcement learning (MARL) is to understand what structural conditions and algorithmic principles lead to sample-efficient learning guarantees.
We study this question in a general framework for interactive decision making with multiple agents.
We show that characterizing the statistical complexity for multi-agent decision making is equivalent to characterizing the statistical complexity of single-agent decision making.
arXiv Detail & Related papers (2023-05-01T06:46:22Z)
- Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents [5.773269033551628]
We study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework.
We design our policy by first constructing an estimator for the agent's expected reward for each bandit arm.
We conclude with numerical simulations demonstrating the applicability of our policy to a real-life setting from collaborative transportation planning.
arXiv Detail & Related papers (2023-04-14T21:57:16Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Consequences of Misaligned AI [12.879600368339393]
This paper argues that we should view the design of reward functions as an interactive and dynamic process.
We show how modifying the setup to allow reward functions that reference the full state or allowing the principal to update the proxy objective over time can lead to higher utility solutions.
arXiv Detail & Related papers (2021-02-07T19:34:04Z)
- Scalable Multi-Agent Inverse Reinforcement Learning via Actor-Attention-Critic [54.2180984002807]
Multi-agent adversarial inverse reinforcement learning (MA-AIRL) is a recent approach that applies single-agent AIRL to multi-agent problems.
We propose a multi-agent inverse RL algorithm that is more sample-efficient and scalable than previous works.
arXiv Detail & Related papers (2020-02-24T20:30:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.