Incentivized Learning in Principal-Agent Bandit Games
- URL: http://arxiv.org/abs/2403.03811v1
- Date: Wed, 6 Mar 2024 16:00:46 GMT
- Title: Incentivized Learning in Principal-Agent Bandit Games
- Authors: Antoine Scheid, Daniil Tiapkin, Etienne Boursier, Aymeric Capitaine,
El Mahdi El Mhamdi, Eric Moulines, Michael I. Jordan, Alain Durmus
- Abstract summary: This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives which are added to his rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
- Score: 62.41639598376539
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This work considers a repeated principal-agent bandit game, where the
principal can only interact with her environment through the agent. The
principal and the agent have misaligned objectives and the choice of action is
only left to the agent. However, the principal can influence the agent's
decisions by offering incentives which are added to his rewards. The principal
aims to iteratively learn an incentive policy to maximize her own total
utility. This framework extends usual bandit problems and is motivated by
several practical applications, such as healthcare or ecological taxation,
where traditionally used mechanism design theories often overlook the learning
aspect of the problem. We present nearly optimal (with respect to a horizon
$T$) learning algorithms for the principal's regret in both multi-armed and
linear contextual settings. Finally, we support our theoretical guarantees
through numerical experiments.
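To make the interaction protocol concrete, here is a minimal Python sketch of the repeated game described above: each round the principal posts an incentive vector, a greedy agent best-responds to its own mean rewards plus the incentives, and the principal observes her reward net of the payment. The explore-then-commit strategy, the Gaussian noise, and all constants (K, T, n_expl, the forcing bonus) are illustrative assumptions, not the paper's nearly optimal algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

K, T = 5, 20_000
theta_agent = rng.uniform(0.0, 1.0, K)      # agent's (private) mean rewards
theta_principal = rng.uniform(0.0, 1.0, K)  # principal's mean rewards

def agent_choice(incentives):
    """Greedy agent: best-responds to its own mean rewards plus the incentives."""
    return int(np.argmax(theta_agent + incentives))

def play_round(incentives):
    """One round: the agent picks an arm, the principal pays the incentive on
    that arm and observes a noisy reward; returns her net utility."""
    arm = agent_choice(incentives)
    reward = theta_principal[arm] + rng.normal(0.0, 0.1)
    return reward - incentives[arm]

# Exploration: force each arm with a large bonus to estimate the principal's
# reward, and binary-search the minimal bonus that makes the agent pick it.
n_expl = 200                                  # samples per arm (illustrative)
est_reward = np.zeros(K)
min_bonus = np.zeros(K)
for k in range(K):
    force = np.zeros(K)
    force[k] = 2.0                            # large enough since rewards lie in [0, 1]
    est_reward[k] = np.mean([play_round(force) + force[k] for _ in range(n_expl)])
    lo, hi = 0.0, 2.0
    for _ in range(25):
        mid = (lo + hi) / 2
        trial = np.zeros(K)
        trial[k] = mid
        lo, hi = (mid, hi) if agent_choice(trial) != k else (lo, mid)
    min_bonus[k] = hi

# Exploitation: commit to the arm with the best estimated net utility and pay
# just enough (plus a small margin) to keep the agent on it.
best = int(np.argmax(est_reward - min_bonus))
commit = np.zeros(K)
commit[best] = min_bonus[best] + 1e-3
exploit_rounds = T - K * n_expl
total = sum(play_round(commit) for _ in range(exploit_rounds))
print(f"committed to arm {best}; average net utility {total / exploit_rounds:.3f}")
```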
Related papers
- Exploration and Persuasion [58.87314871998078]
We show how to incentivize self-interested agents to explore when they prefer to exploit.
Consider a population of self-interested agents that make decisions under uncertainty.
They "explore" to acquire new information and "exploit" this information to make good decisions.
Left to their own devices, such agents tend to under-explore: exploration is costly for the individual, while its benefits are spread over many agents in the future.
arXiv Detail & Related papers (2024-10-22T15:13:13Z)
- Contracting with a Learning Agent [32.950708673180436]
We study repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes.
We obtain an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success or failure.
Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically (a toy sketch of this setting follows the entry).
arXiv Detail & Related papers (2024-01-29T14:53:22Z)
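As a companion to the "Contracting with a Learning Agent" entry above, here is a minimal sketch of its basic ingredients under simplifying assumptions: each action has a cost and a success probability, a linear contract pays the agent a share alpha of the outcome, and the agent is a mean-based no-regret learner (Hedge). The principal below only grid-searches a static contract; the paper's results concern dynamically rescaled contracts, which this sketch does not implement. The toy instance (p, c), the horizon, and the learning rate are assumptions.

```python
import numpy as np

p = np.array([0.1, 0.5, 0.9])   # success probabilities (assumed toy instance)
c = np.array([0.0, 0.1, 0.4])   # agent's costs per action
T, eta = 5_000, 0.05            # horizon and Hedge learning rate

def run_hedge(alpha):
    """Agent plays Hedge against the fixed linear contract alpha; returns the
    principal's average expected utility over the horizon."""
    w = np.ones_like(p)
    principal_total = 0.0
    for _ in range(T):
        probs = w / w.sum()
        agent_utility = alpha * p - c            # expected utility per action
        principal_total += probs @ ((1 - alpha) * p)
        w *= np.exp(eta * agent_utility)         # multiplicative-weights update
    return principal_total / T

best = max(np.linspace(0, 1, 101), key=run_hedge)
print(f"best static linear contract alpha = {best:.2f}, "
      f"principal's average utility = {run_hedge(best):.3f}")
```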
- Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our results establish guarantees for trees and for deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z)
- Regret Analysis of Repeated Delegated Choice [8.384985977301174]
We present a study on a repeated delegated choice problem, the first to consider an online learning variant of the model of Kleinberg and Kleinberg (EC'18).
We explore two dimensions of the problem setup: whether the agent behaves myopically or strategizes across the rounds, and whether the solutions yield deterministic or stochastic utility.
arXiv Detail & Related papers (2023-10-07T17:54:36Z)
- Learning Optimal Contracts: How to Exploit Small Action Spaces [37.92189925462977]
We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme.
We design an algorithm that learns an approximately-optimal contract with high probability.
It can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting.
arXiv Detail & Related papers (2023-09-18T14:18:35Z)
- Estimating and Incentivizing Imperfect-Knowledge Agents with Hidden Rewards [4.742123770879715]
In practice, incentive providers often cannot observe the reward realizations of incentivized agents.
This paper explores a repeated adverse selection game between a self-interested learning agent and a learning principal.
We introduce an estimator whose only input is the history of the principal's incentives and the agent's choices (a toy illustration of this idea follows the entry).
arXiv Detail & Related papers (2023-08-13T08:12:01Z)
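As a toy illustration of the kind of estimator mentioned in the entry above (my own simplification, not the paper's construction): when a greedy agent chooses arm a under incentives pi, every other arm b yields the inequality theta[a] + pi[a] >= theta[b] + pi[b], so the history of (incentives, choice) pairs alone gives lower bounds on the pairwise gaps of the hidden rewards. The uniform random incentives and the greedy agent are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 4
theta = rng.uniform(0.0, 1.0, K)            # agent's hidden mean rewards

# Simulate a logged history: random incentives, greedy (best-responding) agent.
history = []
for _ in range(500):
    pi = rng.uniform(0.0, 1.0, K)
    a = int(np.argmax(theta + pi))
    history.append((pi, a))

# Estimator: tightest lower bounds on pairwise gaps implied by the history.
gap_lb = np.full((K, K), -np.inf)           # gap_lb[a, b] <= theta[a] - theta[b]
for pi, a in history:
    for b in range(K):
        if b != a:
            gap_lb[a, b] = max(gap_lb[a, b], pi[b] - pi[a])

print("true gaps   :\n", np.round(theta[:, None] - theta[None, :], 3))
print("lower bounds:\n", np.round(gap_lb, 3))
```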
- Repeated Principal-Agent Games with Unobserved Agent Rewards and Perfect-Knowledge Agents [5.773269033551628]
We study a scenario of repeated principal-agent games within a multi-armed bandit (MAB) framework.
We design our policy by first constructing an estimator for the agent's expected reward for each bandit arm.
We conclude with numerical simulations demonstrating the applicability of our policy to a real-life setting from collaborative transportation planning.
arXiv Detail & Related papers (2023-04-14T21:57:16Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample-efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the principal's optimal profit and a conservative correction scheme that ensures the desired agent actions are incentivized (a simplified UCB-style sketch follows the entry).
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
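The following sketch transplants the generic pattern named in the entry above, UCB-style optimism plus a conservative correction that guarantees the intended action is incentivized, into the simple incentivized-bandit setting used elsewhere on this page. It is not the paper's information-acquisition model or its algorithm; the eps-accurate prior estimate of the required incentives, the noise level, and all constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
K, T, eps, sigma = 4, 10_000, 0.05, 0.1
theta_agent = rng.uniform(0.0, 1.0, K)       # agent's hidden mean rewards
theta_principal = rng.uniform(0.0, 1.0, K)   # principal's mean rewards

# Minimal incentive that induces a greedy agent to pick arm k (unknown to the
# principal in general); here we assume a prior estimation phase returned an
# eps-accurate under-estimate of it.
required = theta_agent.max() - theta_agent
required_est = np.maximum(required - rng.uniform(0.0, eps, K), 0.0)

counts = np.zeros(K)
sums = np.zeros(K)
net_utility = 0.0
for t in range(1, T + 1):
    if t <= K:                               # pull each arm once to initialise UCB
        k = t - 1
    else:
        ucb = sums / counts + sigma * np.sqrt(2.0 * np.log(t) / counts)
        # Conservative correction: budget the over-payment required_est + eps,
        # which is guaranteed to be at least the true required incentive.
        k = int(np.argmax(ucb - (required_est + eps)))
    payment = required_est[k] + eps          # always incentivizes arm k
    assert theta_agent[k] + payment >= theta_agent.max() - 1e-12
    reward = theta_principal[k] + sigma * rng.standard_normal()
    counts[k] += 1
    sums[k] += reward
    net_utility += reward - payment

print(f"average net utility over {T} rounds: {net_utility / T:.3f}")
```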
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function (a minimal sketch follows this entry).
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
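The entry above describes equipping an RL agent with a learned incentive function whose output is added to another learning agent's rewards. Below is a deliberately small sketch of that idea under strong simplifications: the recipient is a two-action softmax value learner, and the incentive "function" is just a pair of per-action bonuses tuned by random-search hill climbing on the incentivizer's own return, rather than the paper's gradient-based method in Markov games. All payoffs and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

# Base payoffs for the recipient's two actions (0 = "cooperate", 1 = "defect"):
# defection is individually tempting, but the incentivizer strongly prefers
# cooperation. These prisoner's-dilemma-flavoured numbers are assumptions.
recipient_payoff = np.array([1.0, 1.5])
incentivizer_payoff = np.array([3.0, 0.0])
TAU = 0.1                                    # recipient's softmax temperature

def evaluate(incentive, rounds=2_000, eta=0.2):
    """Simulate the recipient's learning under a fixed incentive vector and
    return the incentivizer's average return (its payoff minus incentives paid)."""
    q = np.zeros(2)                          # recipient's running value estimates
    total = 0.0
    for _ in range(rounds):
        probs = np.exp(q / TAU) / np.exp(q / TAU).sum()   # softmax policy
        a = rng.choice(2, p=probs)
        r = recipient_payoff[a] + incentive[a]            # reward plus incentive
        q[a] += eta * (r - q[a])                          # simple value update
        total += incentivizer_payoff[a] - incentive[a]
    return total / rounds

# "Learn" the incentive parameters by random-search hill climbing (chosen purely
# for brevity; any optimizer over the two parameters would do).
best_w, best_val = np.zeros(2), evaluate(np.zeros(2))
for _ in range(200):
    cand = np.clip(best_w + rng.normal(0.0, 0.2, 2), 0.0, 3.0)
    val = evaluate(cand)
    if val > best_val:
        best_w, best_val = cand, val

print(f"learned incentives {np.round(best_w, 2)}, "
      f"incentivizer's average return {best_val:.2f}")
```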