Contracting with a Learning Agent
- URL: http://arxiv.org/abs/2401.16198v1
- Date: Mon, 29 Jan 2024 14:53:22 GMT
- Title: Contracting with a Learning Agent
- Authors: Guru Guruganesh, Yoav Kolumbus, Jon Schneider, Inbal Talgam-Cohen,
Emmanouil-Vasileios Vlatakis-Gkaragkounis, Joshua R. Wang, S. Matthew
Weinberg
- Abstract summary: We initiate the study of repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes.
We achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure.
Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Many real-life contractual relations differ completely from the clean, static
model at the heart of principal-agent theory. Typically, they involve repeated
strategic interactions of the principal and agent, taking place under
uncertainty and over time. While such strategies are appealing in theory, players
seldom use complex dynamic strategies in practice, often preferring to circumvent
complexity and approach uncertainty through learning. We initiate the study of
repeated contracts with a learning agent, focusing on agents who achieve
no-regret outcomes.
Optimizing against a no-regret agent is a known open problem in general
games; we achieve an optimal solution to this problem for a canonical contract
setting, in which the agent's choice among multiple actions leads to
success/failure. The solution has a surprisingly simple structure: for some
$\alpha > 0$, initially offer the agent a linear contract with scalar $\alpha$,
then switch to offering a linear contract with scalar $0$. This switch causes
the agent to ``free-fall'' through their action space and during this time
provides the principal with non-zero reward at zero cost. Despite apparent
exploitation of the agent, this dynamic contract can leave \emph{both} players
better off compared to the best static contract. Our results generalize beyond
success/failure, to arbitrary non-linear contracts which the principal rescales
dynamically.
Finally, we quantify the dependence of our results on knowledge of the time
horizon, and are the first to address this consideration in the study of
strategizing against learning agents.
Related papers
- Contractual Reinforcement Learning: Pulling Arms with Invisible Hands [68.77645200579181]
We propose a theoretical framework for aligning the economic interests of different stakeholders in online learning problems through contract design.
For the planning problem, we design an efficient dynamic programming algorithm to determine the optimal contracts against the far-sighted agent.
For the learning problem, we introduce a generic design of no-regret learning algorithms to untangle the challenges, from robust contract design to balancing exploration and exploitation.
arXiv Detail & Related papers (2024-07-01T16:53:00Z)
- New Perspectives in Online Contract Design [2.296475290901356]
This work studies the repeated principal-agent problem from an online learning perspective.
The principal's goal is to learn the optimal contract that maximizes her utility through repeated interactions.
arXiv Detail & Related papers (2024-03-11T20:28:23Z)
- Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that are added to the agent's rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
arXiv Detail & Related papers (2024-03-06T16:00:46Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and Limited Liability [6.512509337399156]
We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds.
The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents.
arXiv Detail & Related papers (2024-02-27T01:01:59Z)
- Learning Optimal Contracts: How to Exploit Small Action Spaces [37.92189925462977]
We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme.
We design an algorithm that learns an approximately-optimal contract with high probability.
It can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting.
arXiv Detail & Related papers (2023-09-18T14:18:35Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample-efficient algorithm that tailors the UCB algorithm to our model (a generic UCB sketch appears after this list).
Our algorithm features a delicate estimation procedure for the principal's optimal profit, and a conservative correction scheme that ensures the desired agent actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
- Formal Contracts Mitigate Social Dilemmas in Multi-Agent RL [4.969697978555126]
Multi-agent Reinforcement Learning (MARL) is a powerful tool for training autonomous agents acting independently in a common environment.
MARL can lead to sub-optimal behavior when individual incentives and group incentives diverge.
We propose an augmentation to a Markov game where agents voluntarily agree to binding transfers of reward under pre-specified conditions (a toy transfer step is sketched after this list).
arXiv Detail & Related papers (2022-08-22T17:42:03Z)
- Who Leads and Who Follows in Strategic Classification? [82.44386576129295]
We argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions.
We show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
arXiv Detail & Related papers (2021-06-23T16:48:46Z)
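Several of the entries above (e.g., the incentivized information acquisition paper) tailor the classic UCB template to contract settings. As a point of reference only, here is a minimal generic UCB1 sketch for Bernoulli arms; the arm means are invented for illustration, and this is not the tailored estimation-and-correction procedure those papers describe.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # hypothetical Bernoulli arm means
T = 5_000

counts = np.zeros(len(true_means))       # pulls per arm
sums = np.zeros(len(true_means))         # cumulative reward per arm

for t in range(1, T + 1):
    if t <= len(true_means):
        arm = t - 1                       # initialize: pull each arm once
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))         # optimism in the face of uncertainty
    counts[arm] += 1
    sums[arm] += rng.binomial(1, true_means[arm])

print("pulls per arm:", counts.astype(int))
print("empirical means:", np.round(sums / counts, 3))
```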
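For the formal-contracts MARL entry, the core mechanism is a binding reward transfer that fires under a pre-specified condition. A toy two-agent sketch follows; the trigger state, transfer amount, and reward values are invented for illustration and are not the paper's formal augmentation.

```python
from dataclasses import dataclass

@dataclass
class Contract:
    trigger_state: str   # pre-specified condition: fires in this state
    payer: int           # index of the agent who pays
    amount: float        # binding transfer size

def apply_contract(state: str, rewards: list[float], contract: Contract) -> list[float]:
    """Return per-agent rewards after any contracted transfer is executed."""
    out = list(rewards)
    if state == contract.trigger_state:
        receiver = 1 - contract.payer          # two-agent case
        out[contract.payer] -= contract.amount
        out[receiver] += contract.amount
    return out

# Example: agent 0 over-harvested a shared resource ("defect" state), so the
# agreed transfer compensates agent 1, realigning individual incentives.
print(apply_contract("defect", [3.0, -1.0], Contract("defect", payer=0, amount=2.0)))
# -> [1.0, 1.0]
```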