Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and
Limited Liability
- URL: http://arxiv.org/abs/2402.17108v1
- Date: Tue, 27 Feb 2024 01:01:59 GMT
- Title: Repeated Contracting with Multiple Non-Myopic Agents: Policy Regret and
Limited Liability
- Authors: Natalie Collina, Varun Gupta, Aaron Roth
- Abstract summary: We study a repeated contracting setting in which a Principal adaptively chooses amongst $k$ Agents at each of $T$ rounds.
The Agents are non-myopic, and so a mechanism for the Principal induces a $T$-round extensive form game amongst the Agents.
- Score: 6.512509337399156
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study a repeated contracting setting in which a Principal adaptively
chooses amongst $k$ Agents at each of $T$ rounds. The Agents are non-myopic,
and so a mechanism for the Principal induces a $T$-round extensive form game
amongst the Agents. We give several results aimed at understanding an
under-explored aspect of contract theory -- the game induced when choosing an
Agent to contract with. First, we show that this game admits a pure-strategy
\emph{non-responsive} equilibrium amongst the Agents -- informally an
equilibrium in which the Agent's actions depend on the history of realized
states of nature, but not on the history of each other's actions, and so avoids
the complexities of collusion and threats. Next, we show that if the Principal
selects Agents using a \emph{monotone} bandit algorithm, then for any concave
contract, in any such equilibrium, the Principal obtains no regret to
contracting with the best Agent in hindsight -- not just given their realized
actions, but also to the counterfactual world in which they had offered a
guaranteed $T$-round contract to the best Agent in hindsight, which would have
induced a different sequence of actions. Finally, we show that if the Principal
selects Agents using a monotone bandit algorithm which guarantees no
swap-regret, then the Principal can additionally offer only limited liability
contracts (in which the Agent never needs to pay the Principal) while getting
no-regret to the counterfactual world in which she offered a linear contract to
the best Agent in hindsight -- despite the fact that linear contracts are not
limited liability. We instantiate this theorem by demonstrating the existence
of a monotone no swap-regret bandit algorithm, which to our knowledge has not
previously appeared in the literature.
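To make the setting concrete, below is a minimal, purely illustrative sketch of the repeated contracting loop the abstract describes: at each round a bandit-style selector picks one of the $k$ Agents, a state of nature is realized, the chosen Agent's outcome is observed, and the Principal pays according to a linear contract. The selector here is a simple epsilon-greedy rule standing in for the monotone no swap-regret bandit algorithm the paper actually constructs, and all function and parameter names (`select_agent`, `agent_outcome`, `ALPHA`) are hypothetical, not from the paper.

```python
# Illustrative sketch of the repeated contracting loop (hypothetical names;
# the selector is a stand-in, NOT the paper's monotone no swap-regret algorithm).
import random

T = 1000      # number of rounds
K = 5         # number of agents
ALPHA = 0.5   # share parameter of a linear contract: payment = ALPHA * outcome


def select_agent(round_idx, cumulative_outcomes, counts, eps=0.1):
    """Epsilon-greedy selector used only to show where a bandit algorithm plugs in."""
    if round_idx < K:
        return round_idx                      # try each agent once
    if random.random() < eps:
        return random.randrange(K)            # explore
    means = [cumulative_outcomes[i] / max(counts[i], 1) for i in range(K)]
    return max(range(K), key=lambda i: means[i])  # exploit best empirical agent


def agent_outcome(agent, state_of_nature):
    """Placeholder for the outcome an agent produces given the realized state;
    in the paper this depends on the agent's equilibrium action."""
    return max(0.0, state_of_nature + 0.1 * agent * random.random())


cumulative_outcomes = [0.0] * K
counts = [0] * K
principal_utility = 0.0

for t in range(T):
    i = select_agent(t, cumulative_outcomes, counts)  # Principal picks an Agent
    state = random.random()                           # state of nature realized
    outcome = agent_outcome(i, state)                 # chosen Agent acts, outcome observed
    payment = ALPHA * outcome                         # linear contract payment to the Agent
    principal_utility += outcome - payment            # Principal keeps the remainder
    cumulative_outcomes[i] += outcome
    counts[i] += 1

print(f"average per-round principal utility: {principal_utility / T:.3f}")
```

The paper's guarantees concern the choice of selector and contract: with a monotone bandit algorithm the Principal gets no policy regret against contracting with the best Agent in hindsight, and with a monotone no swap-regret algorithm she can restrict to limited liability contracts while competing with the linear-contract benchmark; the sketch above only fixes the interaction protocol those results are stated over.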
Related papers
- On the Resilience of Multi-Agent Systems with Malicious Agents [58.79302663733702]
This paper investigates the resilience of multi-agent system structures under malicious agents.
We devise two methods, AutoTransform and AutoInject, to transform any agent into a malicious one.
We show that two defense methods, introducing a mechanism for each agent to challenge others' outputs, or an additional agent to review and correct messages, can enhance system resilience.
arXiv Detail & Related papers (2024-08-02T03:25:20Z) - Incentivized Learning in Principal-Agent Bandit Games [62.41639598376539]
This work considers a repeated principal-agent bandit game, where the principal can only interact with her environment through the agent.
The principal can influence the agent's decisions by offering incentives that are added to the agent's rewards.
We present nearly optimal learning algorithms for the principal's regret in both multi-armed and linear contextual settings.
arXiv Detail & Related papers (2024-03-06T16:00:46Z) - Contracting with a Learning Agent [32.950708673180436]
We study repeated contracts with a learning agent, focusing on agents who achieve no-regret outcomes.
We achieve an optimal solution to this problem for a canonical contract setting, in which the agent's choice among multiple actions leads to success/failure.
Our results generalize beyond success/failure, to arbitrary non-linear contracts which the principal rescales dynamically.
arXiv Detail & Related papers (2024-01-29T14:53:22Z) - Principal-Agent Reward Shaping in MDPs [50.914110302917756]
Principal-agent problems arise when one party acts on behalf of another, leading to conflicts of interest.
We study a two-player Stackelberg game where the principal and the agent have different reward functions, and the agent chooses an MDP policy for both players.
Our main results cover two classes of instances: trees and deterministic decision processes with a finite horizon.
arXiv Detail & Related papers (2023-12-30T18:30:44Z) - Robust and Performance Incentivizing Algorithms for Multi-Armed Bandits
with Strategic Agents [57.627352949446625]
We consider a variant of the multi-armed bandit problem.
Specifically, the arms are strategic agents who can improve their rewards or absorb them.
We identify a class of MAB algorithms which satisfy a collection of properties and show that they lead to mechanisms that incentivize top level performance at equilibrium.
arXiv Detail & Related papers (2023-12-13T06:54:49Z) - Byzantine-Resilient Decentralized Multi-Armed Bandits [25.499420566469098]
We develop an algorithm that fuses an information mixing step among agents with a truncation of inconsistent and extreme values.
This framework can be used to model attackers in computer networks, instigators of offensive content into recommender systems, or manipulators of financial markets.
arXiv Detail & Related papers (2023-10-11T09:09:50Z) - Learning Optimal Contracts: How to Exploit Small Action Spaces [37.92189925462977]
We study principal-agent problems in which a principal commits to an outcome-dependent payment scheme.
We design an algorithm that learns an approximately-optimal contract with high probability.
It can also be employed to provide a $\tilde{\mathcal{O}}(T^{4/5})$ regret bound in the related online learning setting.
arXiv Detail & Related papers (2023-09-18T14:18:35Z) - Pure Exploration under Mediators' Feedback [63.56002444692792]
Multi-armed bandits are a sequential-decision-making framework, where, at each interaction step, the learner selects an arm and observes a reward.
We consider the scenario in which the learner has access to a set of mediators, each of which selects the arms on the agent's behalf according to a specific and possibly unknown policy.
We propose a sequential decision-making strategy for discovering the best arm under the assumption that the mediators' policies are known to the learner.
arXiv Detail & Related papers (2023-08-29T18:18:21Z) - Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret
Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
arXiv Detail & Related papers (2022-06-03T14:18:05Z) - Robust Multi-Agent Multi-Armed Bandits [26.26185074977412]
Recent works have shown that agents facing independent instances of a $K$-armed bandit can collaborate to decrease regret.
We show that collaboration indeed decreases regret for this algorithm, assuming $m$ is small compared to $K$ but without assumptions on malicious agents' behavior.
arXiv Detail & Related papers (2020-07-07T22:27:30Z) - The Gossiping Insert-Eliminate Algorithm for Multi-Agent Bandits [20.259428328004738]
We consider a decentralized multi-agent Multi Armed Bandit (MAB) setup consisting of $N$ agents.
In our model, agents collaborate by exchanging messages through pairwise gossip style communications on an arbitrary connected graph.
arXiv Detail & Related papers (2020-01-15T17:49:29Z)