Learning in Stackelberg Games with Non-myopic Agents
- URL: http://arxiv.org/abs/2208.09407v1
- Date: Fri, 19 Aug 2022 15:49:30 GMT
- Title: Learning in Stackelberg Games with Non-myopic Agents
- Authors: Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, Alex Wei
- Abstract summary: We study Stackelberg games where a principal repeatedly interacts with a long-lived, non-myopic agent.
Although learning in Stackelberg games is well-understood when the agent is myopic, non-myopic agents pose additional complications.
We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
- Score: 14.727571071020446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Stackelberg games where a principal repeatedly interacts with a
long-lived, non-myopic agent, without knowing the agent's payoff function.
Although learning in Stackelberg games is well-understood when the agent is
myopic, non-myopic agents pose additional complications. In particular,
non-myopic agents may strategically select actions that are inferior in the
present to mislead the principal's learning algorithm and obtain better
outcomes in the future.
We provide a general framework that reduces learning in the presence of
non-myopic agents to robust bandit optimization in the presence of myopic
agents. Through the design and analysis of minimally reactive bandit
algorithms, our reduction trades off the statistical efficiency of the
principal's learning algorithm against its effectiveness in inducing
near-best-responses. We apply this framework to Stackelberg security games
(SSGs), pricing with an unknown demand curve, strategic classification, and
general finite Stackelberg games. In each setting, we characterize the type and
impact of misspecifications present in near-best-responses and develop a
learning algorithm robust to such misspecifications.
Along the way, we improve the query complexity of learning in SSGs with $n$
targets from the state-of-the-art $O(n^3)$ to a near-optimal $\widetilde{O}(n)$
by uncovering a fundamental structural property of such games. This result is
of independent interest beyond learning with non-myopic agents.
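As a rough illustration of the reduction described above (a minimal sketch under simplifying assumptions, not the algorithms from the paper), the following Python snippet assumes a finite set of principal strategies and an agent that discounts the future: the principal commits to each strategy for a long block and updates only between blocks, so manipulating any single round buys the agent little, and the observed responses can be treated as a slightly misspecified (robust) bandit problem. All names here (delayed_explore_then_commit, play_round, the toy reward means) are hypothetical.

import random

def delayed_explore_then_commit(strategies, play_round, horizon, block_len):
    """Illustrative 'minimally reactive' principal.

    The principal commits to each candidate strategy for a long block and
    updates its estimates only between blocks. Because future decisions
    react only to coarse, delayed feedback, a discounting (non-myopic)
    agent has little incentive to manipulate any single round, so its
    responses are approximately best responses; the principal then runs a
    standard bandit routine over these slightly perturbed rewards.
    """
    estimates = {s: 0.0 for s in strategies}
    t = 0
    # Exploration phase: one long, non-adaptive block per strategy.
    for s in strategies:
        block_reward = 0.0
        for _ in range(block_len):
            block_reward += play_round(s)  # reward induced by the agent's (near-)best response
            t += 1
        estimates[s] = block_reward / block_len
    # Commit phase: play the empirically best strategy for the remaining rounds.
    best = max(strategies, key=lambda s: estimates[s])
    total = sum(block_len * estimates[s] for s in strategies)
    while t < horizon:
        total += play_round(best)
        t += 1
    return best, total

# Toy usage: the principal does not know the mean reward induced by each
# strategy; a non-myopic agent could shift rewards by a small epsilon,
# which the long, minimally reactive blocks keep bounded.
true_means = {"a": 0.3, "b": 0.7, "c": 0.5}
play = lambda s: true_means[s] + random.uniform(-0.05, 0.05)
best, total = delayed_explore_then_commit(list(true_means), play, horizon=3000, block_len=500)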
Related papers
- Strategic Classification With Externalities [11.36782598786846]
We propose a new variant of the strategic classification problem.
Motivated by real-world applications, our model crucially allows one agent's manipulation to affect another.
We show that under certain assumptions, the pure Nash Equilibrium of this agent manipulation game is unique and can be efficiently computed.
arXiv Detail & Related papers (2024-10-10T15:28:04Z)
- Paths to Equilibrium in Games [6.812247730094933]
We study sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning.
Our analysis reveals a counterintuitive insight that reward-deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
arXiv Detail & Related papers (2024-03-26T19:58:39Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired actions of the agent are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space (a generic form of this update is sketched after this list).
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards are able to mislead the learner, and that using low exploration probability values makes the learned policy more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
- Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs to report, under various scenarios for the learning interaction between leader and follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z)
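For reference, the following is a generic form of the optimistic follow-the-regularized-leader update mentioned in the "Near-Optimal No-Regret Learning for General Convex Games" entry above; it is the standard template, not necessarily the exact lifted-space instantiation used in that paper:
$$x_{t+1} = \operatorname*{arg\,max}_{x \in \mathcal{X}} \Big\{ \eta \Big\langle x,\; m_{t+1} + \sum_{s=1}^{t} u_s \Big\rangle - R(x) \Big\}, \qquad m_{t+1} = u_t,$$
where $\mathcal{X}$ is the convex and compact strategy set, $u_s$ is the utility (gradient) vector observed in round $s$, $m_{t+1}$ is the optimistic prediction of the next utility, $R$ is a strongly convex regularizer, and $\eta > 0$ is the learning rate.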