Learning in Stackelberg Games with Non-myopic Agents
- URL: http://arxiv.org/abs/2208.09407v1
- Date: Fri, 19 Aug 2022 15:49:30 GMT
- Title: Learning in Stackelberg Games with Non-myopic Agents
- Authors: Nika Haghtalab, Thodoris Lykouris, Sloan Nietert, Alex Wei
- Abstract summary: We study Stackelberg games where a principal repeatedly interacts with a long-lived, non-myopic agent.
Although learning in Stackelberg games is well-understood when the agent is myopic, non-myopic agents pose additional complications.
We provide a general framework that reduces learning in the presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
- Score: 14.727571071020446
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study Stackelberg games where a principal repeatedly interacts with a
long-lived, non-myopic agent, without knowing the agent's payoff function.
Although learning in Stackelberg games is well-understood when the agent is
myopic, non-myopic agents pose additional complications. In particular,
non-myopic agents may strategically select actions that are inferior in the
present to mislead the principal's learning algorithm and obtain better
outcomes in the future.
We provide a general framework that reduces learning in the presence of
non-myopic agents to robust bandit optimization in the presence of myopic
agents. Through the design and analysis of minimally reactive bandit
algorithms, our reduction trades off the statistical efficiency of the
principal's learning algorithm against its effectiveness in inducing
near-best-responses. We apply this framework to Stackelberg security games
(SSGs), pricing with an unknown demand curve, strategic classification, and
general finite Stackelberg games. In each setting, we characterize the type and
impact of misspecifications present in near-best-responses and develop a
learning algorithm robust to such misspecifications.
Along the way, we improve the query complexity of learning in SSGs with $n$
targets from the state-of-the-art $O(n^3)$ to a near-optimal $\widetilde{O}(n)$
by uncovering a fundamental structural property of such games. This result is
of independent interest beyond learning with non-myopic agents.
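The abstract does not spell out the reduction in code, but as a purely illustrative sketch (not the authors' algorithm), one way to picture a "minimally reactive" principal is a batched bandit that commits to each candidate strategy for a long block and scores only the later rounds, treating the agent's behavior there as an approximate best response. All names, constants, and the epsilon-greedy arm choice below are hypothetical.
```python
import numpy as np

def minimally_reactive_bandit(strategies, agent_response, n_blocks,
                              block_len=50, burn_in=20, eps=0.1, seed=0):
    """Illustrative batched epsilon-greedy principal.

    strategies:     finite menu of commitment strategies (arms) for the principal
    agent_response: callback (strategy, round_in_block) -> principal reward;
                    stands in for the long-lived agent's behavior
    The principal holds each chosen strategy fixed for `block_len` rounds and
    averages only the rewards after `burn_in`, treating those later rounds as
    approximate best responses by the agent.
    """
    rng = np.random.default_rng(seed)
    k = len(strategies)
    means = np.zeros(k)   # running mean of post-burn-in reward per strategy
    counts = np.zeros(k)  # number of blocks each strategy has been played

    for _ in range(n_blocks):
        # epsilon-greedy choice over commitment strategies
        if counts.min() == 0 or rng.random() < eps:
            arm = int(rng.integers(k))
        else:
            arm = int(np.argmax(means))

        # commit to the same strategy for the whole block ("minimally reactive")
        rewards = [agent_response(strategies[arm], t) for t in range(block_len)]
        score = float(np.mean(rewards[burn_in:]))  # ignore early, possibly manipulative rounds

        counts[arm] += 1
        means[arm] += (score - means[arm]) / counts[arm]

    return strategies[int(np.argmax(means))], means
```
The block length is where the trade-off described in the abstract shows up: longer blocks cost statistical efficiency, but they leave a forward-looking agent with less to gain from playing misleading, non-best-response actions early in a block.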
Related papers
- Principal-Agent Bandit Games with Self-Interested and Exploratory Learning Agents [16.514561132180134]
We study the repeated principal-agent bandit game, where the principal indirectly interacts with the unknown environment by proposing incentives for the agent to play arms.
Most existing work assumes the agent has full knowledge of the reward means and always behaves greedily, but in many online marketplaces, the agent needs to learn the unknown environment and sometimes explore.
Motivated by such settings, we model a self-interested learning agent with exploration behaviors who iteratively updates reward estimates and either selects an arm that maximizes the estimated reward plus incentive or explores arbitrarily with a certain probability.
arXiv Detail & Related papers (2024-12-20T20:04:50Z)
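The agent model in the entry above (select the arm maximizing estimated reward plus incentive, or explore arbitrarily with some probability) is concrete enough to write down; the sketch below is a hypothetical rendering of that decision rule, not code from the paper.
```python
import numpy as np

class ExploratoryAgent:
    """Illustrative self-interested learning agent: with probability `explore_p`
    it picks an arbitrary arm, otherwise the arm maximizing its estimated mean
    reward plus the incentive offered by the principal."""

    def __init__(self, n_arms, explore_p=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.explore_p = explore_p
        self.estimates = np.zeros(n_arms)  # running mean reward per arm
        self.counts = np.zeros(n_arms)

    def choose(self, incentives):
        if self.rng.random() < self.explore_p:
            return int(self.rng.integers(len(self.estimates)))  # explore arbitrarily
        return int(np.argmax(self.estimates + incentives))      # greedy w.r.t. estimate + incentive

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.estimates[arm] += (reward - self.estimates[arm]) / self.counts[arm]
```
A principal simulator would call `choose(incentives)` each round and feed the realized reward back through `update`.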
- Strategic Classification With Externalities [11.36782598786846]
We propose a new variant of the strategic classification problem.
Motivated by real-world applications, our model crucially allows the manipulation of one agent to affect another.
We show that under certain assumptions, the pure Nash Equilibrium of this agent manipulation game is unique and can be efficiently computed.
arXiv Detail & Related papers (2024-10-10T15:28:04Z)
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
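The $O(T^{2/3})$ rate quoted above is the rate typically achieved by explore-then-commit-style schedules; the sketch below shows that generic pattern for one player only, under assumptions not taken from the paper (finitely many actions and stochastic rewards), so it illustrates the rate rather than the authors' algorithm.
```python
import numpy as np

def explore_then_commit(pull, n_actions, horizon, seed=0):
    """Generic explore-then-commit schedule, the classic way to obtain
    O(T^{2/3})-type regret: explore uniformly for ~T^(2/3) rounds, then
    commit to the empirically best action for the remainder.

    pull: callback action -> stochastic reward (stands in for playing
          against the other, decentralized learner).
    """
    rng = np.random.default_rng(seed)
    explore_rounds = int(np.ceil(horizon ** (2 / 3)))
    sums = np.zeros(n_actions)
    counts = np.zeros(n_actions)

    total = 0.0
    for t in range(horizon):
        if t < explore_rounds:
            a = int(rng.integers(n_actions))                      # exploration phase
        else:
            a = int(np.argmax(sums / np.maximum(counts, 1)))      # commit phase
        r = pull(a)
        total += r
        if t < explore_rounds:
            sums[a] += r
            counts[a] += 1
    return total
```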
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model [64.94131130042275]
We study the incentivized information acquisition problem, where a principal hires an agent to gather information on her behalf.
We design a provably sample efficient algorithm that tailors the UCB algorithm to our model.
Our algorithm features a delicate estimation procedure for the optimal profit of the principal, and a conservative correction scheme that ensures the desired agent's actions are incentivized.
arXiv Detail & Related papers (2023-03-15T13:40:16Z)
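The entry above tailors UCB to an information-acquisition model whose estimation and correction steps are not described in the summary; the sketch below is therefore just a plain UCB1 loop over a finite menu of contracts, with the model-specific parts hidden behind a hypothetical `profit` callback, and is not the authors' algorithm.
```python
import numpy as np

def ucb1(profit, n_contracts, horizon):
    """Plain UCB1 over a finite menu of contracts/scoring rules.
    profit: callback contract_index -> realized principal profit (stochastic).
    """
    sums = np.zeros(n_contracts)
    counts = np.zeros(n_contracts)

    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_contracts:
            i = t - 1                                   # play each contract once
        else:
            means = sums / counts
            bonus = np.sqrt(2 * np.log(t) / counts)     # optimism bonus
            i = int(np.argmax(means + bonus))
        r = profit(i)
        sums[i] += r
        counts[i] += 1
        total += r
    return total
```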
- Disturbing Reinforcement Learning Agents with Corrupted Rewards [62.997667081978825]
We analyze the effects of different attack strategies based on reward perturbations on reinforcement learning algorithms.
We show that smoothly crafted adversarial rewards can mislead the learner, and that with low exploration probability values the learned policy is more robust to corrupted rewards.
arXiv Detail & Related papers (2021-02-12T15:53:48Z)
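As a toy illustration of the kind of experiment the last entry describes, the snippet below trains a tabular epsilon-greedy Q-learner on a small chain environment while adding a smooth perturbation to the rewards it observes; the environment, the sinusoidal perturbation, and all constants are hypothetical, not the paper's setup.
```python
import numpy as np

def train_q_learner(horizon=5000, eps=0.1, alpha=0.1, gamma=0.9,
                    corrupt=0.0, seed=0):
    """Tabular epsilon-greedy Q-learning on a toy 5-state chain: action 1
    moves right (true reward 1 on reaching the last state), action 0 stays.
    `corrupt` scales a smooth perturbation added to the observed reward."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = 5, 2
    q = np.zeros((n_states, n_actions))
    s = 0
    for t in range(horizon):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else s
        true_r = 1.0 if (a == 1 and s_next == n_states - 1) else 0.0
        seen_r = true_r + corrupt * np.sin(0.01 * t)      # smoothly crafted perturbation
        q[s, a] += alpha * (seen_r + gamma * q[s_next].max() - q[s, a])
        s = 0 if s_next == n_states - 1 else s_next       # reset after reaching the goal
    return q
```
Comparing runs with `corrupt=0.0` versus `corrupt=0.5`, and with small versus large `eps`, gives a rough feel for the paper's observation that low exploration probabilities make the learned policy more robust to corrupted rewards.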