Regret Minimization in Stackelberg Games with Side Information
- URL: http://arxiv.org/abs/2402.08576v4
- Date: Sat, 12 Oct 2024 02:29:23 GMT
- Title: Regret Minimization in Stackelberg Games with Side Information
- Authors: Keegan Harris, Zhiwei Steven Wu, Maria-Florina Balcan
- Abstract summary: We formalize Stackelberg games with side information, in which both players observe an external context before playing.
In sharp contrast to the non-contextual version, we show that it is impossible for the leader to achieve no-regret in the full adversarial setting.
Motivated by this result, we show that no-regret learning is possible in two natural relaxations.
- Abstract: Algorithms for playing in Stackelberg games have been deployed in real-world domains including airport security, anti-poaching efforts, and cyber-crime prevention. However, these algorithms often fail to take into consideration the additional information available to each player (e.g. traffic patterns, weather conditions, network congestion), which may significantly affect both players' optimal strategies. We formalize such settings as Stackelberg games with side information, in which both players observe an external context before playing. The leader commits to a (context-dependent) strategy, and the follower best-responds to both the leader's strategy and the context. We focus on the online setting in which a sequence of followers arrive over time, and the context may change from round-to-round. In sharp contrast to the non-contextual version, we show that it is impossible for the leader to achieve no-regret in the full adversarial setting. Motivated by this result, we show that no-regret learning is possible in two natural relaxations: the setting in which the sequence of followers is chosen stochastically and the sequence of contexts is adversarial, and the setting in which contexts are stochastic and follower types are adversarial.
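To make the interaction protocol above concrete, here is a minimal Python sketch of a single round of a Stackelberg game with side information. Everything in it (the bilinear context-dependent payoffs, the dimensions, the uniform baseline commitment) is an illustrative assumption, not the paper's construction; it only shows the protocol of commit, observe context, and best-respond.

```python
import numpy as np

rng = np.random.default_rng(0)

N_LEADER, N_FOLLOWER, D_CTX = 3, 4, 5

# Hypothetical fixed tensors mapping a context to each player's payoff matrix.
W_leader = rng.standard_normal((D_CTX, N_LEADER, N_FOLLOWER))
W_follower = rng.standard_normal((D_CTX, N_LEADER, N_FOLLOWER))

def payoff_matrices(context):
    """Context-dependent payoffs: entry [i, j] is a player's utility when
    the leader plays i and the follower plays j."""
    return (np.einsum("c,cij->ij", context, W_leader),
            np.einsum("c,cij->ij", context, W_follower))

def play_round(leader_policy, context):
    """One round: both players observe the context, the leader commits to a
    context-dependent mixed strategy x, and the follower best-responds to
    both x and the context."""
    U_leader, U_follower = payoff_matrices(context)
    x = leader_policy(context)            # mixed strategy over leader actions
    j = int(np.argmax(x @ U_follower))    # follower's best response
    return float(x @ U_leader[:, j])      # leader's expected utility

# Example: a deliberately naive, context-independent uniform commitment.
uniform = lambda ctx: np.ones(N_LEADER) / N_LEADER
print(play_round(uniform, rng.standard_normal(D_CTX)))
```

The regret notions studied in the paper compare such a policy against the best context-dependent commitment in hindsight over a sequence of rounds.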
Related papers
- Decentralized Online Learning in General-Sum Stackelberg Games [2.8659922790025463]
We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner.
We show that for the follower, myopically best responding to the leader's action is the best strategy in the limited-information setting.
We design a new manipulation strategy for the follower in the full-information setting, and show that it has an intrinsic advantage over the best-response strategy.
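The myopic best response discussed above is simple to state in code. A minimal sketch, where the payoff matrix is an illustrative stand-in rather than anything from the paper:

```python
import numpy as np

# Hypothetical follower payoffs: rows index the leader's actions,
# columns index the follower's actions.
U_follower = np.array([[3.0, 1.0],
                       [0.0, 2.0],
                       [1.0, 1.0]])

def myopic_best_response(leader_action):
    # The follower ignores the future and maximizes this round's utility
    # given only the leader action it just observed.
    return int(np.argmax(U_follower[leader_action]))

print(myopic_best_response(0))  # -> 0
print(myopic_best_response(1))  # -> 1
```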
arXiv Detail & Related papers (2024-05-06T04:35:01Z) - When Should a Leader Act Suboptimally? The Role of Inferability in Repeated Stackelberg Games [28.856644679990357]
We model the inferability problem using Stackelberg games with observations where a leader and a follower repeatedly interact.
For a variety of game settings, we show that the inferability gap is upper-bounded by a function of the number of interactions and the stochasticity level of the leader's strategy.
We identify a set of games where the leader's near-optimal strategy may suffer from a large inferability gap.
arXiv Detail & Related papers (2023-09-30T19:08:05Z) - Opponent Modeling in Multiplayer Imperfect-Information Games [1.024113475677323]
We present an approach for opponent modeling in multiplayer imperfect-information games.
We run experiments against a variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker.
Our algorithm significantly outperforms all of these agents, including the exact Nash equilibrium strategies.
arXiv Detail & Related papers (2022-12-12T16:48:53Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
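The core representational idea is to encode a mixed strategy as the pushforward of noise through a neural network. A minimal numpy sketch of that idea, with illustrative sizes and a stand-in payoff; note the paper trains such networks without gradient access, which this sketch does not attempt to show:

```python
import numpy as np

rng = np.random.default_rng(0)

D_NOISE, D_HIDDEN, D_ACTION = 4, 16, 2

# A tiny "randomized policy network": fresh noise in, continuous action out.
W1 = 0.5 * rng.standard_normal((D_NOISE, D_HIDDEN))
W2 = 0.5 * rng.standard_normal((D_HIDDEN, D_ACTION))

def sample_action():
    z = rng.standard_normal(D_NOISE)   # source of randomness
    h = np.tanh(z @ W1)                # hidden layer
    return np.tanh(h @ W2)             # action in [-1, 1]^D_ACTION

# Monte Carlo estimate of expected payoff under the induced mixed strategy,
# against a fixed stand-in payoff function.
payoff = lambda a: -np.sum(a ** 2)
print(np.mean([payoff(sample_action()) for _ in range(1000)]))
```

Because each call draws fresh noise, the network represents an unrestricted distribution over actions rather than a single pure strategy.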
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
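For reference, plain fictitious play in a matrix game looks as follows. This stand-in (matching pennies) omits the paper's Markov transitions and UCB-type optimism; it only shows the best-respond-to-empirical-frequencies loop.

```python
import numpy as np

# Matching pennies: row player's payoff is A, column player's is -A.
A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])

counts_row = np.ones(2)  # smoothed counts of the row player's past actions
counts_col = np.ones(2)  # smoothed counts of the column player's past actions

for t in range(5000):
    # Each player best-responds to the opponent's empirical mixed strategy.
    i = int(np.argmax(A @ (counts_col / counts_col.sum())))
    j = int(np.argmin((counts_row / counts_row.sum()) @ A))
    counts_row[i] += 1
    counts_col[j] += 1

# Empirical frequencies approach the (0.5, 0.5) equilibrium.
print(counts_row / counts_row.sum(), counts_col / counts_col.sum())
```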
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
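A hyperpolicy is a distribution over policies, and mirror descent with an entropy regularizer over a finite policy class reduces to exponential weights on estimated policy values. The sketch below shows only that reduction, with importance-weighted value estimates; it is a stand-in, not DORIS itself, which further adds optimism and general function approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_policies, K, eta = 8, 2000, 0.05

weights = np.ones(n_policies)
true_values = rng.uniform(0, 1, n_policies)  # hidden mean returns (stand-in)

for k in range(K):
    p = weights / weights.sum()        # current hyperpolicy over policies
    i = rng.choice(n_policies, p=p)    # sample a policy, run the episode
    r = rng.binomial(1, true_values[i])  # observed episode return (stand-in)
    # Importance-weighted value estimate, then the mirror-descent step,
    # which for the entropy regularizer is a multiplicative update.
    v_hat = np.zeros(n_policies)
    v_hat[i] = r / p[i]
    weights *= np.exp(eta * v_hat)
    weights /= weights.sum()           # renormalize for numerical stability

print(np.argmax(weights), np.argmax(true_values))  # typically agree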
arXiv Detail & Related papers (2022-06-03T14:18:05Z) - No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of the standard Stackelberg game, called a discrete-time dynamic Stackelberg game, which has an underlying state space that affects the leader's rewards and available strategies, and that evolves in a Markovian manner depending on both players' selected strategies.
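The following minimal sketch steps through one round of such a game. All reward and transition tables are random stand-ins; only the structure (per-state commitment, follower best response, Markovian transition driven by both actions) reflects the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_LEADER, N_FOLLOWER = 4, 3, 3

# Stand-in reward tables and transition kernel, indexed by
# (state, leader action, follower action).
R_leader = rng.uniform(0, 1, (N_STATES, N_LEADER, N_FOLLOWER))
R_follower = rng.uniform(0, 1, (N_STATES, N_LEADER, N_FOLLOWER))
P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_LEADER, N_FOLLOWER))

def step(state, leader_strategy):
    """Leader commits to a mixed strategy for this state; the follower
    best-responds to the commitment; the state then evolves Markovianly."""
    x = leader_strategy[state]                        # per-state commitment
    j = int(np.argmax(x @ R_follower[state]))         # follower best response
    i = rng.choice(N_LEADER, p=x)                     # realized leader action
    next_state = rng.choice(N_STATES, p=P[state, i, j])
    return R_leader[state, i, j], next_state

policy = np.full((N_STATES, N_LEADER), 1.0 / N_LEADER)  # uniform baseline
s, total = 0, 0.0
for t in range(10):
    r, s = step(s, policy)
    total += r
print(total)
```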
arXiv Detail & Related papers (2022-02-10T01:07:57Z) - Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
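A minimal numpy sketch of the leader-follower structure of adversarial training, using a linear logistic model as a stand-in: the follower perturbs inputs to increase the loss, and the leader updates weights at the perturbed inputs. SALT's distinguishing step, differentiating through the unrolled follower updates so the leader anticipates the follower's reaction, is noted in a comment but omitted here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data for a linear model with logistic loss; labels in {-1, +1}.
X = rng.standard_normal((64, 10))
y = np.sign(X @ rng.standard_normal(10))

def grad_w(w, X, y):
    # Gradient of mean log(1 + exp(-y * (X @ w))) w.r.t. the weights.
    s = -y / (1.0 + np.exp(y * (X @ w)))
    return X.T @ s / len(y)

def grad_x(w, X, y):
    # Gradient of the same loss w.r.t. the inputs (used by the follower).
    s = -y / (1.0 + np.exp(y * (X @ w)))
    return np.outer(s, w) / len(y)

def follower_response(w, X, y, eps=0.3, steps=5, lr=0.1):
    # Follower: ascend the loss in input space, within an L-infinity ball.
    delta = np.zeros_like(X)
    for _ in range(steps):
        delta = np.clip(delta + lr * np.sign(grad_x(w, X + delta, y)),
                        -eps, eps)
    return delta

w = 0.01 * rng.standard_normal(10)
for _ in range(100):
    delta = follower_response(w, X, y)
    # Leader: descend at the follower's response. SALT would additionally
    # differentiate through the unrolled follower steps so the leader
    # accounts for how delta reacts to w; this sketch omits that term.
    w -= 0.5 * grad_w(w, X + delta, y)
```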
arXiv Detail & Related papers (2021-04-11T00:44:57Z) - Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include the algorithm's regret guarantees, which depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)