Learning in Bayesian Stackelberg Games With Unknown Follower's Types
- URL: http://arxiv.org/abs/2602.00771v1
- Date: Sat, 31 Jan 2026 15:24:34 GMT
- Title: Learning in Bayesian Stackelberg Games With Unknown Follower's Types
- Authors: Matteo Bollini, Francesco Bacchiocchi, Samuel Coutts, Matteo Castiglioni, Alberto Marchesi
- Abstract summary: We study online learning in Bayesian Stackelberg games. We focus on the easier type feedback model, where the follower's type is also revealed. In such a setting, we propose a no-regret algorithm that achieves a regret of $\widetilde{O}(\sqrt{T})$, when ignoring the dependence on other parameters.
- Score: 25.394442065108596
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study online learning in Bayesian Stackelberg games, where a leader repeatedly interacts with a follower whose unknown private type is independently drawn at each round from an unknown probability distribution. The goal is to design algorithms that minimize the leader's regret with respect to always playing an optimal commitment computed with knowledge of the game. We consider, for the first time to the best of our knowledge, the most realistic case in which the leader does not know anything about the follower's types, i.e., the possible follower payoffs. This raises considerable additional challenges compared to the commonly studied case in which the payoffs of follower types are known. First, we prove a strong negative result: no-regret is unattainable under action feedback, i.e., when the leader only observes the follower's best response at the end of each round. Thus, we focus on the easier type feedback model, where the follower's type is also revealed. In such a setting, we propose a no-regret algorithm that achieves a regret of $\widetilde{O}(\sqrt{T})$, when ignoring the dependence on other parameters.
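Concretely, the repeated interaction the abstract describes can be sketched as a simulation loop. The sketch below is purely illustrative: the payoff matrices, the grid of candidate commitments, and the explore-then-commit scheme are assumptions made for demonstration, not the paper's algorithm, and for brevity a type's payoff matrix is treated as known once that type has been revealed (the paper's leader must instead infer it from observed best responses).

```python
# Minimal sketch of the repeated Bayesian Stackelberg interaction under
# "type feedback". Everything here (payoff matrices, the commitment grid,
# the explore-then-commit scheme) is an illustrative assumption, not the
# paper's actual algorithm.
import numpy as np

rng = np.random.default_rng(0)
n_leader, n_follower, n_types, T = 3, 3, 2, 2000

U_leader = rng.uniform(size=(n_leader, n_follower))             # known to the leader
U_follower = rng.uniform(size=(n_types, n_leader, n_follower))  # unknown to the leader
type_dist = np.array([0.6, 0.4])                                # unknown to the leader

def best_response(x, theta):
    """Follower of type theta best-responds to the leader's mixed strategy x."""
    return int(np.argmax(x @ U_follower[theta]))

def estimated_value(x, est_payoffs, est_dist):
    """Leader's expected utility of commitment x under the current estimates."""
    value = 0.0
    for theta, p in enumerate(est_dist):
        b = int(np.argmax(x @ est_payoffs[theta]))  # predicted best response
        value += p * (x @ U_leader[:, b])
    return value

# Coarse grid over the simplex of mixed commitments (illustrative only).
grid = [np.array([i, j, 10 - i - j], dtype=float) / 10
        for i in range(11) for j in range(11 - i)]

est_payoffs = np.zeros_like(U_follower)   # per-type payoff estimates
counts = np.zeros(n_types)                # empirical type frequencies
total_utility = 0.0
for t in range(T):
    if t < 200:   # explore: commit uniformly while collecting type feedback
        x = np.full(n_leader, 1.0 / n_leader)
    else:         # exploit: best commitment on the grid against the estimates
        est_dist = counts / counts.sum()
        x = max(grid, key=lambda g: estimated_value(g, est_payoffs, est_dist))
    theta = rng.choice(n_types, p=type_dist)   # nature draws the round's type
    a = rng.choice(n_leader, p=x)              # leader's realized action
    b = best_response(x, theta)                # follower responds to the commitment x
    total_utility += U_leader[a, b]
    counts[theta] += 1                         # type feedback reveals theta...
    est_payoffs[theta] = U_follower[theta]     # ...simplification: payoffs known once seen
print("average leader utility:", total_utility / T)
```

Even this toy loop makes the feedback distinction visible: under action feedback the two lines that use `theta` would be unavailable, so the empirical type distribution and per-type payoffs could not be estimated, consistent with the paper's negative result for that model.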
Related papers
- Learning in Structured Stackelberg Games [20.392732735387238]
We study structured Stackelberg games in which both players observe contextual information about the state of the world at time of play. We assume a fixed relationship between the contextual information and the follower's type. We show that standard learning theoretic measures of complexity do not characterize the difficulty of the leader's learning task.
arXiv Detail & Related papers (2025-04-11T23:14:32Z)
- Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information [57.287431079644705]
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers. We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback.
arXiv Detail & Related papers (2025-01-31T22:40:57Z)
- Regret Minimization in Stackelberg Games with Side Information [44.72865997906019]
We formalize settings for Stackelberg games in which both players observe an external context before playing.
In sharp contrast to the non-contextual version of the problem, we show that it is impossible for the leader to achieve no-regret in the full adversarial setting.
Motivated by this result, we show that no-regret learning is possible in two natural relaxations.
arXiv Detail & Related papers (2024-02-13T16:24:57Z)
- Online Learning in Stackelberg Games with an Omniscient Follower [83.42564921330896]
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
arXiv Detail & Related papers (2023-01-27T03:35:10Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, which has an underlying state space that affects the leader's rewards and available strategies, and which evolves in a Markovian manner depending on both the leader's and the follower's selected strategies.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\widetilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
- Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs for various scenarios of the learning interaction between the leader and the follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z)