Online Learning in Stackelberg Games with an Omniscient Follower
- URL: http://arxiv.org/abs/2301.11518v2
- Date: Tue, 11 Apr 2023 20:27:37 GMT
- Title: Online Learning in Stackelberg Games with an Omniscient Follower
- Authors: Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan
- Abstract summary: We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
- Score: 83.42564921330896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online learning in a two-player decentralized
cooperative Stackelberg game. In each round, the leader first takes an action,
followed by the follower who takes their action after observing the leader's
move. The goal of the leader is to learn to minimize the cumulative regret
based on the history of interactions. Differing from the traditional
formulation of repeated Stackelberg games, we assume the follower is
omniscient, with full knowledge of the true reward, and that they always
best-respond to the leader's actions. We analyze the sample complexity of
regret minimization in this repeated Stackelberg game. We show that depending
on the reward structure, the existence of the omniscient follower may change
the sample complexity drastically, from constant to exponential, even for
linear cooperative Stackelberg games. This poses unique challenges for the
learning process of the leader and the subsequent regret analysis.
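To make the interaction protocol concrete, the following is a minimal Python sketch of the repeated game: in each round the leader commits to an action, the omniscient follower best-responds using full knowledge of the shared reward, and the leader updates its estimates from the realized reward. The random payoff matrix and the epsilon-greedy leader are illustrative assumptions only, not the paper's construction or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_leader, n_follower = 4, 3
# Cooperative game: leader and follower share one reward matrix r[a, b].
# (Random payoffs here are an assumption for illustration.)
r = rng.uniform(size=(n_leader, n_follower))

def follower_best_response(a: int) -> int:
    """Omniscient follower: knows r exactly and best-responds to the leader's action a."""
    return int(np.argmax(r[a]))

# Leader's value of each action, given a best-responding follower.
leader_value = np.array([r[a, follower_best_response(a)] for a in range(n_leader)])
opt = leader_value.max()

T = 10_000
estimates = np.zeros(n_leader)  # running mean reward per leader action
counts = np.zeros(n_leader)
regret = 0.0

for t in range(T):
    # Epsilon-greedy leader: a generic baseline, not the paper's method.
    a = rng.integers(n_leader) if rng.random() < 0.05 else int(np.argmax(estimates))
    b = follower_best_response(a)  # follower moves after observing a
    reward = r[a, b]               # leader observes only the realized reward
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]
    regret += opt - leader_value[a]

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Because the follower always best-responds, the leader effectively faces a bandit over its own actions whose payoff landscape is shaped by the follower's responses; the paper's hardness results concern how drastically this coupling can inflate the sample complexity.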
Related papers
- ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners [1.4849645397321183]
We show that when the follower's strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium.
We also show a strict upper bound on the difference in the follower's utility with and without the no-regret constraint.
arXiv Detail & Related papers (2024-08-26T08:12:26Z)
- Decentralized Online Learning in General-Sum Stackelberg Games [2.8659922790025463]
We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner.
We show that, for the follower, myopically best responding to the leader's action is the best strategy in the limited-information setting.
For the setting in which the follower has more information, we design a new manipulation strategy and show that it has an intrinsic advantage over the best-response strategy.
arXiv Detail & Related papers (2024-05-06T04:35:01Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
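As a quick sanity check on the $T^{2/3}$ rate (a standard explore-then-commit calculation, not necessarily the analysis in that paper): spending $n$ rounds exploring costs $O(n)$ regret and leaves estimation error $O(1/\sqrt{n})$, which costs $O(T/\sqrt{n})$ over the remaining rounds; balancing the two terms gives the rate.

$$
\mathrm{Regret}(T) \;\lesssim\; n + \frac{T}{\sqrt{n}}, \qquad
1 - \frac{T}{2}\,n^{-3/2} = 0 \;\Longrightarrow\; n = \Theta\!\left(T^{2/3}\right), \qquad
\mathrm{Regret}(T) = O\!\left(T^{2/3}\right).
$$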
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem.
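For reference, the quantal response model estimated in step (i) is standardly a logit (softmax) best response; the utility $u_f$ and temperature $\eta > 0$ below are generic notation, not symbols taken from that paper:

$$
\pi_f(b \mid a) \;=\; \frac{\exp\!\bigl(\eta\, u_f(a, b)\bigr)}{\sum_{b'} \exp\!\bigl(\eta\, u_f(a, b')\bigr)},
\qquad
\hat{u}_f \;\in\; \arg\max_{u} \sum_{t=1}^{T} \log \pi_f^{u}\!\bigl(b_t \mid a_t\bigr),
$$

where $\pi_f^{u}$ denotes the quantal response induced by candidate utility $u$, fit by maximum likelihood on the observed interaction data $\{(a_t, b_t)\}_{t=1}^{T}$.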
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Learning Correlated Stackelberg Equilibrium in General-Sum Multi-Leader-Single-Follower Games [16.810700878778007]
We study a hierarchical multi-player game structure, where players with asymmetric roles can be separated into leaders and followers.
In particular, we focus on a Stackelberg game scenario where there are multiple leaders and a single follower.
We propose a novel asymmetric equilibrium concept for the multi-leader-single-follower (MLSF) game, called Correlated Stackelberg Equilibrium (CSE).
arXiv Detail & Related papers (2022-10-22T15:05:44Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of the standard Stackelberg game, called a discrete-time dynamic Stackelberg game, with an underlying state space that affects the leader's rewards and available strategies, and that evolves in a Markovian manner depending on the strategies selected by both the leader and the follower.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
- Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs under various scenarios of the learning interaction between leader and follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z)