Learning Correlated Stackelberg Equilibrium in General-Sum
Multi-Leader-Single-Follower Games
- URL: http://arxiv.org/abs/2210.12470v1
- Date: Sat, 22 Oct 2022 15:05:44 GMT
- Title: Learning Correlated Stackelberg Equilibrium in General-Sum
Multi-Leader-Single-Follower Games
- Authors: Yaolong Yu, Haifeng Xu, Haipeng Chen
- Abstract summary: We study a hierarchical multi-player game structure, where players with asymmetric roles can be separated into leaders and followers.
In particular, we focus on a Stackelberg game scenario where there are multiple leaders and a single follower.
We propose a novel asymmetric equilibrium concept for the MLSF game called Correlated Stackelberg Equilibrium (CSE).
- Score: 16.810700878778007
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world strategic games involve interactions between multiple
players. We study a hierarchical multi-player game structure, where players
with asymmetric roles can be separated into leaders and followers, a setting
often referred to as Stackelberg game or leader-follower game. In particular,
we focus on a Stackelberg game scenario where there are multiple leaders and a
single follower, called the Multi-Leader-Single-Follower (MLSF) game. We
propose a novel asymmetric equilibrium concept for the MLSF game called
Correlated Stackelberg Equilibrium (CSE). We design online learning algorithms
that enable the players to interact in a distributed manner, and prove that they
achieve no-external Stackelberg-regret learning. This further translates to
convergence to an approximate CSE via a reduction from no-external regret to
no-swap regret. At the core of our work, we solve the intricate problem of how
to learn equilibrium in leader-follower games with noisy bandit feedback by
balancing exploration and exploitation in different learning structures.
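The abstract describes the learning procedure only at a high level. As a rough, hypothetical illustration of no-external-regret learning under noisy bandit feedback, the sketch below shows a standard EXP3-style learner that a single leader could run over a finite strategy set. It is not the authors' algorithm; the class name, interface, and constants are invented for illustration, and the further reduction from external regret to swap regret (needed for convergence to an approximate CSE) is not shown.

```python
# Hypothetical sketch: a generic EXP3-style no-external-regret learner over a
# finite action set with bandit feedback. Illustration only -- not the paper's
# algorithm; the class, interface, and constants are assumptions.
import numpy as np

class EXP3Leader:
    def __init__(self, n_actions, horizon, seed=None):
        self.n = n_actions
        # Step size tuned for the usual O(sqrt(T * n * log n)) external-regret bound.
        self.eta = np.sqrt(2.0 * np.log(n_actions) / (horizon * n_actions))
        self.weights = np.ones(n_actions)
        self.rng = np.random.default_rng(seed)

    def act(self):
        """Sample an action from the current exponential-weights distribution."""
        probs = self.weights / self.weights.sum()
        action = int(self.rng.choice(self.n, p=probs))
        return action, probs

    def update(self, action, utility, probs):
        """Importance-weighted update from bandit feedback (utility assumed in [0, 1])."""
        estimate = np.zeros(self.n)
        estimate[action] = utility / probs[action]
        self.weights *= np.exp(self.eta * estimate)
        self.weights /= self.weights.max()  # rescale to avoid numerical overflow

# Usage idea: each leader runs its own copy; after the leaders commit, the follower
# best-responds to the joint leader action, and each leader observes only its own
# realized (noisy) utility, which it feeds back via update().
```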
Related papers
- ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners [1.4849645397321183]
We show that when the follower strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium.
We also show a strict upper bound on the difference in the follower's utility with and without the no-regret constraint.
arXiv Detail & Related papers (2024-08-26T08:12:26Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Who Plays First? Optimizing the Order of Play in Stackelberg Games with Many Robots [4.146913555716228]
Branch and Play (B&P) is an efficient and exact algorithm that converges to a socially optimal order of play and its Stackelberg equilibrium.
We demonstrate the practical utility of B&P to coordinate air traffic control, swarm formation, and delivery vehicle fleets.
arXiv Detail & Related papers (2024-02-14T15:34:38Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Online Learning in Stackelberg Games with an Omniscient Follower [83.42564921330896]
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
arXiv Detail & Related papers (2023-01-27T03:35:10Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z)
- Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z)