Online Learning in Stackelberg Games with an Omniscient Follower
- URL: http://arxiv.org/abs/2301.11518v2
- Date: Tue, 11 Apr 2023 20:27:37 GMT
- Title: Online Learning in Stackelberg Games with an Omniscient Follower
- Authors: Geng Zhao, Banghua Zhu, Jiantao Jiao, Michael I. Jordan
- Abstract summary: We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
- Score: 83.42564921330896
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of online learning in a two-player decentralized
cooperative Stackelberg game. In each round, the leader first takes an action,
followed by the follower who takes their action after observing the leader's
move. The goal of the leader is to learn to minimize the cumulative regret
based on the history of interactions. Differing from the traditional
formulation of repeated Stackelberg games, we assume the follower is
omniscient, with full knowledge of the true reward, and that they always
best-respond to the leader's actions. We analyze the sample complexity of
regret minimization in this repeated Stackelberg game. We show that depending
on the reward structure, the existence of the omniscient follower may change
the sample complexity drastically, from constant to exponential, even for
linear cooperative Stackelberg games. This poses unique challenges for the
learning process of the leader and the subsequent regret analysis.
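To make the interaction protocol concrete, the following is a minimal Python sketch of the repeated game: in each round the leader commits to an action, the omniscient follower best-responds using full knowledge of the shared reward, and the leader updates its estimates from the realized reward. The random payoff matrix and the epsilon-greedy leader are illustrative assumptions only, not the paper's construction or algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

n_leader, n_follower = 4, 3
# Cooperative game: leader and follower share one reward matrix r[a, b].
# (Random payoffs here are an assumption for illustration.)
r = rng.uniform(size=(n_leader, n_follower))

def follower_best_response(a: int) -> int:
    """Omniscient follower: knows r exactly and best-responds to the leader's action a."""
    return int(np.argmax(r[a]))

# Leader's value of each action, given a best-responding follower.
leader_value = np.array([r[a, follower_best_response(a)] for a in range(n_leader)])
opt = leader_value.max()

T = 10_000
estimates = np.zeros(n_leader)  # running mean reward per leader action
counts = np.zeros(n_leader)
regret = 0.0

for t in range(T):
    # Epsilon-greedy leader: a generic baseline, not the paper's method.
    a = rng.integers(n_leader) if rng.random() < 0.05 else int(np.argmax(estimates))
    b = follower_best_response(a)  # follower moves after observing a
    reward = r[a, b]               # leader observes only the realized reward
    counts[a] += 1
    estimates[a] += (reward - estimates[a]) / counts[a]
    regret += opt - leader_value[a]

print(f"cumulative regret after {T} rounds: {regret:.2f}")
```

Because the follower always best-responds, the leader effectively faces a bandit over its own actions whose payoff landscape is shaped by the follower's responses; the paper's hardness results concern how drastically this coupling can inflate the sample complexity.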
Related papers
- ReLExS: Reinforcement Learning Explanations for Stackelberg No-Regret Learners [1.4849645397321183]
We show that when the follower's strategy is either reward-average or transform-reward-average, the two players can always reach the Stackelberg equilibrium.
We also show a strict upper bound on the difference in the follower's utility with and without the no-regret constraint.
arXiv Detail & Related papers (2024-08-26T08:12:26Z)
- Decentralized Online Learning in General-Sum Stackelberg Games [2.8659922790025463]
We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner.
We show that, for the follower, myopically best responding to the leader's action is the best strategy in the limited-information setting.
For the setting in which the follower has more information, we design a new manipulation strategy and show that it has an intrinsic advantage over the best-response strategy.
arXiv Detail & Related papers (2024-05-06T04:35:01Z)
- Impact of Decentralized Learning on Player Utilities in Stackelberg Games [57.08270857260131]
In many two-agent systems, each agent learns separately and the rewards of the two agents are not perfectly aligned.
We model these systems as Stackelberg games with decentralized learning and show that standard regret benchmarks result in worst-case linear regret for at least one player.
We develop algorithms to achieve near-optimal $O(T^{2/3})$ regret for both players with respect to these benchmarks.
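As a quick sanity check on the $T^{2/3}$ rate (a standard explore-then-commit calculation, not necessarily the analysis in that paper): spending $n$ rounds exploring costs $O(n)$ regret and leaves estimation error $O(1/\sqrt{n})$, which costs $O(T/\sqrt{n})$ over the remaining rounds; balancing the two terms gives the rate.

$$
\mathrm{Regret}(T) \;\lesssim\; n + \frac{T}{\sqrt{n}}, \qquad
1 - \frac{T}{2}\,n^{-3/2} = 0 \;\Longrightarrow\; n = \Theta\!\left(T^{2/3}\right), \qquad
\mathrm{Regret}(T) = O\!\left(T^{2/3}\right).
$$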
arXiv Detail & Related papers (2024-02-29T23:38:28Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem.
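For reference, the quantal response model estimated in step (i) is standardly a logit (softmax) best response; the utility $u_f$ and temperature $\eta > 0$ below are generic notation, not symbols taken from that paper:

$$
\pi_f(b \mid a) \;=\; \frac{\exp\!\bigl(\eta\, u_f(a, b)\bigr)}{\sum_{b'} \exp\!\bigl(\eta\, u_f(a, b')\bigr)},
\qquad
\hat{u}_f \;\in\; \arg\max_{u} \sum_{t=1}^{T} \log \pi_f^{u}\!\bigl(b_t \mid a_t\bigr),
$$

where $\pi_f^{u}$ denotes the quantal response induced by candidate utility $u$, fit by maximum likelihood on the observed interaction data $\{(a_t, b_t)\}_{t=1}^{T}$.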
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Learning Correlated Stackelberg Equilibrium in General-Sum Multi-Leader-Single-Follower Games [16.810700878778007]
We study a hierarchical multi-player game structure, where players with asymmetric roles can be separated into leaders and followers.
In particular, we focus on a Stackelberg game scenario where there are multiple leaders and a single follower.
We propose a novel asymmetric equilibrium concept for the multi-leader-single-follower (MLSF) game, called Correlated Stackelberg Equilibrium (CSE).
arXiv Detail & Related papers (2022-10-22T15:05:44Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of the standard Stackelberg game, called a discrete-time dynamic Stackelberg game, with an underlying state space that affects the leader's rewards and available strategies, and that evolves in a Markovian manner depending on the strategies selected by both the leader and the follower.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Online Learning in Unknown Markov Games [55.07327246187741]
We study online learning in unknown Markov games.
We show that achieving sublinear regret against the best response in hindsight is statistically hard.
We present an algorithm that achieves a sublinear $\tilde{\mathcal{O}}(K^{2/3})$ regret after $K$ episodes.
arXiv Detail & Related papers (2020-10-28T14:52:15Z)
- Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs under various scenarios of the learning interaction between leader and follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z)