No-Regret Learning in Dynamic Stackelberg Games
- URL: http://arxiv.org/abs/2202.04786v1
- Date: Thu, 10 Feb 2022 01:07:57 GMT
- Title: No-Regret Learning in Dynamic Stackelberg Games
- Authors: Niklas Lauffer, Mahsa Ghasemi, Abolfazl Hashemi, Yagiz Savas, and Ufuk
Topcu
- Abstract summary: In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's selected strategies.
- Score: 31.001205916012307
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a Stackelberg game, a leader commits to a randomized strategy, and a
follower chooses their best strategy in response. We consider an extension of a
standard Stackelberg game, called a discrete-time dynamic Stackelberg game,
that has an underlying state space that affects the leader's rewards and
available strategies and evolves in a Markovian manner depending on both the
leader and follower's selected strategies. Although standard Stackelberg games
have been utilized to improve scheduling in security domains, their deployment
is often limited by requiring complete information of the follower's utility
function. In contrast, we consider scenarios where the follower's utility
function is unknown to the leader; however, it can be linearly parameterized.
Our objective then is to provide an algorithm that prescribes a randomized
strategy to the leader at each step of the game based on observations of how
the follower responded in previous steps. We design a no-regret learning
algorithm that, with high probability, achieves a regret bound (when compared
to the best policy in hindsight) which is sublinear in the number of time
steps; the degree of sublinearity depends on the number of features
representing the follower's utility function. The regret of the proposed
learning algorithm is independent of the size of the state space and polynomial
in the rest of the parameters of the game. We show that the proposed learning
algorithm outperforms existing model-free reinforcement learning approaches.
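To make the setting concrete, below is a minimal toy sketch (not the paper's algorithm, and with a hypothetical feature map `features` and names `phi`, `theta_hat` invented for illustration): the follower's utility is linear in unknown weights, the leader commits to a randomized strategy each round, observes only the follower's best response, and refines its estimate of the weights with a simple multiclass-perceptron-style correction rather than the paper's no-regret policy.

```python
import numpy as np

rng = np.random.default_rng(0)

d, n_leader, n_follower, T = 4, 3, 3, 300
theta_true = rng.normal(size=d)                        # unknown to the leader
features = rng.normal(size=(n_follower, n_leader, d))  # toy feature map (assumed known)

def phi(x, b):
    """Expected feature vector when the leader commits to mixed strategy x."""
    return features[b].T @ x

def best_response(x, theta):
    """Follower action maximizing its linearly parameterized utility."""
    return int(np.argmax([phi(x, b) @ theta for b in range(n_follower)]))

theta_hat = np.zeros(d)
mistakes = 0
for t in range(T):
    x = rng.dirichlet(np.ones(n_leader))   # leader commits to a randomized strategy
    b_obs = best_response(x, theta_true)   # leader observes the follower's response
    b_pred = best_response(x, theta_hat)   # response predicted by the current estimate
    if b_pred != b_obs:
        # The observation reveals phi(x, b_obs) . theta >= phi(x, b_pred) . theta,
        # so nudge the estimate in that direction (perceptron-style update).
        theta_hat += phi(x, b_obs) - phi(x, b_pred)
        mistakes += 1

print(f"response-prediction mistakes over {T} rounds: {mistakes}")
```

A real no-regret scheme would additionally optimize the leader's committed strategy against the current estimate and account for estimation uncertainty; the sketch only illustrates how best-response observations carry information about the unknown utility weights.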
Related papers
- Neural Operators Can Play Dynamic Stackelberg Games [9.058593115274336]
Dynamic Stackelberg games are a broad class of two-player games in which the leader acts first, and the follower chooses a response strategy to the leader's strategy.
This paper addresses this by showing that the follower's best-response operator can be approximately implemented by an attention-based neural operator.
We show that the value of the Stackelberg game where the follower uses the approximate best-response operator approximates the value of the original Stackelberg game.
arXiv Detail & Related papers (2024-11-14T18:12:06Z) - Actions Speak What You Want: Provably Sample-Efficient Reinforcement
Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z) - Follower Agnostic Methods for Stackelberg Games [14.143502615941648]
We present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner.
Our approach works even when the leader has no knowledge of the followers' utility functions or strategy spaces.
arXiv Detail & Related papers (2023-02-02T21:21:14Z) - Learning in Stackelberg Games with Non-myopic Agents [60.927889817803745]
We study Stackelberg games where a principal repeatedly interacts with a non-myopic long-lived agent, without knowing the agent's payoff function.
We provide a general framework that reduces learning in presence of non-myopic agents to robust bandit optimization in the presence of myopic agents.
arXiv Detail & Related papers (2022-08-19T15:49:30Z) - Can Reinforcement Learning Find Stackelberg-Nash Equilibria in
General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z) - Who Leads and Who Follows in Strategic Classification? [82.44386576129295]
We argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions.
We show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
arXiv Detail & Related papers (2021-06-23T16:48:46Z) - Adversarial Training as Stackelberg Game: An Unrolled Optimization
Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z) - Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs under various scenarios of the learning interaction between leader and follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z) - Model-free Reinforcement Learning for Stochastic Stackelberg Security
Games [7.470839530834359]
We consider a sequential Stackelberg game with two players, a leader and a follower.
The follower has access to the state of the system while the leader does not.
We propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP.
arXiv Detail & Related papers (2020-05-24T22:34:20Z)
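The last entry above builds on Expected Sarsa. For reference, a generic tabular Expected Sarsa update (the standard textbook building block, not that paper's full algorithm for stochastic Stackelberg security games) looks roughly like this, assuming an epsilon-greedy behavior policy:

```python
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95, eps=0.1):
    """One tabular Expected Sarsa step under an epsilon-greedy behavior policy."""
    n_actions = Q.shape[1]
    # Action probabilities in the next state under the epsilon-greedy policy.
    probs = np.full(n_actions, eps / n_actions)
    probs[np.argmax(Q[s_next])] += 1.0 - eps
    expected_q = probs @ Q[s_next]                 # expectation over next actions
    Q[s, a] += alpha * (r + gamma * expected_q - Q[s, a])
    return Q

# Example: a single update on a toy 5-state, 2-action table.
Q = np.zeros((5, 2))
Q = expected_sarsa_update(Q, s=0, a=1, r=1.0, s_next=3)
```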