Optimally Deceiving a Learning Leader in Stackelberg Games
- URL: http://arxiv.org/abs/2006.06566v1
- Date: Thu, 11 Jun 2020 16:18:21 GMT
- Title: Optimally Deceiving a Learning Leader in Stackelberg Games
- Authors: Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J. Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris
- Abstract summary: Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs to report, under various scenarios for the learning interaction between leader and follower.
- Score: 123.14187606686006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in the ML community have revealed that learning
algorithms used to compute the optimal strategy for the leader to commit to in
a Stackelberg game are susceptible to manipulation by the follower. Such a
learning algorithm operates by querying the best responses or the payoffs of
the follower, who consequently can deceive the algorithm by responding as if
his payoffs were much different from what they actually are. For this strategic
behavior to be successful, the main challenge faced by the follower is to
pinpoint the payoffs that would make the learning algorithm compute a
commitment so that best responding to it maximizes the follower's utility,
according to his true payoffs. While this problem has been considered before,
the related literature only focused on the simplified scenario in which the
payoff space is finite, thus leaving the general version of the problem
unanswered. In this paper, we fill in this gap, by showing that it is always
possible for the follower to compute (near-)optimal payoffs for various
scenarios about the learning interaction between leader and follower.
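As a hedged illustration of the deception setup described in the abstract (the game, payoff values, and function names below are my own toy construction, not from the paper): in a small bimatrix game, the leader computes an optimal commitment against the follower's *reported* payoffs, with ties broken in the leader's favor; by misreporting, the follower can steer that commitment toward an outcome that is better under his *true* payoffs.

```python
# Toy 2x2 Stackelberg game: the follower misreports payoffs so the leader's
# commitment, optimized against the reported payoffs, yields the follower a
# higher TRUE payoff than truthful reporting would. All numbers illustrative.
import numpy as np

# Rows = leader actions (U, D); columns = follower actions (L, R).
leader_payoff = np.array([[1.0, 3.0],
                          [0.0, 2.0]])
follower_true = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
# Fake report: makes column L look dominant for the follower.
follower_fake = np.array([[1.0, 0.0],
                          [1.0, 0.0]])

def best_response(p, follower_payoff):
    """Follower's best response to the leader mixing p on U, 1-p on D.
    Ties are broken in the leader's favor (strong Stackelberg convention)."""
    strat = np.array([p, 1.0 - p])
    f_utils = strat @ follower_payoff          # follower's expected payoff per column
    l_utils = strat @ leader_payoff            # leader's expected payoff per column
    optimal = np.flatnonzero(np.isclose(f_utils, f_utils.max()))
    return max(optimal, key=lambda j: l_utils[j])

def optimal_commitment(reported, grid=10001):
    """Leader's (approximately) optimal mixed commitment, by grid search over p,
    assuming the follower best-responds according to the reported payoffs."""
    best_p, best_u = 0.0, -np.inf
    for p in np.linspace(0.0, 1.0, grid):
        j = best_response(p, reported)
        u = np.array([p, 1.0 - p]) @ leader_payoff[:, j]
        if u > best_u:
            best_p, best_u = p, u
    return best_p

def follower_true_utility(reported):
    """Follower's TRUE expected payoff when the leader optimizes vs. `reported`."""
    p = optimal_commitment(reported)
    j = best_response(p, reported)
    return np.array([p, 1.0 - p]) @ follower_true[:, j]

print(follower_true_utility(follower_true))  # truthful report -> 0.5
print(follower_true_utility(follower_fake))  # deceptive report -> 1.0
```

In this toy instance, truthful reporting leads the leader to commit to p = 1/2 and the follower to play R (true payoff 0.5); reporting L as dominant makes the leader commit to U, after which playing L yields the follower his maximum true payoff of 1.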
Related papers
- Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information [57.287431079644705]
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers.
We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback.
arXiv Detail & Related papers (2025-01-31T22:40:57Z)
- Learning to Play Against Unknown Opponents [9.346742321348366]
We show how to efficiently construct an optimal learning algorithm when the learning agent is not constrained to no-regret.
All these results make use of recently developed machinery that converts the analysis of learning algorithms to the class of corresponding geometric objects known as menus.
arXiv Detail & Related papers (2024-12-24T09:05:06Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Follower Agnostic Methods for Stackelberg Games [14.143502615941648]
We present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner.
Our approach works even when the leader has no knowledge of the followers' utility functions or strategy space.
arXiv Detail & Related papers (2023-02-02T21:21:14Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's selected strategies.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
arXiv Detail & Related papers (2020-10-06T23:25:05Z)
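As a hedged sketch of the discrete-time dynamic Stackelberg game structure summarized above (all names and the toy instance are my own, not from the paper): the state determines the leader's rewards and available strategies, and evolves in a Markovian way depending on the actions both players select.

```python
# Minimal container for a discrete-time dynamic Stackelberg game: per-state
# leader strategies, state-dependent leader rewards, and Markovian transitions
# driven by both players' chosen actions. Illustrative sketch only.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DynamicStackelbergGame:
    states: List[str]
    leader_actions: Dict[str, List[str]]                     # available leader strategies per state
    leader_reward: Callable[[str, str, str], float]          # r(s, a_leader, a_follower)
    transition: Callable[[str, str, str], Dict[str, float]]  # s' ~ P(. | s, a_leader, a_follower)

    def step(self, state, a_leader, a_follower, rng=random):
        """Play one round: collect the leader's reward, then sample the next state."""
        assert a_leader in self.leader_actions[state]
        reward = self.leader_reward(state, a_leader, a_follower)
        dist = self.transition(state, a_leader, a_follower)
        next_state = rng.choices(list(dist), weights=list(dist.values()))[0]
        return reward, next_state

# Toy instance: two states, one leader strategy, deterministic transition.
toy = DynamicStackelbergGame(
    states=["s0", "s1"],
    leader_actions={"s0": ["commit"], "s1": ["commit"]},
    leader_reward=lambda s, al, af: 1.0 if s == "s0" else 0.0,
    transition=lambda s, al, af: {"s1": 1.0},
)
```

The no-regret learning question studied in that paper is then: how should the leader choose commitments round by round in such a game so that cumulative reward compares well with the best fixed policy in hindsight.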
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.