Optimally Deceiving a Learning Leader in Stackelberg Games
- URL: http://arxiv.org/abs/2006.06566v1
- Date: Thu, 11 Jun 2020 16:18:21 GMT
- Title: Optimally Deceiving a Learning Leader in Stackelberg Games
- Authors: Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J.
Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris
- Abstract summary: Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs to report, for various scenarios of the learning interaction between leader and follower.
- Score: 123.14187606686006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in the ML community have revealed that learning algorithms
used to compute the optimal strategy for the leader to commit to in a
Stackelberg game are susceptible to manipulation by the follower. Such a
learning algorithm operates by querying the best responses or the payoffs of
the follower, who consequently can deceive the algorithm by responding as if
his payoffs were very different from what they actually are. For this strategic
behavior to be successful, the main challenge faced by the follower is to
pinpoint the payoffs that would make the learning algorithm compute a
commitment so that best responding to it maximizes the follower's utility,
according to his true payoffs. While this problem has been considered before,
the related literature only focused on the simplified scenario in which the
payoff space is finite, thus leaving the general version of the problem
unanswered. In this paper, we fill this gap by showing that it is always
possible for the follower to compute (near-)optimal payoffs for various
scenarios of the learning interaction between leader and follower.
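To make the manipulation concrete, here is a minimal brute-force sketch in Python. The 2x2 payoff matrices, the discretized commitment search, and the restriction of fake reports to {0, 1} entries are all illustrative assumptions; the paper's construction handles the general continuous payoff space.

```python
import itertools
import numpy as np

# Hypothetical 2x2 game for illustration (not from the paper).
# Rows: leader actions; columns: follower actions.
L = np.array([[1.0, 0.0],
              [0.0, 2.0]])       # leader's payoffs
F_true = np.array([[1.0, 0.0],
                   [0.0, 0.5]])  # follower's TRUE payoffs

def commitment(F_reported, grid=1001):
    # Leader's optimal commitment if the follower's payoffs were F_reported:
    # brute-force over mixed strategies p = P(row 0); the follower best
    # responds to the REPORTED payoffs, with ties broken in the leader's
    # favor (the strong Stackelberg convention).
    best_p, best_val, best_j = 0.0, -np.inf, 0
    for p in np.linspace(0.0, 1.0, grid):
        mix = np.array([p, 1.0 - p])
        fu = mix @ F_reported                         # follower's expected payoffs
        brs = np.flatnonzero(fu >= fu.max() - 1e-12)  # best-response set
        j = max(brs, key=lambda a: mix @ L[:, a])     # leader-favoring tie-break
        val = mix @ L[:, j]
        if val > best_val:
            best_p, best_val, best_j = p, val, j
    return np.array([best_p, 1.0 - best_p]), best_j

# Follower's true utility when he reports honestly.
mix, j = commitment(F_true)
honest = mix @ F_true[:, j]

# Brute-force search over fake payoff matrices with entries in {0, 1},
# evaluating each induced commitment under the follower's TRUE payoffs.
best_util = honest
for entries in itertools.product([0.0, 1.0], repeat=4):
    fake_mix, fake_j = commitment(np.array(entries).reshape(2, 2))
    best_util = max(best_util, fake_mix @ F_true[:, fake_j])

print(f"honest utility: {honest:.3f}, best deceptive utility: {best_util:.3f}")
```

In this toy game, honest reporting yields follower utility 0.5, while reporting payoffs that make the first column always a best response tricks the leader into committing to row 0, raising the follower's true utility to 1.0.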
Related papers
- Stackelberg Batch Policy Learning [3.5426153040167754]
Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration.
Worst-case optimality algorithms, which calibrate a value-function model class from logged experience, have emerged as a promising paradigm for batch RL.
We propose a novel gradient-based learning algorithm, StackelbergLearner, in which the leader updates according to the total derivative of its objective instead of the usual individual gradient (a toy sketch of this distinction follows this entry).
arXiv Detail & Related papers (2023-09-28T06:18:34Z)
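A minimal sketch of the total-derivative idea on an assumed quadratic toy problem (not the paper's batch-RL setting): the leader minimizes its objective while the follower best-responds, and the total derivative accounts for how the follower's response shifts with the leader's variable.

```python
def follower_br(x, a=0.5):
    # follower minimizes (y - a*x)^2, so the best response is y*(x) = a*x
    return a * x

def leader_obj(x, y):
    return (x - 1.0) ** 2 + x * y

def individual_grad(x, y):
    # partial derivative in x only, treating the follower's y as fixed
    return 2.0 * (x - 1.0) + y

def total_grad(x, a=0.5):
    # d/dx f(x, y*(x)) = df/dx + df/dy * dy*/dx, with dy*/dx = a here
    return individual_grad(x, follower_br(x, a)) + x * a

x_ind, x_tot = 0.0, 0.0
for _ in range(200):  # simple gradient descent with step size 0.1
    x_ind -= 0.1 * individual_grad(x_ind, follower_br(x_ind))
    x_tot -= 0.1 * total_grad(x_tot)

print(f"individual-gradient leader: x = {x_ind:.3f}, "
      f"objective = {leader_obj(x_ind, follower_br(x_ind)):.3f}")
print(f"total-derivative leader:   x = {x_tot:.3f}, "
      f"objective = {leader_obj(x_tot, follower_br(x_tot)):.3f}")
```

The total-derivative update converges to x = 2/3 with a strictly lower leader objective (1/3) than the individual-gradient fixed point (x = 0.8, objective 0.36), because it anticipates the follower's reaction.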
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem (a minimal MLE sketch follows this entry).
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
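A minimal sketch of step (i) under an assumed logit quantal response model: the follower samples actions with probability proportional to exp(lambda * utility), and lambda is recovered by maximum likelihood from observed actions. The utilities, sample size, and grid search are illustrative choices, not the paper's estimator.

```python
import numpy as np

rng = np.random.default_rng(0)
utils = np.array([1.0, 0.5, 0.0])  # hypothetical follower utilities for 3 actions
lam_true = 2.0                     # true quantal-response temperature

def quantal_probs(lam, u):
    z = lam * u
    z -= z.max()                   # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

# simulate strategic feedback: follower samples actions ~ softmax(lam * u)
actions = rng.choice(3, size=500, p=quantal_probs(lam_true, utils))

def neg_log_lik(lam):
    return -np.log(quantal_probs(lam, utils)[actions]).sum()

# MLE over lambda by grid search (the 1-D problem is easy to scan)
grid = np.linspace(0.01, 10.0, 1000)
lam_hat = grid[np.argmin([neg_log_lik(l) for l in grid])]
print(f"true lambda = {lam_true}, MLE = {lam_hat:.2f}")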
- Contextual Bandits and Imitation Learning via Preference-Based Active Queries [17.73844193143454]
We consider the problem of contextual bandits and imitation learning, where the learner lacks direct knowledge of the executed action's reward.
Instead, the learner can actively query an expert at each round to compare two actions and receive noisy preference feedback.
The learner's objective is twofold: minimize the regret of the executed actions while simultaneously minimizing the number of comparison queries made to the expert (a toy noisy-comparison sketch follows this entry).
arXiv Detail & Related papers (2023-07-24T16:36:04Z)
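A toy sketch of the noisy comparison oracle, assuming a Bradley-Terry expert and a fixed per-pair query budget; the paper's algorithm trades regret against query count far more carefully than this uniform-budget illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
rewards = np.array([0.2, 0.8, 0.5])  # hypothetical true rewards of 3 actions
queries = 0

def noisy_compare(a, b):
    # expert prefers a over b with logistic probability in the reward gap
    global queries
    queries += 1
    p = 1.0 / (1.0 + np.exp(-(rewards[a] - rewards[b])))
    return rng.random() < p

# estimate pairwise preferences with a fixed query budget per pair
n, wins = 3, np.zeros((3, 3))
for a in range(n):
    for b in range(a + 1, n):
        for _ in range(50):
            if noisy_compare(a, b):
                wins[a, b] += 1
            else:
                wins[b, a] += 1

# pick the action with the most pairwise wins (a Borda-style winner)
best = int(np.argmax(wins.sum(axis=1)))
print(f"chosen action: {best}, queries used: {queries}")
```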
- Follower Agnostic Methods for Stackelberg Games [14.143502615941648]
We present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner.
Our approach works even when the leader has no knowledge of the followers' utility functions or strategy spaces.
arXiv Detail & Related papers (2023-02-02T21:21:14Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of the standard Stackelberg game, called a discrete-time dynamic Stackelberg game, in which an underlying state space affects the leader's rewards and available strategies, and evolves in a Markovian manner depending on both the leader's and the follower's selected strategies.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game (a generic min-max training sketch follows this entry).
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
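A generic min-max training loop for intuition, with the model as leader and the input perturbation as follower; this is a standard PGD-style sketch on an assumed logistic-regression toy, not SALT's unrolled leader update.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)  # toy linearly separable labels
w = np.zeros(2)
eps, lr, inner_lr = 0.1, 0.1, 0.05

def loss_grad(w, X):
    # gradients of the mean logistic loss w.r.t. weights and inputs
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    g_w = X.T @ (p - y) / len(y)       # gradient w.r.t. weights
    g_X = np.outer(p - y, w) / len(y)  # gradient w.r.t. inputs
    return g_w, g_X

for _ in range(100):
    # follower: inner maximization, perturb inputs to increase the loss
    delta = np.zeros_like(X)
    for _ in range(5):
        _, g_X = loss_grad(w, X + delta)
        delta = np.clip(delta + inner_lr * np.sign(g_X), -eps, eps)
    # leader: outer minimization on the adversarially perturbed batch
    g_w, _ = loss_grad(w, X + delta)
    w -= lr * g_w

preds = (1.0 / (1.0 + np.exp(-(X @ w))) > 0.5).astype(float)
print(f"clean accuracy after adversarial training: {(preds == y).mean():.2f}")
```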
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror-descent-based no-regret algorithms (a minimal mirror-descent sketch follows this entry).
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
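A minimal sketch of the mirror-descent primitive on the simplex (exponentiated gradient) that such no-regret reductions build on; the random losses and step size are illustrative, not the OAL construction itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, eta = 5, 1000, 0.1
p = np.ones(d) / d                 # start at the uniform distribution
cum_loss, loss_sums = 0.0, np.zeros(d)
for t in range(T):
    losses = rng.uniform(size=d)   # adversary's losses this round
    cum_loss += p @ losses
    loss_sums += losses
    p *= np.exp(-eta * losses)     # multiplicative-weights (mirror) step
    p /= p.sum()                   # project back onto the simplex

regret = cum_loss - loss_sums.min()
print(f"average regret: {regret / T:.4f}")  # vanishes as T grows
```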
- Online Markov Decision Processes with Aggregate Bandit Feedback [74.85532145498742]
We study a novel variant of online finite-horizon Markov Decision Processes with adversarially changing loss functions and initially unknown dynamics.
In each episode, the learner suffers the loss accumulated along the trajectory realized by the policy chosen for the episode, and observes aggregate bandit feedback.
Our main result is a computationally efficient algorithm with $O(\sqrt{K})$ regret for this setting, where $K$ is the number of episodes.
arXiv Detail & Related papers (2021-01-31T16:49:07Z)
- Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors (a toy hint-weighting sketch follows this entry).
arXiv Detail & Related papers (2020-10-06T23:25:05Z)
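A toy hint-weighting heuristic for intuition, assuming one hint source that consistently anti-correlates with the cost; this multiplicative-weights reweighting of hint sources is not the paper's algorithm and carries no logarithmic-regret guarantee.

```python
import numpy as np

rng = np.random.default_rng(2)
T, K, d, eta = 1000, 4, 3, 0.1
cost_dir = np.array([1.0, -1.0, 0.5])
cost_dir /= np.linalg.norm(cost_dir)
weights = np.ones(K)               # one weight per hint source
total_cost = 0.0
for t in range(T):
    hints = rng.normal(size=(K, d))
    hints /= np.linalg.norm(hints, axis=1, keepdims=True)  # unit hints
    hints[0] = -cost_dir                      # source 0 points away from cost
    x = (weights / weights.sum()) @ hints     # play a convex combo of hints
    c = cost_dir + 0.5 * rng.normal(size=d)   # noisy cost vector this round
    total_cost += c @ x
    weights *= np.exp(-eta * (hints @ c))     # upweight low-cost hint sources

print(f"average per-round cost: {total_cost / T:.3f}")  # negative is good
```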