Optimally Deceiving a Learning Leader in Stackelberg Games
- URL: http://arxiv.org/abs/2006.06566v1
- Date: Thu, 11 Jun 2020 16:18:21 GMT
- Title: Optimally Deceiving a Learning Leader in Stackelberg Games
- Authors: Georgios Birmpas, Jiarui Gan, Alexandros Hollender, Francisco J. Marmolejo-Cossío, Ninad Rajgopal, Alexandros A. Voudouris
- Abstract summary: Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs to report, under various scenarios for the learning interaction between leader and follower.
- Score: 123.14187606686006
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent results in the ML community have revealed that learning
algorithms used to compute the optimal strategy for the leader to commit to in
a Stackelberg game are susceptible to manipulation by the follower. Such a
learning algorithm operates by querying the best responses or the payoffs of
the follower, who consequently can deceive the algorithm by responding as if
his payoffs were much different from what they actually are. For this strategic
behavior to be successful, the main challenge faced by the follower is to
pinpoint the payoffs that would make the learning algorithm compute a
commitment so that best responding to it maximizes the follower's utility,
according to his true payoffs. While this problem has been considered before,
the related literature only focused on the simplified scenario in which the
payoff space is finite, thus leaving the general version of the problem
unanswered. In this paper, we fill in this gap, by showing that it is always
possible for the follower to compute (near-)optimal payoffs for various
scenarios about the learning interaction between leader and follower.
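As a hedged illustration of the deception setup described in the abstract (the game, payoff values, and function names below are my own toy construction, not from the paper): in a small bimatrix game, the leader computes an optimal commitment against the follower's *reported* payoffs, with ties broken in the leader's favor; by misreporting, the follower can steer that commitment toward an outcome that is better under his *true* payoffs.

```python
# Toy 2x2 Stackelberg game: the follower misreports payoffs so the leader's
# commitment, optimized against the reported payoffs, yields the follower a
# higher TRUE payoff than truthful reporting would. All numbers illustrative.
import numpy as np

# Rows = leader actions (U, D); columns = follower actions (L, R).
leader_payoff = np.array([[1.0, 3.0],
                          [0.0, 2.0]])
follower_true = np.array([[1.0, 0.0],
                          [0.0, 1.0]])
# Fake report: makes column L look dominant for the follower.
follower_fake = np.array([[1.0, 0.0],
                          [1.0, 0.0]])

def best_response(p, follower_payoff):
    """Follower's best response to the leader mixing p on U, 1-p on D.
    Ties are broken in the leader's favor (strong Stackelberg convention)."""
    strat = np.array([p, 1.0 - p])
    f_utils = strat @ follower_payoff          # follower's expected payoff per column
    l_utils = strat @ leader_payoff            # leader's expected payoff per column
    optimal = np.flatnonzero(np.isclose(f_utils, f_utils.max()))
    return max(optimal, key=lambda j: l_utils[j])

def optimal_commitment(reported, grid=10001):
    """Leader's (approximately) optimal mixed commitment, by grid search over p,
    assuming the follower best-responds according to the reported payoffs."""
    best_p, best_u = 0.0, -np.inf
    for p in np.linspace(0.0, 1.0, grid):
        j = best_response(p, reported)
        u = np.array([p, 1.0 - p]) @ leader_payoff[:, j]
        if u > best_u:
            best_p, best_u = p, u
    return best_p

def follower_true_utility(reported):
    """Follower's TRUE expected payoff when the leader optimizes vs. `reported`."""
    p = optimal_commitment(reported)
    j = best_response(p, reported)
    return np.array([p, 1.0 - p]) @ follower_true[:, j]

print(follower_true_utility(follower_true))  # truthful report -> 0.5
print(follower_true_utility(follower_fake))  # deceptive report -> 1.0
```

In this toy instance, truthful reporting leads the leader to commit to p = 1/2 and the follower to play R (true payoff 0.5); reporting L as dominant makes the leader commit to U, after which playing L yields the follower his maximum true payoff of 1.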
Related papers
- Nearly-Optimal Bandit Learning in Stackelberg Games with Side Information [57.287431079644705]
We study the problem of online learning in Stackelberg games with side information between a leader and a sequence of followers.
We provide learning algorithms for the leader which achieve $O(T^{1/2})$ regret under bandit feedback.
arXiv Detail & Related papers (2025-01-31T22:40:57Z)
- Learning to Play Against Unknown Opponents [9.346742321348366]
We show how to efficiently construct an optimal learning algorithm when the learning agent is not constrained to no-regret.
All these results make use of recently developed machinery that converts the analysis of learning algorithms to the class of corresponding geometric objects known as menus.
arXiv Detail & Related papers (2024-12-24T09:05:06Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Follower Agnostic Methods for Stackelberg Games [14.143502615941648]
We present an efficient algorithm to solve online Stackelberg games, featuring multiple followers, in a follower-agnostic manner.
Our approach works even when the leader has no knowledge of the followers' utility functions or strategy space.
arXiv Detail & Related papers (2023-02-02T21:21:14Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response.
We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's selected strategies.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Online Apprenticeship Learning [58.45089581278177]
In Apprenticeship Learning (AL), we are given a Markov Decision Process (MDP) without access to the cost function.
The goal is to find a policy that matches the expert's performance on some predefined set of cost functions.
We show that the OAL problem can be effectively solved by combining two mirror descent based no-regret algorithms.
arXiv Detail & Related papers (2021-02-13T12:57:51Z)
- Online Linear Optimization with Many Hints [72.4277628722419]
We study an online linear optimization (OLO) problem in which the learner is provided access to $K$ "hint" vectors in each round prior to making a decision.
In this setting, we devise an algorithm that obtains logarithmic regret whenever there exists a convex combination of the $K$ hints that has positive correlation with the cost vectors.
arXiv Detail & Related papers (2020-10-06T23:25:05Z)
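As a hedged sketch of the discrete-time dynamic Stackelberg game structure summarized above (all names and the toy instance are my own, not from the paper): the state determines the leader's rewards and available strategies, and evolves in a Markovian way depending on the actions both players select.

```python
# Minimal container for a discrete-time dynamic Stackelberg game: per-state
# leader strategies, state-dependent leader rewards, and Markovian transitions
# driven by both players' chosen actions. Illustrative sketch only.
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class DynamicStackelbergGame:
    states: List[str]
    leader_actions: Dict[str, List[str]]                     # available leader strategies per state
    leader_reward: Callable[[str, str, str], float]          # r(s, a_leader, a_follower)
    transition: Callable[[str, str, str], Dict[str, float]]  # s' ~ P(. | s, a_leader, a_follower)

    def step(self, state, a_leader, a_follower, rng=random):
        """Play one round: collect the leader's reward, then sample the next state."""
        assert a_leader in self.leader_actions[state]
        reward = self.leader_reward(state, a_leader, a_follower)
        dist = self.transition(state, a_leader, a_follower)
        next_state = rng.choices(list(dist), weights=list(dist.values()))[0]
        return reward, next_state

# Toy instance: two states, one leader strategy, deterministic transition.
toy = DynamicStackelbergGame(
    states=["s0", "s1"],
    leader_actions={"s0": ["commit"], "s1": ["commit"]},
    leader_reward=lambda s, al, af: 1.0 if s == "s0" else 0.0,
    transition=lambda s, al, af: {"s1": 1.0},
)
```

The no-regret learning question studied in that paper is then: how should the leader choose commitments round by round in such a game so that cumulative reward compares well with the best fixed policy in hindsight.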
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.