Learning to Manipulate a Commitment Optimizer
- URL: http://arxiv.org/abs/2302.11829v2
- Date: Sun, 26 Feb 2023 16:23:09 GMT
- Title: Learning to Manipulate a Commitment Optimizer
- Authors: Yurong Chen, Xiaotie Deng, Jiarui Gan, Yuhao Li
- Abstract summary: Recent studies show that in a Stackelberg game the follower can manipulate the leader by deviating from their true best-response behavior.
The risk indicated by these findings appears to be alleviated to some extent by the strict information advantage that the manipulations rely on.
We consider the scenario where the follower is not given any information about the leader's payoffs to begin with but has to learn to manipulate by interacting with the leader.
- Score: 14.806314018261416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It is shown in recent studies that in a Stackelberg game the follower can
manipulate the leader by deviating from their true best-response behavior. Such
manipulations are computationally tractable and can be highly beneficial for
the follower. Meanwhile, they may result in significant payoff losses for the
leader, sometimes completely defeating their first-mover advantage. Although this is a warning to commitment optimizers, the risk these findings indicate appears to be alleviated to some extent by the strict information advantage that the manipulations rely on: the follower knows the full information about both players' payoffs, whereas the leader knows only their own. In this paper, we
study the manipulation problem with this information advantage relaxed. We
consider the scenario where the follower is not given any information about the
leader's payoffs to begin with but has to learn to manipulate by interacting
with the leader. The follower can gather necessary information by querying the
leader's optimal commitments against contrived best-response behaviors. Our
results indicate that the information advantage is not entirely indispensable
to the follower's manipulations: the follower can learn the optimal way to
manipulate in polynomial time with polynomially many queries of the leader's
optimal commitment.
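The abstract describes a query model in which the follower pretends to best-respond according to contrived payoffs and observes the leader's optimal commitment against each pretended behavior. Below is a minimal sketch of one such query, not the paper's learning algorithm: it assumes a bimatrix normal-form game and a leader who answers each query with a strong Stackelberg commitment computed by the standard one-LP-per-follower-action approach. All names (`leader_optimal_commitment`, `A`, `B_fake`) are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def leader_optimal_commitment(A, B):
    """Strong Stackelberg commitment for a leader with payoff matrix A, against
    a follower believed to best-respond according to payoff matrix B (both m x n).
    Solves one LP per follower action (the classic approach for bimatrix games)."""
    m, n = A.shape
    best_value, best_x, best_j = -np.inf, None, None
    for j in range(n):
        # maximize x . A[:, j]  subject to  x . B[:, j'] <= x . B[:, j] for all j',
        # x >= 0, sum(x) = 1   (i.e., action j is a best response to commitment x)
        others = [k for k in range(n) if k != j]
        res = linprog(
            c=-A[:, j],                            # linprog minimizes
            A_ub=(B[:, others] - B[:, [j]]).T,     # x . (B[:, j'] - B[:, j]) <= 0
            b_ub=np.zeros(n - 1),
            A_eq=np.ones((1, m)), b_eq=[1.0],      # x is a probability distribution
            bounds=[(0, 1)] * m,
        )
        if res.success and -res.fun > best_value:
            best_value, best_x, best_j = -res.fun, res.x, j
    return best_x, best_j

# One "query" in the interaction the abstract describes: the follower reports a
# contrived payoff matrix B_fake (i.e., pretends to best-respond to it) and
# observes the leader's optimal commitment against that behavior. The leader's
# true payoffs A remain hidden from the follower.
rng = np.random.default_rng(0)
A = rng.random((3, 3))       # leader's true payoffs (unknown to the follower)
B_fake = rng.random((3, 3))  # contrived follower payoffs, chosen by the follower
x, j = leader_optimal_commitment(A, B_fake)
print("commitment:", np.round(x, 3), "anticipated follower action:", j)
```

By varying `B_fake` across queries and observing the induced commitments, the follower can infer enough about the leader's payoffs to compute its optimal manipulation; the paper's result is that this takes only polynomially many such queries and polynomial time.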
Related papers
- Decentralized Online Learning in General-Sum Stackelberg Games [2.8659922790025463]
We study an online learning problem in general-sum Stackelberg games, where players act in a decentralized and strategic manner.
We show that, in the limited-information setting, myopically best responding to the leader's action is the follower's best strategy.
For the setting where the follower has more information, we design a new manipulation strategy and show that it has an intrinsic advantage over the best-response strategy.
arXiv Detail & Related papers (2024-05-06T04:35:01Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Online Learning in Stackelberg Games with an Omniscient Follower [83.42564921330896]
We study the problem of online learning in a two-player decentralized cooperative Stackelberg game.
In each round, the leader first takes an action, followed by the follower who takes their action after observing the leader's move.
We show that depending on the reward structure, the existence of the omniscient follower may change the sample complexity drastically.
arXiv Detail & Related papers (2023-01-27T03:35:10Z)
- Commitment with Signaling under Double-sided Information Asymmetry [19.349072233281852]
This work considers a double-sided information asymmetry in a Bayesian Stackelberg game.
We show that by appropriately designing a signaling device that reveals partial information about the leader's realized action to the follower, the leader can achieve a higher expected utility than without signaling.
arXiv Detail & Related papers (2022-12-22T01:30:54Z)
- Optimal Private Payoff Manipulation against Commitment in Extensive-form Games [7.739432465414604]
We study the follower's optimal manipulation via such strategic behaviors in extensive-form games.
We show that it is tractable for the follower to find the optimal way of misreporting his private payoff.
arXiv Detail & Related papers (2022-06-27T08:50:28Z)
- Offline Reinforcement Learning as Anti-Exploration [49.72457136766916]
We take inspiration from the literature on bonus-based exploration to design a new offline RL agent.
The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration.
We show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
arXiv Detail & Related papers (2021-06-11T14:41:30Z)
- PEBBLE: Feedback-Efficient Interactive Reinforcement Learning via Relabeling Experience and Unsupervised Pre-training [94.87393610927812]
We present an off-policy, interactive reinforcement learning algorithm that capitalizes on the strengths of both feedback and off-policy learning.
We demonstrate that our approach is capable of learning tasks of higher complexity than previously considered by human-in-the-loop methods.
arXiv Detail & Related papers (2021-06-09T14:10:50Z)
- Adversarial Training as Stackelberg Game: An Unrolled Optimization Approach [91.74682538906691]
Adversarial training has been shown to improve the generalization performance of deep learning models.
We propose Stackelberg Adversarial Training (SALT), which formulates adversarial training as a Stackelberg game.
arXiv Detail & Related papers (2021-04-11T00:44:57Z)
- Optimally Deceiving a Learning Leader in Stackelberg Games [123.14187606686006]
Recent results in the ML community have revealed that learning algorithms used to compute the optimal strategy for the leader to commit to in a Stackelberg game are susceptible to manipulation by the follower.
This paper shows that it is always possible for the follower to compute (near-)optimal payoffs, under various scenarios of the learning interaction between leader and follower.
arXiv Detail & Related papers (2020-06-11T16:18:21Z)