Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
- URL: http://arxiv.org/abs/2005.11853v1
- Date: Sun, 24 May 2020 22:34:20 GMT
- Title: Model-free Reinforcement Learning for Stochastic Stackelberg Security Games
- Authors: Rajesh K Mishra, Deepanshu Vasal, and Sriram Vishwanath
- Abstract summary: We consider a sequential Stackelberg game with two players, a leader and a follower.
The follower has access to the state of the system while the leader does not.
We propose an RL algorithm based on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating a model of the MDP.
- Score: 7.470839530834359
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we consider a sequential stochastic Stackelberg game with two
players, a leader and a follower. The follower has access to the state of the
system while the leader does not. Assuming that the players act in their
respective best interests, the follower's strategy is to play the best response
to the leader's strategy. In such a scenario, the leader has the advantage of
committing to a policy which maximizes its own returns given the knowledge that
the follower is going to play the best response to its policy. Thus, both
players converge to a pair of policies that form the Stackelberg equilibrium of
the game. Recently, [1] provided a sequential decomposition algorithm to compute
the Stackelberg equilibrium for such games, which allows for the computation of
Markovian equilibrium policies in linear time as opposed to double-exponential
time, as before. In this paper, we extend the idea to an MDP
whose dynamics are not known to the players, to propose an RL algorithm based
on Expected Sarsa that learns the Stackelberg equilibrium policy by simulating
a model of the MDP. We use particle filters to estimate the belief update for a
common agent which computes the optimal policy based on the information which
is common to both the players. We present a security game example to illustrate
the policy learned by our algorithm.
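To make the two computational ingredients mentioned above concrete, here is a minimal Python sketch of a tabular Expected Sarsa update and a bootstrap particle-filter belief update. It is a hedged illustration only, not the authors' common-agent algorithm: the tabular Q-table, the function names, and the multinomial resampling scheme are assumptions chosen for readability.

```python
# Illustrative sketches of two building blocks named in the abstract:
# an Expected Sarsa update and a particle-filter belief update.
# These are generic textbook versions, not the paper's common-agent algorithm.
import numpy as np

def expected_sarsa_update(Q, s, a, r, s_next, policy, alpha=0.1, gamma=0.95):
    """One tabular Expected Sarsa update.

    Q      : ndarray of shape (n_states, n_actions)
    policy : ndarray of shape (n_states, n_actions), rows sum to 1
    """
    # Bootstrap with the policy-weighted expectation over next actions,
    # rather than the sampled next action used by vanilla Sarsa.
    expected_q_next = np.dot(policy[s_next], Q[s_next])
    td_target = r + gamma * expected_q_next
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

def particle_filter_belief(particles, weights, action, observation,
                           transition_sample, obs_likelihood):
    """One bootstrap particle-filter step approximating the belief over the
    hidden state, given a publicly known action and observation.

    transition_sample(x, a) -> x'   samples the (simulated) dynamics
    obs_likelihood(y, x')   -> p(y | x')
    """
    # Propagate each particle through the simulated dynamics.
    propagated = np.array([transition_sample(x, action) for x in particles])
    # Reweight by how well each particle explains the observation
    # (assumes at least one particle has nonzero likelihood).
    weights = weights * np.array([obs_likelihood(observation, x) for x in propagated])
    weights = weights / weights.sum()
    # Multinomial resampling to avoid weight degeneracy.
    idx = np.random.choice(len(propagated), size=len(propagated), p=weights)
    return propagated[idx], np.full(len(propagated), 1.0 / len(propagated))
```

In the setting above, both steps would be driven by simulating a model of the MDP, with the belief maintained by the common agent over the state that the leader cannot observe.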
Related papers
- Blending Data-Driven Priors in Dynamic Games [9.085463548798366]
We formulate an algorithm, called KLGame, for solving non-cooperative dynamic games with Kullback-Leibler (KL) regularization.
We propose an efficient algorithm for computing multi-modal approximate feedback Nash equilibrium strategies of KLGame in real time.
arXiv Detail & Related papers (2024-02-21T23:22:32Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision making problem.
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to Coarse Correlated Equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
- No-Regret Learning in Dynamic Stackelberg Games [31.001205916012307]
In a Stackelberg game, a leader commits to a randomized strategy, and a follower chooses their best strategy in response; a minimal one-shot sketch of this commitment/best-response structure appears after this list.
We consider an extension of a standard Stackelberg game, called a discrete-time dynamic Stackelberg game, that has an underlying state space that affects the leader's rewards and available strategies and evolves in a Markovian manner depending on both the leader and follower's selected strategies.
arXiv Detail & Related papers (2022-02-10T01:07:57Z)
- Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopic Followers? [156.5760265539888]
We study multi-player general-sum Markov games with one of the players designated as the leader and the other players regarded as followers.
For such a game, our goal is to find a Stackelberg-Nash equilibrium (SNE), which is a policy pair $(\pi^*, \nu^*)$.
We develop sample-efficient reinforcement learning (RL) algorithms for solving for an SNE in both online and offline settings.
arXiv Detail & Related papers (2021-12-27T05:41:14Z)
- Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples.
This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z)
- Independent Policy Gradient Methods for Competitive Reinforcement Learning [62.91197073795261]
We obtain global, non-asymptotic convergence guarantees for independent learning algorithms in competitive reinforcement learning settings with two agents.
We show that if both players run policy gradient methods in tandem, their policies will converge to a min-max equilibrium of the game, as long as their learning rates follow a two-timescale rule.
arXiv Detail & Related papers (2021-01-11T23:20:42Z)
- Provable Fictitious Play for General Mean-Field Games [111.44976345867005]
We propose a reinforcement learning algorithm for stationary mean-field games.
The goal is to learn a pair of mean-field state and stationary policy that constitutes the Nash equilibrium.
arXiv Detail & Related papers (2020-10-08T18:46:48Z)
- Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games.
Our method trains several agents simultaneously, and each agent intelligently takes the others as opponents based on simple adversarial rules.
We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z)
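To ground the leader-commitment and follower-best-response structure that recurs in the entries above (and in the abstract at the top of this page), the short sketch below computes an approximate Stackelberg commitment for a toy 2x2 security game by grid search. The payoff matrices, the grid resolution, and the index-based tie-breaking are illustrative assumptions, not material from any of the papers listed here.

```python
# Hedged sketch of leader commitment vs. follower best response on a toy
# 2x2 security game. Payoffs and grid search are illustrative assumptions.
import numpy as np

# Rows: leader (defender) covers target 0 or 1; columns: follower (attacker)
# attacks target 0 or 1. Numbers are made up for illustration.
leader_payoff = np.array([[ 1.0, -1.0],
                          [-1.0,  1.0]])
follower_payoff = np.array([[-1.0,  1.0],
                            [ 1.0, -1.0]])

best_value, best_commitment = -np.inf, None
for p in np.linspace(0.0, 1.0, 101):        # leader commits to (p, 1 - p)
    commitment = np.array([p, 1.0 - p])
    # Follower observes the commitment and best-responds in expectation.
    follower_values = commitment @ follower_payoff
    br = int(np.argmax(follower_values))
    # Leader evaluates its own expected payoff under that best response.
    leader_value = commitment @ leader_payoff[:, br]
    if leader_value > best_value:
        best_value, best_commitment = leader_value, commitment

print("approximate Stackelberg commitment:", best_commitment,
      "leader value:", round(best_value, 3))
```

In a strong Stackelberg equilibrium the follower would break ties in the leader's favor; the argmax tie-breaking above is a simplification made for brevity.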
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.