Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents
- URL: http://arxiv.org/abs/2012.15377v1
- Date: Thu, 31 Dec 2020 00:12:46 GMT
- Title: Model Free Reinforcement Learning Algorithm for Stationary Mean field Equilibrium for Multiple Types of Agents
- Authors: Arnob Ghosh and Vaneet Aggarwal
- Abstract summary: We consider a multi-agent strategic interaction over an infinite horizon where agents can be of multiple types.
Each agent has a private state; the state evolves depending on the distribution of the state of the agents of different types and the action of the agent.
We show how this kind of interaction can model cyber attacks between defenders and adversaries.
- Score: 43.21120427632336
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a multi-agent Markov strategic interaction over an infinite horizon where agents can be of multiple types. We model the strategic interaction as a mean-field game in the asymptotic limit when the number of agents of each type becomes infinite. Each agent has a private state; the state evolves depending on the distribution of the states of the agents of the different types and on the agent's own action. Each agent wants to maximize the discounted sum of rewards over the infinite horizon, which depends on the agent's state and on the distribution of the states of the leaders and followers. We seek to characterize and compute a stationary multi-type mean-field equilibrium (MMFE) in the above game, and we characterize the conditions under which a stationary MMFE exists. Finally, we propose a reinforcement learning (RL) based algorithm using a policy gradient approach to find the stationary MMFE when the agents are unaware of the dynamics. We numerically evaluate how this kind of interaction can model cyber attacks between defenders and adversaries, and show how the RL-based algorithm converges to an equilibrium.
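The fixed-point structure behind this is easy to illustrate. Below is a minimal numpy sketch under assumed toy dynamics: freeze the mean-field state distributions of all types, run a softmax policy-gradient best response for a representative agent of each type, then move each type's distribution to the stationary point of the chain its new policy induces. Everything named here (`model`, the mixing rule, the exact Bellman evaluation in `q_values`) is an illustrative stand-in, not the authors' implementation; the paper's algorithm is model-free and would replace the exact evaluation with sampled estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
K, S, A, gamma = 2, 4, 3, 0.9                 # types, states, actions, discount

# Toy mean-field coupling (an assumption for illustration): each type's
# dynamics and rewards are perturbed by the average state distribution.
base_P = rng.dirichlet(np.ones(S), size=(K, S, A))   # shape [K, S, A, S]
base_r = rng.random((K, S, A))                       # shape [K, S, A]

def model(k, mu):
    """Transition kernel and reward for a type-k agent given the
    mean-field distributions mu (list of K state distributions)."""
    avg = sum(mu) / K
    P = 0.9 * base_P[k] + 0.1 * avg[None, None, :]
    return P / P.sum(-1, keepdims=True), base_r[k] + avg.max()

def softmax(x):
    z = np.exp(x - x.max(-1, keepdims=True))
    return z / z.sum(-1, keepdims=True)

def q_values(P, r, pi, iters=200):
    """Q^pi by Bellman iteration -- a model-based stand-in for the
    sample-based estimates a model-free learner would use."""
    q = np.zeros_like(r)
    for _ in range(iters):
        q = r + gamma * P @ (pi * q).sum(-1)          # [S, A]
    return q

def stationary(Ppi, iters=500):
    mu = np.full(S, 1.0 / S)
    for _ in range(iters):
        mu = mu @ Ppi                                 # row-stochastic kernel
    return mu

logits = np.zeros((K, S, A))
mu = [np.full(S, 1.0 / S) for _ in range(K)]
for _ in range(50):                                   # outer fixed-point loop
    for k in range(K):                                # best response per type
        P, r = model(k, mu)
        for _ in range(50):                           # softmax policy gradient
            pi = softmax(logits[k])
            q = q_values(P, r, pi)
            logits[k] += 0.5 * pi * (q - (pi * q).sum(-1, keepdims=True))
    for k in range(K):                                # refresh the mean fields
        P, _ = model(k, mu)
        Ppi = np.einsum('sa,sat->st', softmax(logits[k]), P)
        mu[k] = stationary(Ppi)
print("stationary state distributions per type:", [m.round(3) for m in mu])
```

A (policy, distribution) pair that this alternation no longer moves is simultaneously a best response and stationary, which is the defining property of the MMFE.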
Related papers
- Linear Convergence of Independent Natural Policy Gradient in Games with Entropy Regularization [12.612009339150504]
This work focuses on the entropy-regularized independent natural policy gradient (NPG) algorithm in multi-agent reinforcement learning.
We show that, under sufficient entropy regularization, the dynamics of this system converge at a linear rate to the quantal response equilibrium (QRE).
arXiv Detail & Related papers (2024-05-04T22:48:53Z)
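As a quick illustration of the update form analyzed there, consider softmax policies in a 2x2 matrix game; the game, the constants tau and eta, and the exact parameterization below are assumptions, not the paper's setting. With entropy weight tau, the logit update contracts toward the quantal response fixed point pi proportional to exp(q / tau):

```python
import numpy as np

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Illustrative 2x2 zero-sum game (matching pennies); payoffs are assumptions.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])   # row player's payoff
B = -A                                      # column player's payoff
tau, eta = 0.5, 0.1                         # entropy weight, step size

th1, th2 = np.zeros(2), np.zeros(2)
for _ in range(2000):
    p1, p2 = softmax(th1), softmax(th2)
    q1, q2 = A @ p2, B.T @ p1               # expected payoff of each action
    # Entropy-regularized natural policy gradient on the softmax logits:
    # the fixed point satisfies th = q / tau, i.e. pi ~ exp(q / tau), a QRE.
    th1 = (1 - eta * tau) * th1 + eta * q1
    th2 = (1 - eta * tau) * th2 + eta * q2
print("QRE estimate:", softmax(th1), softmax(th2))
```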
- Learning and Calibrating Heterogeneous Bounded Rational Market Behaviour with Multi-Agent Reinforcement Learning [4.40301653518681]
Agent-based models (ABMs) have shown promise for modelling various real-world phenomena that are incompatible with traditional equilibrium analysis.
Recent developments in multi-agent reinforcement learning (MARL) offer a way to address this issue from a rationality perspective.
We propose a novel technique for representing heterogeneous processing-constrained agents within a MARL framework.
arXiv Detail & Related papers (2024-02-01T17:21:45Z)
- On Imperfect Recall in Multi-Agent Influence Diagrams [57.21088266396761]
Multi-agent influence diagrams (MAIDs) are a popular game-theoretic model based on Bayesian networks.
We show how to solve MAIDs with forgetful and absent-minded agents using mixed policies and two types of correlated equilibrium.
We also describe applications of MAIDs to Markov games and team situations, where imperfect recall is often unavoidable.
arXiv Detail & Related papers (2023-07-11T07:08:34Z)
- Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning [139.53668999720605]
We present a multi-agent PPO algorithm in which the local policy of each agent is updated similarly to vanilla PPO.
We prove that, under standard regularity conditions on the Markov game and problem-dependent quantities, our algorithm converges to the globally optimal policy at a sublinear rate.
arXiv Detail & Related papers (2023-05-08T16:20:03Z)
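For reference, the per-agent update is the standard clipped PPO surrogate, applied independently by each agent to its own policy. The numbers below are made up for illustration, and the paper's global-optimality guarantee rests on regularity conditions not reproduced here.

```python
import numpy as np

def clipped_surrogate(ratio, adv, eps=0.2):
    """PPO clipped objective on one agent's own transitions; in the
    multi-agent scheme each agent maximizes this for its local policy
    while the other agents' behaviour is folded into the environment."""
    return np.minimum(ratio * adv, np.clip(ratio, 1 - eps, 1 + eps) * adv).mean()

# Probability ratios pi_new(a|s) / pi_old(a|s) and advantage estimates
# for a small batch (illustrative numbers only).
ratio = np.array([1.10, 0.85, 1.30])
adv   = np.array([0.5, -0.2, 1.0])
print(clipped_surrogate(ratio, adv))   # ascend this w.r.t. the agent's policy
```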
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
arXiv Detail & Related papers (2022-06-03T14:18:05Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- ERMAS: Becoming Robust to Reward Function Sim-to-Real Gaps in Multi-Agent Simulations [110.72725220033983]
Epsilon-Robust Multi-Agent Simulation (ERMAS) is a framework for learning AI policies that are robust to such multiagent sim-to-real gaps.
ERMAS learns tax policies that are robust to changes in agent risk aversion, improving social welfare by up to 15% in complex spatiotemporal simulations.
arXiv Detail & Related papers (2021-06-10T04:32:20Z)
- Non-cooperative Multi-agent Systems with Exploring Agents [10.736626320566707]
We develop a prescriptive model of multi-agent behavior using Markov games.
We focus on models in which the agents play "exploration but near optimum strategies".
arXiv Detail & Related papers (2020-05-25T19:34:29Z)
- Multi Type Mean Field Reinforcement Learning [26.110052366068533]
We extend mean field multiagent algorithms to multiple types.
We conduct experiments on three different testbeds for the field of many-agent reinforcement learning (see the sketch after this list).
arXiv Detail & Related papers (2020-02-06T20:58:58Z)
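Since the multi-type mean-field idea is also central to the main paper, here is a tabular sketch of what a multi-type mean-field value update can look like. Conditioning on one discretised mean action per type, and the helper name `td_update`, are assumptions made to keep the example tabular; the cited works use action distributions and function approximation instead.

```python
import numpy as np

# Q is indexed by the agent's state and action plus one (discretised)
# mean action per neighbour type: Q[s, a, abar_1, ..., abar_K].
S, A, K = 5, 3, 2
Q = np.zeros((S, A) + (A,) * K)
alpha, gamma = 0.1, 0.95

def td_update(s, a, abar, r, s2, abar2):
    """One tabular TD(0) step; abar and abar2 are tuples holding the
    mean action of each neighbour type before and after the transition."""
    target = r + gamma * Q[(s2, slice(None)) + abar2].max()
    Q[(s, a) + abar] += alpha * (target - Q[(s, a) + abar])

# Example transition: in state 0 the agent took action 1 while type-1
# neighbours averaged action 2 and type-2 neighbours action 0.
td_update(s=0, a=1, abar=(2, 0), r=1.0, s2=3, abar2=(1, 1))
```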
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences arising from its use.