Resolving Implicit Coordination in Multi-Agent Deep Reinforcement
Learning with Deep Q-Networks & Game Theory
- URL: http://arxiv.org/abs/2012.09136v1
- Date: Tue, 8 Dec 2020 17:30:47 GMT
- Title: Resolving Implicit Coordination in Multi-Agent Deep Reinforcement
Learning with Deep Q-Networks & Game Theory
- Authors: Griffin Adams, Sarguna Janani Padmanabhan, Shivang Shekhar
- Abstract summary: We address two major challenges of implicit coordination in deep reinforcement learning: non-stationarity and exponential growth of state-action space.
We demonstrate that knowledge of game type leads to an assumption of mirrored best responses and faster convergence than Nash-Q.
Inspired by the dueling network architecture, we learn both a single and joint agent representation, and merge them via element-wise addition.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We address two major challenges of implicit coordination in multi-agent deep
reinforcement learning: non-stationarity and exponential growth of state-action
space, by combining Deep Q-Networks for policy learning with Nash equilibrium
for action selection. Q-values proxy as payoffs in Nash settings, and mutual
best responses define joint action selection. Coordination is implicit because
cases with multiple or no Nash equilibria are resolved deterministically. We demonstrate that
knowledge of game type leads to an assumption of mirrored best responses and
faster convergence than Nash-Q. Specifically, the Friend-or-Foe algorithm
demonstrates signs of convergence to a Set Controller which jointly chooses
actions for two agents. This is encouraging given the highly unstable nature of
decentralized coordination over joint actions. Inspired by the dueling network
architecture, which decouples the Q-function into state and advantage streams,
as well as residual networks, we learn both a single and joint agent
representation, and merge them via element-wise addition. This simplifies
coordination by recasting it as learning a residual function. We also draw
high-level comparative insights on key MADRL and game-theoretic variables:
competitive vs. cooperative, asynchronous vs. parallel learning, greedy vs.
socially optimal Nash equilibrium tie-breaking, and strategies for the
no-Nash-equilibrium case. We evaluate on 3 custom environments written in Python using
OpenAI Gym: a Predator Prey environment, an alternating Warehouse environment,
and a Synchronization environment. Each environment requires successively more
coordination to achieve positive rewards.
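The action-selection mechanism described in the abstract treats each agent's joint-action Q-values as the payoff matrices of a stage game and selects a pure-strategy Nash equilibrium by intersecting the two agents' best responses, with deterministic tie-breaking. The sketch below illustrates that idea under stated assumptions: the |A1| x |A2| Q-value layout, the function names, and the fallback to the socially optimal joint action when no pure equilibrium exists are illustrative, not the authors' exact implementation.

```python
import numpy as np

def pure_nash_equilibria(payoff_a, payoff_b):
    """Enumerate pure-strategy Nash equilibria of a bimatrix game.

    payoff_a[i, j]: row player's payoff for joint action (i, j).
    payoff_b[i, j]: column player's payoff for the same joint action.
    A joint action is a pure Nash equilibrium when both actions are
    mutual best responses.
    """
    best_a = payoff_a == payoff_a.max(axis=0, keepdims=True)  # row player's best responses per column
    best_b = payoff_b == payoff_b.max(axis=1, keepdims=True)  # column player's best responses per row
    return list(zip(*np.nonzero(best_a & best_b)))

def select_joint_action(q_values_1, q_values_2, tie_break="social"):
    """Pick a joint action from per-agent joint-action Q-values (payoff proxies).

    q_values_k: |A1| x |A2| array of agent k's Q(s, a1, a2) estimates.
    tie_break: "greedy" keeps the equilibrium best for agent 1,
               "social" keeps the equilibrium with the largest summed payoff.
    Falls back to the socially optimal joint action if no pure equilibrium
    exists (an assumed fallback rule, for illustration only).
    """
    equilibria = pure_nash_equilibria(q_values_1, q_values_2)
    if not equilibria:
        # No pure Nash equilibrium: take the jointly best cell.
        return np.unravel_index(np.argmax(q_values_1 + q_values_2), q_values_1.shape)
    if tie_break == "greedy":
        key = lambda ij: q_values_1[ij]
    else:  # socially optimal
        key = lambda ij: q_values_1[ij] + q_values_2[ij]
    return max(equilibria, key=key)

# Example: a 2x2 coordination game read off from the two agents' Q-heads.
q1 = np.array([[5.0, 0.0], [0.0, 3.0]])
q2 = np.array([[5.0, 0.0], [0.0, 3.0]])
print(select_joint_action(q1, q2))  # joint action (0, 0), the payoff-dominant equilibrium
```

Switching the tie_break argument between "greedy" and "social" mirrors the greedy vs. socially optimal tie-breaking comparison mentioned in the abstract.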
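The dueling-inspired residual merge of a single-agent and a joint-agent representation could look roughly like the module below. The use of PyTorch, the observation split, the layer sizes, and the one-Q-value-per-joint-action output are all assumptions made for illustration; the paper specifies only that the two streams are merged via element-wise addition.

```python
import torch.nn as nn

class ResidualJointQNetwork(nn.Module):
    """Q-network that merges a single-agent stream and a joint-agent stream.

    Analogous to the dueling architecture's two streams, except that one
    stream sees only the agent's own observation while the other sees the
    joint observation; their outputs are summed element-wise so the joint
    stream only needs to learn a residual correction on top of the
    single-agent estimate.
    """

    def __init__(self, own_obs_dim, joint_obs_dim, n_joint_actions, hidden=64):
        super().__init__()
        self.single_stream = nn.Sequential(
            nn.Linear(own_obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joint_actions),
        )
        self.joint_stream = nn.Sequential(
            nn.Linear(joint_obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_joint_actions),
        )

    def forward(self, own_obs, joint_obs):
        # Element-wise addition: coordination is recast as learning a
        # residual function over the single-agent Q-estimates.
        return self.single_stream(own_obs) + self.joint_stream(joint_obs)
```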
Related papers
- Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors [3.9801926395657325]
This paper proposes a deep Q-networks (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies.
The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
arXiv Detail & Related papers (2024-06-12T03:30:10Z)
- Inducing Stackelberg Equilibrium through Spatio-Temporal Sequential
Decision-Making in Multi-Agent Reinforcement Learning [17.101534531286298]
We construct a Nash-level policy model based on a conditional hypernetwork shared by all agents.
This approach allows for asymmetric training with symmetric execution, with each agent responding optimally conditioned on the decisions made by superior agents.
Experiments demonstrate that our method effectively converges to the SE policies in repeated matrix game scenarios.
arXiv Detail & Related papers (2023-04-20T14:47:54Z)
- Differentiable Arbitrating in Zero-sum Markov Games [59.62061049680365]
We study how to perturb the reward in a zero-sum Markov game with two players to induce a desirable Nash equilibrium, namely arbitrating.
The lower level requires solving the Nash equilibrium under a given reward function, which makes the overall problem challenging to optimize in an end-to-end way.
We propose a backpropagation scheme that differentiates through the Nash equilibrium, which provides the gradient feedback for the upper level.
arXiv Detail & Related papers (2023-02-20T16:05:04Z)
- Network coevolution drives segregation and enhances Pareto optimal
equilibrium selection in coordination games [0.0]
We analyze a coevolution model that couples the changes in agents' actions with the network dynamics.
We find that both for RD and UI in a GCG, there is a regime of intermediate values of plasticity.
Coevolution enhances payoff-dominant equilibrium selection for both update rules.
arXiv Detail & Related papers (2022-11-22T09:33:02Z)
- Game-Theoretical Perspectives on Active Equilibria: A Preferred Solution
Concept over Nash Equilibria [61.093297204685264]
An effective approach in multiagent reinforcement learning is to consider the learning process of agents and influence their future policies.
This new solution concept is general such that standard solution concepts, such as a Nash equilibrium, are special cases of active equilibria.
We analyze active equilibria from a game-theoretic perspective by closely studying examples where Nash equilibria are known.
arXiv Detail & Related papers (2022-10-28T14:45:39Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic
Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Learn to Match with No Regret: Reinforcement Learning in Markov Matching
Markets [151.03738099494765]
We study a Markov matching market involving a planner and a set of strategic agents on the two sides of the market.
We propose a reinforcement learning framework that integrates optimistic value iteration with maximum weight matching.
We prove that the algorithm achieves sublinear regret.
arXiv Detail & Related papers (2022-03-07T19:51:25Z)
- Distributed Adaptive Learning Under Communication Constraints [54.22472738551687]
This work examines adaptive distributed learning strategies designed to operate under communication constraints.
We consider a network of agents that must solve an online optimization problem from continual observation of streaming data.
arXiv Detail & Related papers (2021-12-03T19:23:48Z)
- Towards convergence to Nash equilibria in two-team zero-sum games [17.4461045395989]
Two-team zero-sum games are defined as multi-player games where players are split into two competing sets of agents.
We focus on the solution concept of Nash equilibria (NE)
We show that computing NE for this class of games is hard for the complexity class $\mathrm{CLS}$.
arXiv Detail & Related papers (2021-11-07T21:15:35Z)
- Decentralized Cooperative Multi-Agent Reinforcement Learning with
Exploration [35.75029940279768]
We study multi-agent reinforcement learning in the most basic cooperative setting -- Markov teams.
We propose an algorithm in which each agent independently runs a stage-based V-learning style algorithm.
We show that the agents can learn an $\epsilon$-approximate Nash equilibrium policy in at most $\propto\widetilde{O}(1/\epsilon^4)$ episodes.
arXiv Detail & Related papers (2021-10-12T02:45:12Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement
Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
This list is automatically generated from the titles and abstracts of the papers on this site.