Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning
- URL: http://arxiv.org/abs/2005.13625v8
- Date: Tue, 31 Oct 2023 05:06:10 GMT
- Title: Revisiting Parameter Sharing in Multi-Agent Deep Reinforcement Learning
- Authors: J. K. Terry, Nathaniel Grammel, Sanghyun Son, Benjamin Black, Aakriti Agrawal
- Abstract summary: We formalize the notion of agent indication and prove for the first time that it enables convergence to optimal policies.
Next, we formally introduce methods to extend parameter sharing to learning in heterogeneous observation and action spaces.
- Score: 14.017603575774361
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Parameter sharing, where each agent independently learns a policy with fully
shared parameters between all policies, is a popular baseline method for
multi-agent deep reinforcement learning. Unfortunately, since all agents share
the same policy network, they cannot learn different policies or tasks. This
issue has been circumvented experimentally by adding an agent-specific
indicator signal to observations, which we term "agent indication". Agent
indication is limited, however, in that without modification it does not allow
parameter sharing to be applied to environments where the action spaces and/or
observation spaces are heterogeneous. This work formalizes the notion of agent
indication and proves for the first time that it enables convergence to
optimal policies. Next, we formally introduce methods to extend parameter
sharing to learning in heterogeneous observation and action spaces, and prove
that these methods allow for convergence to optimal policies. Finally, we
empirically confirm that the methods we introduce work in practice, and
conduct a wide array of experiments studying the efficacy of many different
agent indication schemes for image-based observation spaces.
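The core recipe in the abstract, agent indication plus homogenized observation and action spaces, can be made concrete with a short sketch. The following is a minimal illustration, not the paper's reference implementation: all names and constants (indicate_agent, pad_observation, MAX_OBS_DIM, etc.) are assumptions, and for image observations one natural indication scheme (among the several the paper compares) would append a constant ID channel instead of a one-hot vector.
```python
import numpy as np

# Illustrative sketch of parameter sharing with agent indication and
# homogenized heterogeneous spaces. All names/sizes here are assumed
# for the example, not taken from the paper's code.

NUM_AGENTS = 3
MAX_OBS_DIM = 8      # assumed size of the largest observation space
MAX_NUM_ACTIONS = 5  # assumed size of the largest discrete action space


def indicate_agent(obs: np.ndarray, agent_id: int) -> np.ndarray:
    """Agent indication: append a one-hot agent ID so a single shared
    policy network can condition on which agent it is acting for."""
    one_hot = np.zeros(NUM_AGENTS, dtype=obs.dtype)
    one_hot[agent_id] = 1.0
    return np.concatenate([obs, one_hot])


def pad_observation(obs: np.ndarray) -> np.ndarray:
    """Heterogeneous observation spaces: zero-pad every agent's
    observation to a common maximum size."""
    padded = np.zeros(MAX_OBS_DIM, dtype=obs.dtype)
    padded[: obs.shape[0]] = obs
    return padded


def mask_actions(logits: np.ndarray, num_valid_actions: int) -> np.ndarray:
    """Heterogeneous action spaces: the shared policy outputs logits for
    the largest action space; actions an agent lacks are masked out
    before sampling."""
    masked = logits.copy()
    masked[num_valid_actions:] = -np.inf
    return masked


# Usage: agent 1 has a 6-dim observation and only 4 valid actions.
raw_obs = np.random.rand(6).astype(np.float32)
shared_input = indicate_agent(pad_observation(raw_obs), agent_id=1)
logits = np.random.rand(MAX_NUM_ACTIONS).astype(np.float32)  # stand-in for the shared policy net
action = int(np.argmax(mask_actions(logits, num_valid_actions=4)))
```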
Related papers
- Adaptive parameter sharing for multi-agent reinforcement learning [16.861543418593044]
We propose a novel parameter sharing method inspired by research on biological brains.
It maps agents of each type to different regions within a shared network based on their identity, resulting in distinct subnetworks.
Our method can increase the diversity of strategies among different agents without additional training parameters (a rough sketch of this idea appears after this list).
arXiv Detail & Related papers (2023-12-14T15:00:32Z)
- Invariant Causal Imitation Learning for Generalizable Policies [87.51882102248395]
We propose Invariant Causal Imitation Learning (ICIL) to learn an imitation policy.
ICIL learns a representation of causal features that is disentangled from the specific representations of noise variables.
We show that ICIL is effective in learning imitation policies capable of generalizing to unseen environments.
arXiv Detail & Related papers (2023-11-02T16:52:36Z)
- Off-Policy Action Anticipation in Multi-Agent Reinforcement Learning [3.249853429482705]
Learning anticipation in Multi-Agent Reinforcement Learning (MARL) is a reasoning paradigm where agents anticipate the learning steps of other agents to improve cooperation among themselves.
Existing higher-order gradient (HOG) methods are based on policy parameter anticipation, i.e., agents anticipate the changes in policy parameters of other agents.
We propose Off-Policy Action Anticipation (OffPA2), a novel framework that approaches learning anticipation through action anticipation.
arXiv Detail & Related papers (2023-04-04T01:44:19Z)
- Policy Evaluation in Decentralized POMDPs with Belief Sharing [39.550233049869036]
We consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly.
We propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network.
arXiv Detail & Related papers (2023-02-08T15:54:15Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to role diversity.
The decomposed factors can significantly impact policy optimization along three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- Explaining Reinforcement Learning Policies through Counterfactual Trajectories [147.7246109100945]
A human developer must validate that an RL agent will perform well at test time.
Our method conveys how the agent performs under distribution shifts by showing the agent's behavior across a wider trajectory distribution.
In a user study, we demonstrate that our method enables users to score better than baseline methods on one of two agent validation tasks.
arXiv Detail & Related papers (2022-01-29T00:52:37Z)
- Informative Policy Representations in Multi-Agent Reinforcement Learning via Joint-Action Distributions [17.129962954873587]
In multi-agent reinforcement learning, the inherent non-stationarity of the environment caused by other agents' actions poses significant difficulties for an agent to learn a good policy independently.
We propose a general method to learn representations of other agents' policies via the joint-action distributions sampled in interactions.
We empirically demonstrate that our method outperforms existing work in multi-agent tasks when facing unseen agents.
arXiv Detail & Related papers (2021-06-10T15:09:33Z)
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
- Variational Policy Propagation for Multi-agent Reinforcement Learning [68.26579560607597]
We propose a collaborative multi-agent reinforcement learning algorithm named Variational Policy Propagation (VPP) to learn a joint policy through the interactions over agents.
We prove that the joint policy is a Markov Random Field under some mild conditions, which in turn reduces the policy space effectively.
We integrate variational inference as special differentiable layers in the policy such that actions can be efficiently sampled from the Markov Random Field and the overall policy is differentiable.
arXiv Detail & Related papers (2020-04-19T15:42:55Z)
- Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework.
We develop a Decentralized Adversarial Imitation Learning algorithm with Correlated policies (CoDAIL).
Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)
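As referenced in the adaptive parameter sharing entry above, that paper maps agent identities to regions of one shared network. The sketch below illustrates the general idea only; the fixed random binary masks and all names here are assumptions for illustration, not the authors' actual mechanism.
```python
import numpy as np

# Rough sketch of identity-conditioned subnetworks inside one shared
# layer, in the spirit of the adaptive parameter sharing summary above.
# The fixed binary masks are an illustrative assumption.

HIDDEN = 16
NUM_AGENT_TYPES = 2
rng = np.random.default_rng(0)

# One shared weight matrix; each agent type gets a binary mask that
# selects a (possibly overlapping) region of the hidden units.
W = rng.normal(size=(4, HIDDEN))
masks = (rng.random((NUM_AGENT_TYPES, HIDDEN)) < 0.5).astype(W.dtype)


def forward(obs: np.ndarray, agent_type: int) -> np.ndarray:
    """Same parameters for every agent, but the active hidden units
    depend on agent identity, yielding distinct effective subnetworks
    without adding trainable parameters per agent."""
    h = obs @ W
    return np.maximum(h, 0.0) * masks[agent_type]


obs = rng.normal(size=4)
print(forward(obs, agent_type=0))  # agent types 0 and 1 activate
print(forward(obs, agent_type=1))  # different regions of the same net
```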
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.