Multi-Agent MDP Homomorphic Networks
- URL: http://arxiv.org/abs/2110.04495v1
- Date: Sat, 9 Oct 2021 07:46:25 GMT
- Title: Multi-Agent MDP Homomorphic Networks
- Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling
- Abstract summary: In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations.
Existing work on symmetries in single-agent reinforcement learning can only be generalized to the fully centralized setting.
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information.
- Score: 100.74260120972863
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper introduces Multi-Agent MDP Homomorphic Networks, a class of
networks that allows distributed execution using only local information, yet is
able to share experience between global symmetries in the joint state-action
space of cooperative multi-agent systems. In cooperative multi-agent systems,
complex symmetries arise between different configurations of the agents and
their local observations. For example, consider a group of agents navigating:
rotating the state globally results in a permutation of the optimal joint
policy. Existing work on symmetries in single-agent reinforcement learning can
only be generalized to the fully centralized setting, because such approaches
rely on the global symmetry in the full state-action spaces, and these can
result in correspondences across agents. To encode such symmetries while still
allowing distributed execution we propose a factorization that decomposes
global symmetries into local transformations. Our proposed factorization allows
for distributing the computation that enforces global symmetries over local
agents and local interactions. We introduce a multi-agent equivariant policy
network based on this factorization. We show empirically on symmetric
multi-agent problems that distributed execution of globally symmetric policies
improves data efficiency compared to non-equivariant baselines.
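The rotation example admits a compact sanity check. The sketch below is an illustration of the symmetry property only, not the paper's architecture: it uses a hypothetical centroid-seeking policy over four compass actions, where each agent acts on local information alone, yet rotating the global state by 90 degrees permutes the optimal joint action exactly as the abstract describes.

```python
import numpy as np

# Four compass actions; a global 90-degree rotation permutes them:
# right -> up -> left -> down -> right.
ACTIONS = np.array([[1, 0], [0, 1], [-1, 0], [0, -1]])  # right, up, left, down
ACTION_PERM = [1, 2, 3, 0]  # index of each action after a 90-degree rotation

def rot90(vectors):
    """Rotate 2D vectors by 90 degrees counter-clockwise."""
    x, y = vectors[..., 0], vectors[..., 1]
    return np.stack([-y, x], axis=-1)

def local_policy(own_pos, others_pos):
    """Toy equivariant local policy (a stand-in for a learned network):
    pick the compass action best aligned with the direction toward the
    centroid of the other agents."""
    direction = others_pos.mean(axis=0) - own_pos
    return int(np.argmax(ACTIONS @ direction))

def joint_policy(positions):
    """Distributed execution: each agent acts on local information only."""
    return [local_policy(positions[i], np.delete(positions, i, axis=0))
            for i in range(len(positions))]

rng = np.random.default_rng(0)
state = rng.normal(size=(3, 2))  # positions of 3 agents

# Acting in the rotated world ...
actions_after_rotation = joint_policy(rot90(state))
# ... equals permuting the actions chosen in the original world.
permuted_actions = [ACTION_PERM[a] for a in joint_policy(state)]
assert actions_after_rotation == permuted_actions
```

Because rotation preserves inner products and the action set is closed under 90-degree rotation, the argmax in the rotated state is exactly the permuted argmax of the original state, which is the global symmetry the paper's factorization distributes over local agents.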
Related papers
- Networked Communication for Mean-Field Games with Function Approximation and Empirical Mean-Field Estimation [59.01527054553122]
Decentralised agents can learn equilibria in Mean-Field Games from a single, non-episodic run of the empirical system.
We introduce function approximation to the existing setting, drawing on the Munchausen Online Mirror Descent method.
We additionally provide new algorithms that allow agents to estimate the global empirical distribution based on a local neighbourhood.
arXiv Detail & Related papers (2024-08-21T13:32:46Z) - Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space [22.535906675532196]
- Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space [22.535906675532196]
In a multi-agent system, action semantics indicate the different influences that agents' actions have on other entities.
Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents.
We introduce the Unified Action Space (UAS) to make global parameter-sharing viable across such heterogeneous agents.
arXiv Detail & Related papers (2024-08-14T09:15:11Z) - Self-Supervised Detection of Perfect and Partial Input-Dependent Symmetries [11.54837584979607]
Group equivariance can overly constrain models if the symmetries in the group differ from those observed in data.
We propose a method that detects the level of symmetry of each input without requiring labels.
Our framework is general enough to accommodate different families of both continuous and discrete symmetry distributions.
arXiv Detail & Related papers (2023-12-19T15:11:46Z) - Compressed Regression over Adaptive Networks [58.79251288443156]
We derive the performance achievable by a network of distributed agents that solve, adaptively and in the presence of communication constraints, a regression problem.
We devise an optimized allocation strategy in which the parameters necessary for the optimization can be learned online by the agents.
arXiv Detail & Related papers (2023-04-07T13:41:08Z) - Policy Evaluation in Decentralized POMDPs with Belief Sharing [39.550233049869036]
We consider a cooperative policy evaluation task in which agents are not assumed to observe the environment state directly.
We propose a fully decentralized belief forming strategy that relies on individual updates and on localized interactions over a communication network.
arXiv Detail & Related papers (2023-02-08T15:54:15Z) - Independent Natural Policy Gradient Methods for Potential Games:
- Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization [28.401280095467015]
We study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games.
We show that the proposed method converges to the quantal response equilibrium (QRE) at a sublinear rate, which is independent of the size of the action space.
arXiv Detail & Related papers (2022-04-12T01:34:02Z) - Dimension-Free Rates for Natural Policy Gradient in Multi-Agent
- Dimension-Free Rates for Natural Policy Gradient in Multi-Agent Reinforcement Learning [22.310861786709538]
We propose a scalable algorithm for cooperative multi-agent reinforcement learning.
We show that our algorithm converges to the globally optimal policy with a dimension-free statistical and computational complexity.
arXiv Detail & Related papers (2021-09-23T23:38:15Z) - Decentralized Local Stochastic Extra-Gradient for Variational
Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains where the problem data is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the setting of fully decentralized computation.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z) - Permutation Invariant Policy Optimization for Mean-Field Multi-Agent
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z) - Implicit Distributional Reinforcement Learning [61.166030238490634]
- Implicit Distributional Reinforcement Learning [61.166030238490634]
We introduce the implicit distributional actor-critic (IDAC), built on two deep generator networks (DGNs) and a semi-implicit actor (SIA) powered by a flexible policy distribution.
We observe that IDAC outperforms state-of-the-art algorithms on representative OpenAI Gym environments.
arXiv Detail & Related papers (2020-07-13T02:52:18Z)
This list is automatically generated from the titles and abstracts of the papers on this site.