Many Agent Reinforcement Learning Under Partial Observability
- URL: http://arxiv.org/abs/2106.09825v1
- Date: Thu, 17 Jun 2021 21:24:29 GMT
- Title: Many Agent Reinforcement Learning Under Partial Observability
- Authors: Keyang He, Prashant Doshi, Bikramjit Banerjee
- Abstract summary: We show that our instantiations can learn the optimal behavior in a broader class of agent networks than the mean-field method.
- Score: 10.11960004698409
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent renewed interest in multi-agent reinforcement learning (MARL) has
generated an impressive array of techniques that leverage deep reinforcement
learning, primarily actor-critic architectures, and can be applied to a limited
range of settings in terms of observability and communication. However, a
continuing limitation of much of this work is the curse of dimensionality when
it comes to representations based on joint actions, which grow exponentially
with the number of agents. In this paper, we squarely focus on this challenge
of scalability. We apply the key insight of action anonymity, which leads to
permutation invariance of joint actions, to two recently presented deep MARL
algorithms, MADDPG and IA2C, and compare these instantiations to another recent
technique that leverages action anonymity, viz., mean-field MARL. We show that
our instantiations can learn the optimal behavior in a broader class of agent
networks than the mean-field method, using a recently introduced pragmatic
domain.
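The scalability argument above rests on a simple observation: if agents are anonymous, a critic does not need the full joint action, only how many neighbors selected each action, so the representation grows with the number of action labels rather than exponentially with the number of agents. The Python sketch below is a minimal illustration of such a count-based, permutation-invariant encoding; the function name `count_encoding`, the normalization, and the example sizes are illustrative assumptions, not the authors' implementation of MADDPG or IA2C.

```python
import numpy as np

def count_encoding(neighbor_actions, num_actions):
    """Permutation-invariant summary of the neighbors' joint action.

    Under action anonymity only *how many* neighbors chose each action
    matters, not *which* neighbor chose it, so a histogram over action
    labels suffices. Its size is |A|, independent of the number of
    agents, instead of the |A|^N joint-action table.
    (Illustrative sketch; not the paper's implementation.)
    """
    counts = np.bincount(np.asarray(neighbor_actions), minlength=num_actions)
    return counts / max(len(neighbor_actions), 1)

# Example: 5 neighbors, 3 discrete actions. Any permutation of who did
# what yields the same encoding, which is the invariance the paper exploits.
acts = [0, 2, 2, 1, 0]
perm = [2, 0, 1, 0, 2]
assert np.allclose(count_encoding(acts, 3), count_encoding(perm, 3))
print(count_encoding(acts, 3))  # -> [0.4 0.2 0.4]
```

For discrete actions this normalized histogram equals the mean of the neighbors' one-hot actions, i.e., the summary that mean-field MARL conditions on; the abstract's point is that the count-based instantiations learn optimal behavior on a broader class of agent networks than the mean-field method.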
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - FlickerFusion: Intra-trajectory Domain Generalizing Multi-Agent RL [19.236153474365747]
Existing MARL approaches often rely on the restrictive assumption that the number of entities remains constant between training and inference.
In this paper, we tackle the challenge of intra-trajectory dynamic entity composition under zero-shot out-of-domain (OOD) generalization.
We propose FlickerFusion, a novel OOD generalization method that acts as a universally applicable augmentation technique for MARL backbone methods.
arXiv Detail & Related papers (2024-10-21T10:57:45Z) - MA2CL:Masked Attentive Contrastive Learning for Multi-Agent
Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages learning representation to be both temporal and agent-level predictive by reconstructing the masked agent observation in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - Latent Interactive A2C for Improved RL in Open Many-Agent Systems [12.41853254173419]
Interactive advantage actor critic (IA2C) engages in decentralized training and decentralized execution.
We present the latent IA2C that utilizes an encoder-decoder architecture to learn a latent representation of the hidden state and other agents' actions.
Our experiments in two domains -- each populated by many agents -- reveal that the latent IA2C significantly improves sample efficiency by reducing variance and converging faster.
arXiv Detail & Related papers (2023-05-09T04:03:40Z) - A Variational Approach to Mutual Information-Based Coordination for
Multi-Agent Reinforcement Learning [17.893310647034188]
We propose a new mutual information framework for multi-agent reinforcement learning.
Applying policy iteration to maximize the derived lower bound, we propose a practical algorithm named variational maximum mutual information multi-agent actor-critic.
arXiv Detail & Related papers (2023-03-01T12:21:30Z) - Beyond Rewards: a Hierarchical Perspective on Offline Multiagent
Behavioral Analysis [14.656957226255628]
We introduce a model-agnostic method for discovery of behavior clusters in multiagent domains.
Our framework makes no assumption about agents' underlying learning algorithms, does not require access to their latent states or models, and can be trained using entirely offline observational data.
arXiv Detail & Related papers (2022-06-17T23:07:33Z) - RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in
Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraftII micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z) - MCDAL: Maximum Classifier Discrepancy for Active Learning [74.73133545019877]
Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition.
We propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL).
In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them.
arXiv Detail & Related papers (2021-07-23T06:57:08Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC)
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.