Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space
- URL: http://arxiv.org/abs/2408.07395v1
- Date: Wed, 14 Aug 2024 09:15:11 GMT
- Title: Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space
- Authors: Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han
- Abstract summary: In a multi-agent system, action semantics captures the different influences that agents' actions have on other entities.
Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents.
We introduce the Unified Action Space (UAS) to exploit different action semantics while retaining a shared parameter structure.
- Score: 22.535906675532196
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In a multi-agent system (MAS), action semantics indicates the different influences of agents' actions on other entities and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without carefully discriminating between different action semantics. This common implementation decreases cooperation and coordination between agents in complex situations. However, fully independent agent parameters dramatically increase the computational cost and training difficulty. To benefit from different action semantics while maintaining a proper parameter-sharing structure, we introduce the Unified Action Space (UAS). The UAS is the union of all agents' action spaces with different semantics. All agents first compute their unified representation in the UAS, and then generate their heterogeneous action policies using different available-action masks. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss that predicts other groups' agent policies from trajectory information. As a universal method for the physically heterogeneous MARL problem, we add the UAS to both value-based and policy-based MARL algorithms and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
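The following is a minimal, illustrative sketch (in PyTorch) of the two mechanisms described in the abstract: a policy head shared across physically heterogeneous agents that outputs logits over the Unified Action Space and applies per-group available-action masks, and a Cross-Group-Inverse-style auxiliary loss that predicts another group's policies from trajectory features. It is not the authors' implementation; all class names, tensor shapes, the network architecture, and the choice of a KL objective for the CGI term are assumptions for illustration.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class UASPolicyHead(nn.Module):
    """Shared policy head over the Unified Action Space (UAS).

    Hypothetical sketch: every agent, regardless of its physical type, maps its
    observation to logits over the union of all groups' actions; a per-group
    available-action mask then restricts it to the actions it can actually take.
    """

    def __init__(self, obs_dim: int, uas_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, uas_dim),
        )

    def forward(self, obs: torch.Tensor, avail_mask: torch.Tensor) -> torch.Tensor:
        # obs: (batch, obs_dim); avail_mask: (batch, uas_dim), 1 = available.
        logits = self.net(obs)  # unified representation in the UAS
        logits = logits.masked_fill(avail_mask == 0, float("-inf"))
        return F.softmax(logits, dim=-1)  # heterogeneous per-group policy


def cgi_loss(predictor: nn.Module,
             traj_features: torch.Tensor,
             other_group_policy: torch.Tensor) -> torch.Tensor:
    """Cross-Group-Inverse-style auxiliary loss (assumed KL formulation).

    Predicts the other group's policy from this agent's trajectory features and
    penalises the mismatch, as a way to train the extra UAS parameters.
    """
    pred_log_probs = F.log_softmax(predictor(traj_features), dim=-1)
    return F.kl_div(pred_log_probs, other_group_policy.detach(), reduction="batchmean")


if __name__ == "__main__":
    # Toy usage: two physically different groups share one head but use
    # different masks over a 6-action unified space (assumed layout).
    head = UASPolicyHead(obs_dim=10, uas_dim=6)
    obs = torch.randn(2, 10)
    mask_a = torch.tensor([[1, 1, 1, 1, 0, 0]], dtype=torch.float32)  # e.g. move-only units
    mask_b = torch.tensor([[1, 1, 0, 0, 1, 1]], dtype=torch.float32)  # e.g. attack-capable units
    pi_a = head(obs[:1], mask_a)
    pi_b = head(obs[1:], mask_b)
    print(pi_a, pi_b)  # masked-out actions receive probability zero

    predictor = nn.Linear(10, 6)  # hypothetical trajectory-feature predictor
    print(cgi_loss(predictor, obs[:1], pi_b).item())
```
Masking with negative infinity before the softmax forces zero probability on actions a group cannot execute, which is what lets one shared set of parameters serve all agent types.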
Related papers
- Causal Coordinated Concurrent Reinforcement Learning [8.654978787096807]
We propose a novel algorithmic framework for data sharing and coordinated exploration for the purpose of learning more data-efficient and better performing policies under a concurrent reinforcement learning setting.
Our algorithm leverages a causal inference algorithm in the form of Additive Noise Model - Mixture Model (ANM-MM) in extracting model parameters governing individual differentials via independence enforcement.
We propose a new data sharing scheme based on a similarity measure of the extracted model parameters and demonstrate superior learning speeds on a set of autoregressive, pendulum and cart-pole swing-up tasks.
arXiv Detail & Related papers (2024-01-31T17:20:28Z) - MaskMA: Towards Zero-Shot Multi-Agent Decision Making with Mask-Based Collaborative Learning [56.00558959816801]
We propose a Mask-Based collaborative learning framework for Multi-Agent decision making (MaskMA).
We show MaskMA can achieve an impressive 77.8% average zero-shot win rate on 60 unseen test maps with decentralized execution.
arXiv Detail & Related papers (2023-10-18T09:53:27Z) - Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
arXiv Detail & Related papers (2023-09-14T01:18:04Z) - ACE: Cooperative Multi-agent Q-learning with Bidirectional Action-Dependency [65.28061634546577]
Multi-agent reinforcement learning (MARL) suffers from the non-stationarity problem.
In this paper, we propose bidirectional action-dependent Q-learning (ACE).
ACE outperforms the state-of-the-art algorithms on Google Research Football and StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2022-11-29T10:22:55Z) - Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z) - Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent
RL [107.58821842920393]
We quantify agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity.
The decomposed factors can significantly impact policy optimization along three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z) - Multi-Agent MDP Homomorphic Networks [100.74260120972863]
In cooperative multi-agent systems, complex symmetries arise between different configurations of the agents and their local observations.
Existing work on symmetries in single agent reinforcement learning can only be generalized to the fully centralized setting.
This paper introduces Multi-Agent MDP Homomorphic Networks, a class of networks that allows distributed execution using only local information.
arXiv Detail & Related papers (2021-10-09T07:46:25Z) - MACRPO: Multi-Agent Cooperative Recurrent Policy Optimization [17.825845543579195]
We propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO).
We use a recurrent layer in the critic's network architecture and propose a new framework that uses a meta-trajectory to train the recurrent layer.
We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces.
arXiv Detail & Related papers (2021-09-02T12:43:35Z) - Cooperative and Competitive Biases for Multi-Agent Reinforcement Learning [12.676356746752893]
Training a multi-agent reinforcement learning (MARL) algorithm is more challenging than training a single-agent reinforcement learning algorithm.
We propose an algorithm that boosts MARL training using the biased action information of other agents based on a friend-or-foe concept.
We empirically demonstrate that our algorithm outperforms existing algorithms in various mixed cooperative-competitive environments.
arXiv Detail & Related papers (2021-01-18T05:52:22Z) - FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)