Related papers: Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2206.07505v1
Date: Wed, 15 Jun 2022 13:03:05 GMT
Title: Revisiting Some Common Practices in Cooperative Multi-Agent Reinforcement Learning
Authors: Wei Fu, Chao Yu, Zelai Xu, Jiaqi Yang, and Yi Wu
Abstract summary: We show that in environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases. We present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors.
Score: 11.91425153754564
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Many advances in cooperative multi-agent reinforcement learning (MARL) are based on two common design principles: value decomposition and parameter sharing. A typical MARL algorithm of this fashion decomposes a centralized Q-function into local Q-networks with parameters shared across agents. Such an algorithmic paradigm enables centralized training and decentralized execution (CTDE) and leads to efficient learning in practice. Despite all the advantages, we revisit these two principles and show that in certain scenarios, e.g., environments with a highly multi-modal reward landscape, value decomposition, and parameter sharing can be problematic and lead to undesired outcomes. In contrast, policy gradient (PG) methods with individual policies provably converge to an optimal solution in these cases, which partially supports some recent empirical observations that PG can be effective in many MARL testbeds. Inspired by our theoretical analysis, we present practical suggestions on implementing multi-agent PG algorithms for either high rewards or diverse emergent behaviors and empirically validate our findings on a variety of domains, ranging from the simplified matrix and grid-world games to complex benchmarks such as StarCraft Multi-Agent Challenge and Google Research Football. We hope our insights could benefit the community towards developing more general and more powerful MARL algorithms. Check our project website at https://sites.google.com/view/revisiting-marl.

Related papers

MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2)<n>We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search.<n>We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
Preference-Guided Learning for Sparse-Reward Multi-Agent Reinforcement Learning [15.034714081414691]
We study the problem of online multi-agent reinforcement learning (MARL) in environments with sparse rewards.<n>The lack of intermediate rewards hinders standard MARL algorithms from effectively guiding policy learning.<n>We propose a novel framework that integrates online inverse preference learning with multi-agent on-policy optimization.
arXiv Detail & Related papers (2025-09-26T03:41:40Z)
A Comprehensive Survey on Multi-Agent Cooperative Decision-Making: Scenarios, Approaches, Challenges and Perspectives [6.277211882332452]
Multi-agent cooperative decision-making involves multiple agents working together to complete established tasks and achieve specific objectives. These techniques are widely applicable in real-world scenarios such as autonomous driving, drone navigation, disaster rescue, and simulated military confrontations.
arXiv Detail & Related papers (2025-03-17T17:45:46Z)
Multi-Agent Inverse Q-Learning from Demonstrations [3.4136908117644698]
Multi-Agent Marginal Q-Learning from Demonstrations (MAMQL) is a novel sample-efficient framework for multi-agent IRL. We show MAMQL significantly outperforms previous multi-agent methods in average reward, sample efficiency, and reward recovery by often more than 2-5x.
arXiv Detail & Related papers (2025-03-06T18:22:29Z)
SIGMA: Sheaf-Informed Geometric Multi-Agent Pathfinding [11.38008343729117]
Multi-Agent Path Finding problem is core challenge for robotic deployments in large-scale logistics and transportation.<n>We introduce a new framework that applies sheaf theory to decentralized deep reinforcement learning.<n>Our proposed method demonstrates significant improvements over state-of-the-art learning-based MAPF planners.
arXiv Detail & Related papers (2025-02-10T13:17:34Z)
POGEMA: A Benchmark Platform for Cooperative Multi-Agent Navigation [76.67608003501479]
We introduce and specify an evaluation protocol defining a range of domain-related metrics computed on the basics of the primary evaluation indicators. The results of such a comparison, which involves a variety of state-of-the-art MARL, search-based, and hybrid methods, are presented.
arXiv Detail & Related papers (2024-07-20T16:37:21Z)
Ensembling Prioritized Hybrid Policies for Multi-agent Pathfinding [18.06081009550052]
Multi-Agent Reinforcement Learning (MARL) based Multi-Agent Path Finding (MAPF) has recently gained attention due to its efficiency and scalability. Several MARL-MAPF methods choose to use communication to enrich the information one agent can perceive. We propose a new method, Ensembling Prioritized Hybrid Policies (EPH)
arXiv Detail & Related papers (2024-03-12T11:47:12Z)
Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. We develop practical algorithms to learn the context-aware Bayesian network policies. Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks. The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
Towards Global Optimality in Cooperative MARL with the Transformation And Distillation Framework [26.612749327414335]
Decentralized execution is one core demand in cooperative multi-agent reinforcement learning (MARL) In this paper, we theoretically analyze two common classes of algorithms with decentralized policies -- multi-agent policy gradient methods and value-decomposition methods. We show that TAD-PPO can theoretically perform optimal policy learning in the finite multi-agent MDPs and shows significant outperformance on a large set of cooperative multi-agent tasks.
arXiv Detail & Related papers (2022-07-12T06:59:13Z)
The Multi-Agent Pickup and Delivery Problem: MAPF, MARL and Its Warehouse Applications [2.969705152497174]
We study two state-of-the-art solutions to the multi-agent pickup and delivery problem based on different principles. Specifically, a recent MAPF algorithm called conflict-based search (CBS) and a current MARL algorithm called shared experience actor-critic (SEAC) are studied.
arXiv Detail & Related papers (2022-03-14T13:23:35Z)
Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents. We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Training Decentralized Execution paradigm.
arXiv Detail & Related papers (2021-09-22T10:08:15Z)
Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments. We suggest utilizing the reinforcement learning approach when the agents, first, learn the policies that map observations to actions and then follow these policies to reach their goals.
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
Benchmarking Multi-Agent Deep Reinforcement Learning Algorithms in Cooperative Tasks [11.480994804659908]
Multi-agent deep reinforcement learning (MARL) suffers from a lack of commonly-used evaluation tasks and criteria. We provide a systematic evaluation and comparison of three different classes of MARL algorithms. Our experiments serve as a reference for the expected performance of algorithms across different learning tasks.
arXiv Detail & Related papers (2020-06-14T11:22:53Z)
F2A2: Flexible Fully-decentralized Approximate Actor-critic for Cooperative Multi-agent Reinforcement Learning [110.35516334788687]
Decentralized multi-agent reinforcement learning algorithms are sometimes unpractical in complicated applications. We propose a flexible fully decentralized actor-critic MARL framework, which can handle large-scale general cooperative multi-agent setting. Our framework can achieve scalability and stability for large-scale environment and reduce information transmission.
arXiv Detail & Related papers (2020-04-17T14:56:29Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC) It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.