Hierarchical Lead Critic based Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2602.21680v1
- Date: Wed, 25 Feb 2026 08:33:39 GMT
- Title: Hierarchical Lead Critic based Multi-Agent Reinforcement Learning
- Authors: David Eckel, Henri Meeß,
- Abstract summary: We introduce a novel sequential training scheme and MARL architecture, which learns from multiple perspectives on different hierarchy levels. HLC demonstrates that introducing multiple hierarchies, leveraging local and global perspectives, can lead to improved performance with high sample efficiency and robust policies.
- Score: 1.4323566945483497
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Cooperative Multi-Agent Reinforcement Learning (MARL) solves complex tasks that require coordination from multiple agents, but is often limited to either local (independent learning) or global (centralized learning) perspectives. In this paper, we introduce a novel sequential training scheme and MARL architecture, which learns from multiple perspectives on different hierarchy levels. We propose the Hierarchical Lead Critic (HLC), inspired by naturally emerging distributions in team structures, where following high-level objectives combines with low-level execution. HLC demonstrates that introducing multiple hierarchies, leveraging local and global perspectives, can lead to improved performance with high sample efficiency and robust policies. Experimental results on cooperative, non-communicative, and partially observable MARL benchmarks demonstrate that HLC outperforms single-hierarchy baselines and scales robustly with increasing numbers of agents and task difficulty.
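To make the local-versus-global idea concrete, here is a minimal sketch of a two-level actor-critic in PyTorch: each agent keeps a local critic over its own observation while a shared lead critic scores the joint observation, and the two baselines are blended into one advantage. The class names, network sizes, and the blending weight `beta` are illustrative assumptions, not the authors' HLC implementation or training scheme.

```python
# Illustrative sketch only: the way local and global critics are combined here
# is an assumption, not the paper's HLC architecture.
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Decentralized policy conditioned on the agent's local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.net(obs))


class Critic(nn.Module):
    """State-value head; used per agent (local) and once at the top (lead)."""
    def __init__(self, in_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).squeeze(-1)


def blended_advantage(returns, local_values, lead_values, beta=0.5):
    """Mix local (per-agent) and global (lead) baselines into one advantage.

    beta is a hypothetical weighting between the two perspectives.
    """
    local_adv = returns - local_values
    lead_adv = returns - lead_values
    return beta * local_adv + (1.0 - beta) * lead_adv


if __name__ == "__main__":
    n_agents, obs_dim, n_actions = 3, 8, 5
    actors = [Actor(obs_dim, n_actions) for _ in range(n_agents)]
    local_critics = [Critic(obs_dim) for _ in range(n_agents)]
    lead_critic = Critic(obs_dim * n_agents)   # sees the joint observation

    obs = torch.randn(n_agents, obs_dim)       # one fake timestep
    returns = torch.randn(n_agents)            # fake Monte-Carlo returns

    joint_obs = obs.flatten()
    lead_value = lead_critic(joint_obs).expand(n_agents)   # same global value for all agents
    local_values = torch.stack([c(o) for c, o in zip(local_critics, obs)])
    adv = blended_advantage(returns, local_values.detach(), lead_value.detach())

    # Policy-gradient loss per agent using the blended advantage.
    loss = 0.0
    for i, actor in enumerate(actors):
        dist = actor(obs[i])
        action = dist.sample()
        loss = loss - dist.log_prob(action) * adv[i]
    loss.mean().backward()
```

A real training loop would add critic regression targets, batching over trajectories, and the paper's sequential training across hierarchy levels; this sketch only shows how two critic perspectives can feed a single policy-gradient update.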
Related papers
- MARTI-MARS$^2$: Scaling Multi-Agent Self-Search via Reinforcement Learning for Code Generation [64.2621682259008]
We propose a Multi-Agent Reinforced Training and Inference Framework with Self-Search Scaling (MARTI-MARS2) to integrate policy learning with multi-agent tree search. We show that MARTI-MARS2 achieves 77.7%, outperforming strong baselines like GPT-5.1 on challenging code generation benchmarks.
arXiv Detail & Related papers (2026-02-08T07:28:44Z)
- Reinforcement Networks: novel framework for collaborative Multi-Agent Reinforcement Learning tasks [0.0]
We introduce Reinforcement Networks, a general framework for the Multi-Agent Reinforcement Learning (MARL) field. We formalize training and inference methods for the Reinforcement Networks framework and connect it to the LevelEnv concept to support reproducible construction, training, and evaluation. Beyond empirical gains, Reinforcement Networks unify hierarchical, modular, and graph-structured views of MARL, opening a principled path toward designing and training complex multi-agent systems.
arXiv Detail & Related papers (2025-12-28T10:56:20Z)
- Hierarchical Message-Passing Policies for Multi-Agent Reinforcement Learning [19.739901034066587]
We propose a novel and effective methodology for learning multi-agent hierarchies of message-passing policies. Agents at lower levels in the hierarchy receive goals from the upper levels and exchange messages with neighboring agents at the same level. Results on relevant benchmarks show that our method performs favorably compared to the state of the art.
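The goal-plus-message structure described above can be sketched in a few lines. The two-level split below (a Manager that emits per-agent goals and Workers that also read a mean neighbor message) is a generic illustration under assumed names and dimensions, not the hierarchy proposed in that paper.

```python
# Minimal sketch of a goal-emitting upper level and message-passing lower level.
# The neighbourhood (mean of all other agents' messages) is a stand-in assumption.
import torch
import torch.nn as nn


class Manager(nn.Module):
    """Upper level: maps the joint observation to one goal vector per agent."""
    def __init__(self, joint_obs_dim: int, n_agents: int, goal_dim: int = 4):
        super().__init__()
        self.n_agents, self.goal_dim = n_agents, goal_dim
        self.net = nn.Linear(joint_obs_dim, n_agents * goal_dim)

    def forward(self, joint_obs):
        return self.net(joint_obs).view(self.n_agents, self.goal_dim)


class Worker(nn.Module):
    """Lower level: acts from its observation, its goal, and neighbour messages."""
    def __init__(self, obs_dim: int, goal_dim: int, msg_dim: int, n_actions: int):
        super().__init__()
        self.msg_head = nn.Linear(obs_dim, msg_dim)            # message it broadcasts
        self.policy = nn.Linear(obs_dim + goal_dim + msg_dim, n_actions)

    def message(self, obs):
        return torch.tanh(self.msg_head(obs))

    def act(self, obs, goal, incoming_msg):
        logits = self.policy(torch.cat([obs, goal, incoming_msg]))
        return torch.distributions.Categorical(logits=logits).sample()


if __name__ == "__main__":
    n_agents, obs_dim, msg_dim, n_actions = 3, 6, 4, 5
    manager = Manager(obs_dim * n_agents, n_agents)
    workers = [Worker(obs_dim, 4, msg_dim, n_actions) for _ in range(n_agents)]

    obs = torch.randn(n_agents, obs_dim)
    goals = manager(obs.flatten())
    msgs = torch.stack([w.message(o) for w, o in zip(workers, obs)])
    for i, w in enumerate(workers):
        # Mean message of all other agents (a stand-in for a neighbourhood graph).
        others = torch.cat([msgs[:i], msgs[i + 1:]]).mean(dim=0)
        action = w.act(obs[i], goals[i], others)
        print(f"agent {i} action: {action.item()}")
```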
arXiv Detail & Related papers (2025-07-31T14:42:12Z)
- Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
Offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts. This work is the first to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
- Enhancing Multi-Agent Systems via Reinforcement Learning with LLM-based Planner and Graph-based Policy [31.041340552853004]
Graph Collaboration MARL (LGC-MARL) is a framework that efficiently combines Large Language Models (LLMs) and Multi-Agent Reinforcement Learning (MARL). LGC-MARL decomposes complex tasks into executable subtasks and achieves efficient collaboration among multiple agents through graph-based coordination. Experimental results on the AI2-THOR simulation platform demonstrate the superior performance and scalability of LGC-MARL.
arXiv Detail & Related papers (2025-03-13T05:02:49Z)
- ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning [53.817538122688944]
We introduce Reinforced Meta-thinking Agents (ReMA) to elicit meta-thinking behaviors from the reasoning of Large Language Models (LLMs). ReMA decouples the reasoning process into two hierarchical agents: a high-level meta-thinking agent responsible for generating strategic oversight and plans, and a low-level reasoning agent for detailed execution. Empirical results from single-turn experiments demonstrate that ReMA outperforms single-agent RL baselines on complex reasoning tasks.
arXiv Detail & Related papers (2025-03-12T16:05:31Z)
- MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents [59.825725526176655]
Large Language Models (LLMs) have shown remarkable capabilities as autonomous agents. Existing benchmarks either focus on single-agent tasks or are confined to narrow domains, failing to capture the dynamics of multi-agent coordination and competition. We introduce MultiAgentBench, a benchmark designed to evaluate LLM-based multi-agent systems across diverse, interactive scenarios.
arXiv Detail & Related papers (2025-03-03T05:18:50Z)
- Exploiting Structure in Offline Multi-Agent RL: The Benefits of Low Interaction Rank [52.831993899183416]
We introduce a structural assumption -- the interaction rank -- and establish that functions with low interaction rank are significantly more robust to distribution shift compared to general ones.
We demonstrate that utilizing function classes with low interaction rank, when combined with regularization and no-regret learning, admits decentralized, computationally and statistically efficient learning in offline MARL.
arXiv Detail & Related papers (2024-10-01T22:16:22Z)
- Hierarchical Consensus-Based Multi-Agent Reinforcement Learning for Multi-Robot Cooperation Tasks [17.914928652949314]
We introduce the Hierarchical Consensus-based Multi-Agent Reinforcement Learning (HC-MARL) framework to address this limitation.
HC-MARL employs contrastive learning to foster a global consensus among agents, enabling cooperative behavior without direct communication.
To cater to the dynamic requirements of various tasks, consensus is divided into multiple layers, encompassing both short-term and long-term considerations.
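A hedged sketch of what "consensus via contrastive learning" can look like: agents embed their local observations with a shared encoder, and an InfoNCE-style loss pulls embeddings from the same timestep together while pushing other timesteps apart. The encoder, the choice of positives and negatives, and the single consensus layer are assumptions for illustration, not HC-MARL's actual multi-layer design.

```python
# Generic contrastive-consensus sketch; not HC-MARL's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ConsensusEncoder(nn.Module):
    """Shared encoder mapping a local observation to a unit-norm embedding."""
    def __init__(self, obs_dim: int, embed_dim: int = 16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(),
                                 nn.Linear(32, embed_dim))

    def forward(self, obs):
        return F.normalize(self.net(obs), dim=-1)


def consensus_loss(embeddings: torch.Tensor, temperature: float = 0.1):
    """embeddings: (timesteps, n_agents, embed_dim).

    Anchor = agent 0's embedding, positive = agent 1's embedding at the same
    timestep; embeddings from other timesteps serve as negatives.
    """
    anchors = embeddings[:, 0]                     # (T, d)
    positives = embeddings[:, 1]                   # (T, d)
    logits = anchors @ positives.T / temperature   # (T, T) similarity matrix
    labels = torch.arange(logits.shape[0])         # each anchor matches its own timestep
    return F.cross_entropy(logits, labels)


if __name__ == "__main__":
    obs_dim, n_agents, t_steps = 6, 3, 8
    enc = ConsensusEncoder(obs_dim)
    obs = torch.randn(t_steps, n_agents, obs_dim)
    z = enc(obs)                                   # shared encoder for all agents
    loss = consensus_loss(z)
    loss.backward()
    print("contrastive consensus loss:", loss.item())
```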
arXiv Detail & Related papers (2024-07-11T03:55:55Z)
- Multi-Agent Reinforcement Learning with a Hierarchy of Reward Machines [5.600971575680638]
We study cooperative Multi-Agent Reinforcement Learning (MARL) problems using Reward Machines (RMs).
We present Multi-Agent Reinforcement Learning with a hierarchy of RMs (MAHRM) that is capable of dealing with more complex scenarios.
Experimental results in three cooperative MARL domains show that MAHRM outperforms other MARL methods using the same prior knowledge of high-level events.
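For readers unfamiliar with reward machines: an RM is a finite automaton over high-level events whose transitions emit rewards, which is what makes hierarchies of RMs possible. The sketch below shows a generic single RM on a hypothetical two-step task; it does not reproduce the MAHRM hierarchy or its event propositions.

```python
# Generic reward-machine sketch; task, states, and events are hypothetical.
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    # (state, event) -> (next_state, reward); unknown events self-loop with 0 reward.
    transitions: dict
    state: str = "u0"
    terminal: set = field(default_factory=set)

    def step(self, event: str) -> float:
        """Advance the machine on a high-level event and return the reward."""
        next_state, reward = self.transitions.get((self.state, event),
                                                  (self.state, 0.0))
        self.state = next_state
        return reward

    @property
    def done(self) -> bool:
        return self.state in self.terminal


if __name__ == "__main__":
    # Hypothetical two-step task: "press the button", then "open the door".
    rm = RewardMachine(
        transitions={
            ("u0", "button_pressed"): ("u1", 0.1),
            ("u1", "door_opened"): ("u2", 1.0),
        },
        terminal={"u2"},
    )
    for event in ["door_opened", "button_pressed", "door_opened"]:
        r = rm.step(event)
        print(event, "->", rm.state, "reward", r, "done", rm.done)
```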
arXiv Detail & Related papers (2024-03-08T06:38:22Z)
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL).
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
- Locality Matters: A Scalable Value Decomposition Approach for Cooperative Multi-Agent Reinforcement Learning [52.7873574425376]
Cooperative multi-agent reinforcement learning (MARL) faces significant scalability issues due to state and action spaces that are exponentially large in the number of agents.
We propose a novel, value-based multi-agent algorithm called LOMAQ, which incorporates local rewards in the Centralized Training Decentralized Execution paradigm.
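As a point of reference for value decomposition with local rewards, here is a generic additive (VDN-style) decomposition in PyTorch: per-agent utilities are summed into a joint Q-value and regressed against a TD target built from per-agent reward signals. LOMAQ's actual partition-based exploitation of locality is not reproduced; the networks and the single fake transition are assumptions for illustration.

```python
# Generic additive value-decomposition sketch; not the LOMAQ algorithm.
import torch
import torch.nn as nn


class AgentUtility(nn.Module):
    """Per-agent Q-network over its local observation."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, n_actions))

    def forward(self, obs):
        return self.net(obs)


if __name__ == "__main__":
    n_agents, obs_dim, n_actions, gamma = 3, 6, 4, 0.99
    utils = [AgentUtility(obs_dim, n_actions) for _ in range(n_agents)]
    opt = torch.optim.Adam([p for u in utils for p in u.parameters()], lr=1e-3)

    # One fake transition: observations, actions, per-agent (local) rewards.
    obs = torch.randn(n_agents, obs_dim)
    next_obs = torch.randn(n_agents, obs_dim)
    actions = torch.randint(0, n_actions, (n_agents,))
    local_rewards = torch.rand(n_agents)

    # Joint Q is the sum of the chosen per-agent utilities (additive mixing).
    q_joint = sum(u(o)[a] for u, o, a in zip(utils, obs, actions))
    with torch.no_grad():
        target = local_rewards.sum() + gamma * sum(u(o).max()
                                                   for u, o in zip(utils, next_obs))
    loss = (q_joint - target) ** 2
    opt.zero_grad()
    loss.backward()
    opt.step()
    print("TD loss:", loss.item())
```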
arXiv Detail & Related papers (2021-09-22T10:08:15Z)