Centralized Training with Hybrid Execution in Multi-Agent Reinforcement
Learning
- URL: http://arxiv.org/abs/2210.06274v2
- Date: Mon, 5 Jun 2023 17:35:53 GMT
- Title: Centralized Training with Hybrid Execution in Multi-Agent Reinforcement
Learning
- Authors: Pedro P. Santos, Diogo S. Carvalho, Miguel Vasco, Alberto Sardinha,
Pedro A. Santos, Ana Paiva, Francisco S. Melo
- Abstract summary: We introduce hybrid execution in multi-agent reinforcement learning (MARL), a new paradigm in which agents aim to successfully complete cooperative tasks with arbitrary communication levels at execution time.
We contribute MARO, an approach that makes use of an auto-regressive predictive model, trained in a centralized manner, to estimate missing agents' observations.
- Score: 7.163485179361718
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We introduce hybrid execution in multi-agent reinforcement learning (MARL), a
new paradigm in which agents aim to successfully complete cooperative tasks
with arbitrary communication levels at execution time by taking advantage of
information-sharing among the agents. Under hybrid execution, the communication
level can range from a setting in which no communication is allowed between
agents (fully decentralized), to a setting featuring full communication (fully
centralized), but the agents do not know beforehand which communication level
they will encounter at execution time. To formalize our setting, we define a
new class of multi-agent partially observable Markov decision processes
(POMDPs) that we name hybrid-POMDPs, which explicitly model a communication
process between the agents. We contribute MARO, an approach that makes use of
an auto-regressive predictive model, trained in a centralized manner, to
estimate missing agents' observations at execution time. We evaluate MARO on
standard scenarios and extensions of previous benchmarks tailored to emphasize
the negative impact of partial observability in MARL. Experimental results show
that our method consistently outperforms relevant baselines, allowing agents to
act with faulty communication while successfully exploiting shared information.
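The abstract describes MARO only at a high level. As a rough, assumption-laden sketch, the snippet below shows one way a centrally trained recurrent predictor could fill in observations that failed to arrive at execution time; the module layout, dimensions, masking scheme, and all names here are illustrative, not the paper's actual design.

```python
# Minimal sketch of the idea behind MARO (NOT the authors' architecture):
# a recurrent model, trained centrally on full joint observations, predicts
# the observations of agents whose messages were dropped at execution time.
import torch
import torch.nn as nn

class ObservationPredictor(nn.Module):
    def __init__(self, n_agents: int, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        joint_dim = n_agents * obs_dim
        # input: zero-filled joint observation + a mask of who was heard from
        self.rnn = nn.GRU(joint_dim + n_agents, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, joint_dim)

    def forward(self, joint_obs, mask, h=None):
        # joint_obs: (batch, n_agents * obs_dim), missing slots zeroed
        # mask:      (batch, n_agents), 1 where the observation arrived
        x = torch.cat([joint_obs, mask], dim=-1).unsqueeze(1)
        out, h = self.rnn(x, h)
        return self.head(out.squeeze(1)), h

model = ObservationPredictor(n_agents=3, obs_dim=8)
mask = torch.tensor([[1.0, 0.0, 1.0]])          # agent 1's message was lost
slot = mask.repeat_interleave(8, dim=-1)        # expand mask to obs slots
obs = torch.randn(1, 3 * 8) * slot              # zero out the missing slot
pred, h = model(obs, mask)
filled = torch.where(slot.bool(), obs, pred)    # act on predicted values
```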
Related papers
- Communication Learning in Multi-Agent Systems from Graph Modeling Perspective [62.13508281188895]
We introduce a novel approach wherein we conceptualize the communication architecture among agents as a learnable graph.
We introduce a temporal gating mechanism for each agent, enabling dynamic decisions on whether to receive shared information at a given time; a sketch of this gating idea follows the entry.
arXiv Detail & Related papers (2024-11-01T05:56:51Z)
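As an illustration of the temporal gating idea mentioned above, the sketch below has each agent learn a scalar gate from its own hidden state that decides how much of the aggregated incoming messages to admit at the current step. Names, shapes, and the mean-pool aggregation are assumptions, not the paper's implementation.

```python
# Hypothetical temporal gate: each agent decides, per timestep, whether to
# take in the information shared over the (learnable) communication graph.
import torch
import torch.nn as nn

class TemporalGate(nn.Module):
    def __init__(self, hidden_dim: int, msg_dim: int):
        super().__init__()
        self.gate = nn.Linear(hidden_dim, 1)   # scalar receive/ignore decision
        self.proj = nn.Linear(msg_dim, hidden_dim)

    def forward(self, h, incoming_msgs):
        # h: (batch, hidden_dim) agent state
        # incoming_msgs: (batch, n_neighbors, msg_dim) neighbor messages
        g = torch.sigmoid(self.gate(h))        # in (0, 1)
        pooled = incoming_msgs.mean(dim=1)     # aggregate neighbor messages
        return h + g * self.proj(pooled)       # gated information intake

gate = TemporalGate(hidden_dim=32, msg_dim=16)
h_next = gate(torch.randn(2, 32), torch.randn(2, 5, 16))
```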
- DCMAC: Demand-aware Customized Multi-Agent Communication via Upper Bound Training [9.068971933560416]
We propose a Demand-aware Customized Multi-Agent Communication protocol, which uses upper-bound training to obtain the ideal policy.
Experimental results reveal that DCMAC significantly outperforms the baseline algorithms in both unconstrained and communication-constrained scenarios.
arXiv Detail & Related papers (2024-09-11T09:23:27Z)
- Generalising Multi-Agent Cooperation through Task-Agnostic Communication [7.380444448047908]
Existing communication methods for multi-agent reinforcement learning (MARL) in cooperative multi-robot problems are almost exclusively task-specific, training new communication strategies for each unique task.
We address this inefficiency by introducing a communication strategy applicable to any task within a given environment.
Our objective is to learn a fixed-size latent Markov state from a variable number of agent observations.
Our method enables seamless adaptation to novel tasks without fine-tuning the communication strategy, gracefully supports scaling to more agents than were present during training, and detects out-of-distribution events in an environment (see the sketch after this entry).
arXiv Detail & Related papers (2024-03-11T14:20:13Z)
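A natural way to obtain a fixed-size latent state from a variable number of agent observations, as the entry above targets, is a permutation-invariant set encoder: embed each observation independently, then pool. The sketch below is illustrative only and is not the paper's architecture.

```python
# Hypothetical set encoder: mean-pooling over per-agent embeddings yields a
# fixed-size latent regardless of how many agents are present.
import torch
import torch.nn as nn

class SetEncoder(nn.Module):
    def __init__(self, obs_dim: int, latent_dim: int):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, latent_dim))

    def forward(self, observations):
        # observations: (n_agents, obs_dim); n_agents may vary per call
        return self.phi(observations).mean(dim=0)

enc = SetEncoder(obs_dim=8, latent_dim=16)
z3 = enc(torch.randn(3, 8))    # 3 agents
z7 = enc(torch.randn(7, 8))    # 7 agents, same latent size
assert z3.shape == z7.shape == (16,)
```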
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Multi-Agent Coordination via Multi-Level Communication [29.388570369796586]
In this paper, we propose a novel multi-level communication scheme, Sequential Communication (SeqComm).
arXiv Detail & Related papers (2022-09-26T14:08:03Z)
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA takes advantage of a graph-based relation encoder to encode the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and in ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
- Scalable Multi-Agent Model-Based Reinforcement Learning [1.95804735329484]
We propose a new method called MAMBA, which utilizes Model-Based Reinforcement Learning (MBRL) to further leverage centralized training in cooperative environments.
We argue that communication between agents is enough to sustain a world model for each agent during the execution phase, while imaginary rollouts can be used for training, removing the need to interact with the environment; a sketch of this loop follows the entry.
arXiv Detail & Related papers (2022-05-25T08:35:00Z)
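The MAMBA entry above rests on training policies from "imaginary" rollouts of a learned world model rather than fresh environment steps. The stub below shows that loop in its simplest form; the linear dynamics, the `imagine` helper, and all sizes are hypothetical stand-ins, not MAMBA's actual model.

```python
# Schematic model-based loop: roll the policy forward inside a learned world
# model to generate synthetic training data, with no environment interaction.
import torch
import torch.nn as nn

class WorldModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.dynamics = nn.Linear(state_dim + action_dim, state_dim)
        self.reward = nn.Linear(state_dim + action_dim, 1)

    def step(self, state, action):
        x = torch.cat([state, action], dim=-1)
        return self.dynamics(x), self.reward(x)

def imagine(model, policy, state, horizon=5):
    trajectory = []
    for _ in range(horizon):
        action = policy(state)
        state, reward = model.step(state, action)
        trajectory.append((state, action, reward))
    return trajectory

wm = WorldModel(state_dim=6, action_dim=2)
policy = nn.Linear(6, 2)                       # stand-in policy network
traj = imagine(wm, policy, torch.randn(1, 6))  # 5 imagined transitions
```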
- Coordinating Policies Among Multiple Agents via an Intelligent Communication Channel [81.39444892747512]
In Multi-Agent Reinforcement Learning (MARL), specialized channels are often introduced that allow agents to communicate directly with one another.
We propose an alternative approach whereby agents communicate through an intelligent facilitator that learns to sift through and interpret signals provided by all agents to improve the agents' collective performance.
arXiv Detail & Related papers (2022-05-21T14:11:33Z)
- A Decentralized Communication Framework based on Dual-Level Recurrence for Multi-Agent Reinforcement Learning [5.220940151628735]
We present a dual-level recurrent communication framework for multi-agent systems.
The first recurrence occurs in the communication sequence and is used to transmit communication data among agents.
The second recurrence is based on the time sequence and combines the historical observations of each agent; a sketch of the two recurrences follows this entry.
arXiv Detail & Related papers (2022-02-22T01:36:59Z)
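The dual-level recurrence described above can be pictured as two recurrent passes: one over the communication sequence (agent to agent within a timestep) and one over the time sequence (per agent across timesteps). The wiring below is a schematic assumption, not the published framework.

```python
# Schematic dual-level recurrence: a GRU over the agent/communication axis,
# then a GRU cell over time for each agent's history.
import torch
import torch.nn as nn

n_agents, obs_dim, hidden = 4, 8, 32
comm_rnn = nn.GRU(obs_dim, hidden, batch_first=True)  # recurrence 1: agents
time_rnn = nn.GRUCell(hidden, hidden)                 # recurrence 2: time

obs_t = torch.randn(1, n_agents, obs_dim)   # observations at one timestep
h_time = torch.zeros(n_agents, hidden)      # per-agent temporal memory

# First recurrence: pass information along the agent sequence.
comm_out, _ = comm_rnn(obs_t)               # (1, n_agents, hidden)
# Second recurrence: fold the communicated summary into each agent's history.
h_time = time_rnn(comm_out.squeeze(0), h_time)
```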
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features; that decomposition is sketched after this entry.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
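The "linear decomposition of universal successor features" mentioned above follows the standard successor-feature form Q(s, a; w) = psi(s, a) . w, where psi predicts expected discounted features and the task weights w vary across related tasks. The rendering below is schematic; the network and sizes are assumptions.

```python
# Schematic universal successor features: related tasks share psi and differ
# only in the task weight vector w, so Q-values decompose linearly.
import torch
import torch.nn as nn

class USF(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, feat_dim: int):
        super().__init__()
        self.n_actions, self.feat_dim = n_actions, feat_dim
        self.psi = nn.Linear(state_dim, n_actions * feat_dim)

    def q_values(self, state, w):
        # state: (batch, state_dim); w: (feat_dim,) task weight vector
        psi = self.psi(state).view(-1, self.n_actions, self.feat_dim)
        return psi @ w   # (batch, n_actions)

usf = USF(state_dim=10, n_actions=4, feat_dim=6)
q = usf.q_values(torch.randn(2, 10), torch.randn(6))  # one Q per action
```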
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion; its monotonic mixing is sketched after this entry.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
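QMIX's central idea is a mixing network that combines per-agent Q-values into a joint value while staying monotone in each of them, which the original method enforces by generating non-negative mixing weights from the global state. The sketch below is a simplified single-layer mixer in that spirit, not the full published architecture.

```python
# Simplified monotonic mixer in the spirit of QMIX: hypernetworks generate
# state-conditioned weights, and abs() keeps Q_tot monotone in each agent's Q.
import torch
import torch.nn as nn

class Mixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int):
        super().__init__()
        self.hyper_w = nn.Linear(state_dim, n_agents)  # mixing weights
        self.hyper_b = nn.Linear(state_dim, 1)         # state-dependent bias

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w = torch.abs(self.hyper_w(state))             # non-negative weights
        b = self.hyper_b(state)
        return (agent_qs * w).sum(dim=1, keepdim=True) + b  # Q_tot: (batch, 1)

mixer = Mixer(n_agents=3, state_dim=12)
q_tot = mixer(torch.randn(5, 3), torch.randn(5, 12))
```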