Related papers: Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning

Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning

URL: http://arxiv.org/abs/2602.17009v2
Date: Sun, 22 Feb 2026 02:43:59 GMT
Title: Action-Graph Policies: Learning Action Co-dependencies in Multi-Agent Reinforcement Learning
Authors: Nikunj Gupta, James Zachary Hare, Jesse Milzman, Rajgopal Kannan, Viktor Prasanna,
Abstract summary: Coordinating actions is the most fundamental form of cooperation in multi-agent reinforcement learning.<n>We propose Action Graph Policies (AGP), that model dependencies among agents' available action choices.<n>AGP consistently outperforms other MARL methods in diverse multi-agent environments.
Score: 7.702487800530373
License: http://creativecommons.org/publicdomain/zero/1.0/
Abstract: Coordinating actions is the most fundamental form of cooperation in multi-agent reinforcement learning (MARL). Successful decentralized decision-making often depends not only on good individual actions, but on selecting compatible actions across agents to synchronize behavior, avoid conflicts, and satisfy global constraints. In this paper, we propose Action Graph Policies (AGP), that model dependencies among agents' available action choices. It constructs, what we call, \textit{coordination contexts}, that enable agents to condition their decisions on global action dependencies. Theoretically, we show that AGPs induce a strictly more expressive joint policy compared to fully independent policies and can realize coordinated joint actions that are provably more optimal than greedy execution even from centralized value-decomposition methods. Empirically, we show that AGP achieves 80-95\% success on canonical coordination tasks with partial observability and anti-coordination penalties, where other MARL methods reach only 10-25\%. We further demonstrate that AGP consistently outperforms these baselines in diverse multi-agent environments.

Related papers

Adaptive Value Decomposition: Coordinating a Varying Number of Agents in Urban Systems [19.19146852846605]
Adaptive Value Decomposition (AVD) is a cooperative MARL framework that adapts to a dynamically changing agent population.<n>A training-execution strategy is designed to accommodate asynchronous decision-making when some agents act at different times.
arXiv Detail & Related papers (2026-02-10T03:41:14Z)
Action Dependency Graphs for Globally Optimal Coordinated Reinforcement Learning [0.0]
Action-dependent individual policies have emerged as a promising paradigm for achieving global optimality in multi-agent reinforcement learning.<n>In this work, we consider a more generalized class of action-dependent policies, which do not necessarily follow the auto-regressive form.<n>Within the context of MARL problems structured by coordination graphs, we prove that an action-dependent policy with a sparse ADG can achieve global optimality.
arXiv Detail & Related papers (2025-06-01T02:58:20Z)
Offline Multi-agent Reinforcement Learning via Score Decomposition [51.23590397383217]
offline cooperative multi-agent reinforcement learning (MARL) faces unique challenges due to distributional shifts.<n>This work is the first work to explicitly address the distributional gap between offline and online MARL.
arXiv Detail & Related papers (2025-05-09T11:42:31Z)
Collab: Controlled Decoding using Mixture of Agents for LLM Alignment [90.6117569025754]
Reinforcement learning from human feedback has emerged as an effective technique to align Large Language models.<n>Controlled Decoding provides a mechanism for aligning a model at inference time without retraining.<n>We propose a mixture of agent-based decoding strategies leveraging the existing off-the-shelf aligned LLM policies.
arXiv Detail & Related papers (2025-03-27T17:34:25Z)
Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm. HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies. HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z)
Reaching Consensus in Cooperative Multi-Agent Reinforcement Learning with Goal Imagination [16.74629849552254]
We propose a model-based consensus mechanism to explicitly coordinate multiple agents. The proposed Multi-agent Goal Imagination (MAGI) framework guides agents to reach consensus with an Imagined common goal. We show that such efficient consensus mechanism can guide all agents cooperatively reaching valuable future states.
arXiv Detail & Related papers (2024-03-05T18:07:34Z)
Context-Aware Bayesian Network Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning [7.784991832712813]
We introduce a Bayesian network to inaugurate correlations between agents' action selections in their joint policy. We develop practical algorithms to learn the context-aware Bayesian network policies. Empirical results on a range of MARL benchmarks show the benefits of our approach.
arXiv Detail & Related papers (2023-06-02T21:22:27Z)
Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agent's behavior difference and build its relationship with the policy performance via bf Role Diversity We find that the error bound in MARL can be decomposed into three parts that have a strong relation to the role diversity. The decomposed factors can significantly impact policy optimization on three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
Composable Energy Policies for Reactive Motion Generation and Reinforcement Learning [25.498555742173323]
We introduce Composable Energy Policies (CEP), a novel framework for modular motion generation. CEP computes the control action by optimization over the product of a set of reactive policies. CEP naturally adapts to the Reinforcement Learning problem allowing us to integrate, in a hierarchical fashion, any distribution as prior.
arXiv Detail & Related papers (2021-05-11T11:59:13Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC) It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
Multi-Agent Interactions Modeling with Correlated Policies [53.38338964628494]
In this paper, we cast the multi-agent interactions modeling problem into a multi-agent imitation learning framework. We develop a Decentralized Adrial Imitation Learning algorithm with Correlated policies (CoDAIL) Various experiments demonstrate that CoDAIL can better regenerate complex interactions close to the demonstrators.
arXiv Detail & Related papers (2020-01-04T17:31:53Z)

This list is automatically generated from the titles and abstracts of the papers in this site.