PAC: Assisted Value Factorisation with Counterfactual Predictions in
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2206.11420v1
- Date: Wed, 22 Jun 2022 23:34:30 GMT
- Title: PAC: Assisted Value Factorisation with Counterfactual Predictions in
Multi-Agent Reinforcement Learning
- Authors: Hanhan Zhou, Tian Lan, Vaneet Aggarwal
- Abstract summary: Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods.
In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints.
We propose PAC, a new framework leveraging Assistive information generated from Counterfactual Predictions of optimal joint action selection.
- Score: 43.862956745961654
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) has witnessed significant progress
with the development of value function factorization methods. Such methods allow a joint action-value function to be optimized through the maximization of factorized per-agent utilities, owing to a monotonicity constraint. In this paper, we show that in
partially observable MARL problems, an agent's ordering over its own actions
could impose concurrent constraints (across different states) on the
representable function class, causing significant estimation error during
training. We tackle this limitation and propose PAC, a new framework leveraging
Assistive information generated from Counterfactual Predictions of optimal
joint action selection, which enables explicit assistance to value function
factorization through a novel counterfactual loss. A variational
inference-based information encoding method is developed to collect and encode
the counterfactual predictions from an estimated baseline. To enable
decentralized execution, we also derive factorized per-agent policies inspired
by a maximum-entropy MARL framework. We evaluate the proposed PAC on
multi-agent predator-prey and a set of StarCraft II micromanagement tasks.
Empirical results demonstrate that PAC outperforms state-of-the-art value-based and policy-based multi-agent reinforcement learning algorithms on all benchmarks.
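For readers who want the mechanics behind "maximization of factorized per-agent utilities due to monotonicity", below is a minimal sketch of a QMIX-style monotonic mixing network, the baseline this line of work builds on. It is not PAC itself (the counterfactual loss and variational encoder are not shown), and all class names and dimensions are illustrative: state-conditioned mixing weights are forced non-negative, so Q_tot is monotone in every per-agent utility and the joint argmax decomposes into per-agent argmaxes.

```python
import torch
import torch.nn as nn

class MonotonicMixer(nn.Module):
    """QMIX-style mixing network (illustrative sketch, not PAC itself).

    Per-agent utilities Q_i are mixed into Q_tot with state-conditioned
    weights forced non-negative via abs(), so dQ_tot/dQ_i >= 0 and the
    joint argmax factorizes into independent per-agent argmaxes.
    """

    def __init__(self, n_agents: int, state_dim: int, embed_dim: int = 32):
        super().__init__()
        # Hypernetworks: map the global state to mixing weights and biases.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(),
                                nn.Linear(embed_dim, 1))
        self.n_agents, self.embed_dim = n_agents, embed_dim

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents); state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1 + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
        return (hidden @ w2 + self.b2(state).unsqueeze(1)).view(-1)  # Q_tot
```

The abstract's point is that this monotonicity, combined with partial observability, constrains the representable function class across states; PAC keeps the factorization but adds a counterfactual loss and a variational information encoder to compensate.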
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- A Unified Framework for Factorizing Distributional Value Functions for Multi-Agent Reinforcement Learning [15.042567946390362]
We propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods.
This framework generalizes expected value function factorization methods to enable the factorization of return distributions.
arXiv Detail & Related papers (2023-06-04T18:26:25Z)
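The DFAC summary above does not spell out the decomposition, so the snippet below is only a hedged illustration of what "factorizing return distributions" can look like: each agent's return is represented by sorted quantile estimates, the joint distribution is formed by summing quantile-by-quantile (which assumes comonotonic per-agent returns, a strong simplification), and the result is split into a deterministic mean part and a stochastic shape part. All names are invented for this sketch and are not DFAC's API.

```python
import numpy as np

N_QUANTILES = 8  # illustrative resolution of each return distribution

def factorize_return_distribution(agent_quantiles):
    """Toy factorization of a joint return distribution.

    agent_quantiles: (n_agents, N_QUANTILES), each row sorted ascending.
    Quantile-wise summation assumes comonotonic per-agent returns -- a
    simplification used only to illustrate the idea, not DFAC's actual
    mean-shape decomposition.
    """
    joint = agent_quantiles.sum(axis=0)  # joint return quantiles
    mean = joint.mean()                  # deterministic ("mean") part
    shape = joint - mean                 # stochastic ("shape") part
    return mean, shape

# Example: two agents with noisy returns centered at 1.0 and 2.0.
rng = np.random.default_rng(0)
qs = np.sort(rng.normal([[1.0], [2.0]], 0.1, size=(2, N_QUANTILES)), axis=1)
print(factorize_return_distribution(qs))
```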
- RACA: Relation-Aware Credit Assignment for Ad-Hoc Cooperation in Multi-Agent Deep Reinforcement Learning [55.55009081609396]
We propose a novel method, called Relation-Aware Credit Assignment (RACA), which achieves zero-shot generalization in ad-hoc cooperation scenarios.
RACA uses a graph-based relation encoder to capture the topological structure between agents.
Our method outperforms baseline methods on the StarCraft II micromanagement benchmark and ad-hoc cooperation scenarios.
arXiv Detail & Related papers (2022-06-02T03:39:27Z)
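RACA's summary names a graph-based relation encoder without further detail; the sketch below shows only the generic ingredient, one round of message passing over an agent adjacency matrix, and is not RACA's actual architecture. All names are illustrative.

```python
import torch
import torch.nn as nn

class RelationEncoder(nn.Module):
    """Single message-passing round over the agent graph (generic sketch).

    adjacency[i, j] = 1 marks agents i and j as related (e.g., within
    observation range); each agent's embedding then reflects both its own
    features and the mean of its neighbors', encoding team topology.
    """

    def __init__(self, feat_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.self_proj = nn.Linear(feat_dim, hidden_dim)
        self.neighbor_proj = nn.Linear(feat_dim, hidden_dim)

    def forward(self, feats: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # feats: (n_agents, feat_dim); adjacency: (n_agents, n_agents)
        deg = adjacency.sum(dim=1, keepdim=True).clamp(min=1.0)
        neighbor_mean = (adjacency @ feats) / deg  # average over related agents
        return torch.relu(self.self_proj(feats) + self.neighbor_proj(neighbor_mean))
```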
- Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework featuring a variational inference-based information-sharing mechanism that supplies extra state information.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
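A standard way to obtain the permutation invariance credited above, shown here as a sketch rather than MF-PPO's exact design, is the DeepSets pattern: a shared per-agent encoder followed by symmetric pooling, so reordering the agents cannot change the critic's output.

```python
import torch
import torch.nn as nn

class PermutationInvariantCritic(nn.Module):
    """Critic whose value is invariant to agent ordering (DeepSets pattern).

    A shared encoder phi embeds each agent's observation, mean pooling
    (a symmetric operation) aggregates the embeddings, and a head rho
    maps the pooled embedding to a scalar value.
    """

    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # obs: (batch, n_agents, obs_dim) -> values: (batch,)
        pooled = self.phi(obs).mean(dim=1)  # permuting agents leaves this unchanged
        return self.rho(pooled).squeeze(-1)
```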
- Modeling the Interaction between Agents in Cooperative Multi-Agent Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC), which models the interaction of agents from the perspectives of policy and value function.
We extend value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z)
- Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z)
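The summary omits the algorithm, but the classic way to exploit a submodular value function is greedy selection: when the value of a sensor set exhibits diminishing returns, adding sensors one at a time by marginal gain is guaranteed to come within a (1 - 1/e) factor of the best subset, avoiding the exponential cost of exhaustive planning. A toy sketch with an invented coverage-style gain function:

```python
def greedy_sensor_selection(sensors, budget, gain):
    """Greedily pick `budget` sensors maximizing a submodular `gain`.

    If gain() is submodular (diminishing returns), the greedy choice is
    within (1 - 1/e) of the optimal subset's value.
    """
    selected = set()
    for _ in range(budget):
        best = max((s for s in sensors if s not in selected),
                   key=lambda s: gain(selected | {s}) - gain(selected),
                   default=None)
        if best is None:
            break
        selected.add(best)
    return selected

# Toy gain: how many hidden-variable indices a sensor set covers
# (coverage functions are submodular).
coverage = {"a": {1, 2}, "b": {2, 3}, "c": {3, 4, 5}}
gain = lambda chosen: len(set().union(*(coverage[s] for s in chosen)))
print(greedy_sensor_selection(coverage, budget=2, gain=gain))  # {'c', 'a'}
```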
- Multi-Agent Determinantal Q-Learning [39.79718674655209]
We propose multi-agent determinantal Q-learning (Q-DPP), which promotes agents to acquire diverse behavioral models.
We demonstrate that Q-DPP generalizes major solutions including VDN, QMIX, and QTRAN on decentralizable cooperative tasks.
arXiv Detail & Related papers (2020-06-02T09:32:48Z)
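The diversity pressure behind a determinantal point process fits in a few lines: the determinant of a similarity kernel over the agents' behavior features grows as the behaviors become more mutually distinct, so rewarding it pushes agents apart. A toy illustration of the measure only, not Q-DPP's learning rule:

```python
import numpy as np

def dpp_diversity(behavior_features: np.ndarray) -> float:
    """Unnormalized DPP score over a set of agent behaviors.

    Rows are feature vectors describing each agent's behavior. The
    determinant of the Gram kernel L = F F^T shrinks toward zero as
    behaviors align, so maximizing it rewards diverse behavior.
    """
    L = behavior_features @ behavior_features.T
    return float(np.linalg.det(L))

diverse = np.array([[1.0, 0.0], [0.0, 1.0]])    # orthogonal behaviors
similar = np.array([[1.0, 0.0], [0.99, 0.14]])  # nearly identical behaviors
print(dpp_diversity(diverse), ">", dpp_diversity(similar))  # 1.0 > ~0.02
```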
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)