A Unified Framework for Factorizing Distributional Value Functions for
Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2306.02430v1
- Date: Sun, 4 Jun 2023 18:26:25 GMT
- Title: A Unified Framework for Factorizing Distributional Value Functions for
Multi-Agent Reinforcement Learning
- Authors: Wei-Fang Sun, Cheng-Kuang Lee, Simon See, and Chun-Yi Lee
- Abstract summary: We propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods.
This framework generalizes expected value function factorization methods to enable the factorization of return distributions.
- Score: 15.042567946390362
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In fully cooperative multi-agent reinforcement learning (MARL) settings,
environments are highly stochastic due to the partial observability of each
agent and the continuously changing policies of other agents. To address the
above issues, we propose a unified framework, called DFAC, for integrating
distributional RL with value function factorization methods. This framework
generalizes expected value function factorization methods to enable the
factorization of return distributions. To validate DFAC, we first demonstrate
its ability to factorize the value functions of a simple matrix game with
stochastic rewards. Then, we perform experiments on all Super Hard maps of the
StarCraft Multi-Agent Challenge and six self-designed Ultra Hard maps, showing
that DFAC is able to outperform a number of baselines.
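The companion DFAC paper listed under related papers below describes the mechanism more concretely: each agent's utility is treated as a random variable rather than a deterministic value, and the quantile function of the total return is modelled as a quantile mixture of the individual quantile functions. The following is a minimal numpy sketch of that idea only; the number of agents, the number of quantile samples, and the fixed mixture weights are illustrative assumptions, not values or an implementation from the paper.

```python
import numpy as np

# Illustrative sizes (assumptions, not from the paper).
n_agents = 3
n_quantiles = 8

rng = np.random.default_rng(0)
taus = np.sort(rng.uniform(size=n_quantiles))  # quantile fractions in (0, 1)

# Hypothetical per-agent utility quantile values F_i^{-1}(tau_k); in a real
# agent these would come from a distributional utility network (for example
# an IQN-style head) rather than from random draws.
agent_quantiles = np.sort(rng.normal(size=(n_agents, n_quantiles)), axis=1)

# Non-negative mixture weights; in a learned factorization these would be
# produced by a mixing network, here they are fixed for clarity.
weights = np.array([0.5, 0.3, 0.2])

# Quantile mixture: the joint return's quantile function is a weighted sum
# of the individual quantile functions evaluated at the same fractions.
joint_quantiles = weights @ agent_quantiles  # shape (n_quantiles,)

# Averaging over quantile samples yields a scalar joint value, linking the
# distributional view back to expected value function factorization.
expected_joint_return = joint_quantiles.mean()
print(taus)
print(joint_quantiles)
print(expected_joint_return)
```

In this simplified sketch, if every agent's distribution collapses to a point mass, the mixture reduces to a weighted sum of deterministic utilities, which is how the distributional construction can be read as a generalization of expected value function factorization.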
Related papers
- Routing to the Expert: Efficient Reward-guided Ensemble of Large
Language Models [69.51130760097818]
We propose Zooter, a reward-guided routing method distilling rewards on training queries to train a routing function.
We evaluate Zooter on a comprehensive benchmark collection with 26 subsets on different domains and tasks.
arXiv Detail & Related papers (2023-11-15T04:40:43Z)
- QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning [43.862956745961654]
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods.
In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints.
We propose PAC, a new framework leveraging information generated from Counterfactual Predictions of optimal joint action selection.
arXiv Detail & Related papers (2022-06-22T23:34:30Z)
- Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism as extra state information.
We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z)
- Model based Multi-agent Reinforcement Learning with Tensor Decompositions [52.575433758866936]
This paper investigates generalisation in state-action space over unexplored state-action pairs by modelling the transition and reward functions as tensors of low CP-rank.
Experiments on synthetic MDPs show that using tensor decompositions in a model-based reinforcement learning algorithm can lead to much faster convergence if the true transition and reward functions are indeed of low rank.
arXiv Detail & Related papers (2021-10-27T15:36:25Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning [7.893387199803367]
We propose a Distributional Value Function Factorization (DFAC) framework to generalize expected value function factorization methods.
DFAC extends the individual utility functions from deterministic variables to random variables, and models the quantile function of the total return as a quantile mixture.
We demonstrate DFAC's ability to factorize a simple two-step matrix game with stochastic rewards and perform experiments on all Super Hard tasks of the StarCraft Multi-Agent Challenge.
arXiv Detail & Related papers (2021-02-16T03:16:49Z)
- UPDeT: Universal Multi-agent Reinforcement Learning via Policy Decoupling with Transformers [108.92194081987967]
We make the first attempt to explore a universal multi-agent reinforcement learning pipeline, designing a single architecture that can fit different tasks.
Unlike previous RNN-based models, we utilize a transformer-based model to generate a flexible policy.
The proposed model, named Universal Policy Decoupling Transformer (UPDeT), further relaxes the action restriction and makes the multi-agent task's decision process more explainable.
arXiv Detail & Related papers (2021-01-20T07:24:24Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage commonalities among these agents and entities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)