DFAC Framework: Factorizing the Value Function via Quantile Mixture for
Multi-Agent Distributional Q-Learning
- URL: http://arxiv.org/abs/2102.07936v1
- Date: Tue, 16 Feb 2021 03:16:49 GMT
- Title: DFAC Framework: Factorizing the Value Function via Quantile Mixture for
Multi-Agent Distributional Q-Learning
- Authors: Wei-Fang Sun, Cheng-Kuang Lee, Chun-Yi Lee
- Abstract summary: We propose a Distributional Value Function Factorization (DFAC) framework to generalize expected value function factorization methods.
DFAC extends the individual utility functions from deterministic variables to random variables, and models the quantile function of the total return as a quantile mixture.
We demonstrate DFAC's ability to factorize a simple two-step matrix game with stochastic rewards and perform experiments on all Super Hard tasks of the StarCraft Multi-Agent Challenge.
- Score: 7.893387199803367
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In fully cooperative multi-agent reinforcement learning (MARL) settings, the
environments are highly stochastic due to the partial observability of each
agent and the continuously changing policies of the other agents. To address
the above issues, we integrate distributional RL and value function
factorization methods by proposing a Distributional Value Function
Factorization (DFAC) framework to generalize expected value function
factorization methods to their DFAC variants. DFAC extends the individual
utility functions from deterministic variables to random variables, and models
the quantile function of the total return as a quantile mixture. To validate
DFAC, we demonstrate DFAC's ability to factorize a simple two-step matrix game
with stochastic rewards and perform experiments on all Super Hard tasks of
StarCraft Multi-Agent Challenge, showing that DFAC is able to outperform
expected value function factorization baselines.
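For intuition, the NumPy sketch below illustrates the quantile-mixture operation described above: each agent's utility is represented by quantile values at shared fractions, and the quantile function of the total return is a nonnegative-weighted combination of the per-agent quantile functions. This is a minimal sketch, not the authors' implementation; the function name, the default unweighted sum (a VDN/DDN-style special case), and the toy Gaussian utilities are illustrative assumptions.

```python
import numpy as np

def quantile_mixture(agent_quantiles, weights=None):
    """Combine per-agent quantile values into a joint quantile function.

    agent_quantiles: (n_agents, n_quantiles) array; row i holds agent i's
        utility quantile values Z_i(tau_k) at shared fractions tau_k.
    weights: optional nonnegative mixing weights, one per agent; defaults
        to an unweighted sum (illustrative VDN/DDN-style special case).
    """
    agent_quantiles = np.asarray(agent_quantiles, dtype=float)
    n_agents = agent_quantiles.shape[0]
    w = np.ones(n_agents) if weights is None else np.asarray(weights, dtype=float)
    w = np.maximum(w, 0.0)  # nonnegative weights keep the result a valid quantile function
    # Quantile mixture: Z_tot(tau_k) = sum_i w_i * Z_i(tau_k)
    return w @ agent_quantiles


# Toy example: two agents with Gaussian utilities, five shared quantile fractions.
taus = (np.arange(5) + 0.5) / 5
rng = np.random.default_rng(0)
z1 = np.quantile(rng.normal(2.0, 1.0, 10_000), taus)
z2 = np.quantile(rng.normal(1.0, 0.5, 10_000), taus)
print(quantile_mixture(np.stack([z1, z2])))  # quantile values of the joint return
```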
Related papers
- A Unified Framework for Factorizing Distributional Value Functions for
Multi-Agent Reinforcement Learning [15.042567946390362]
We propose a unified framework, called DFAC, for integrating distributional RL with value function factorization methods.
This framework generalizes expected value function factorization methods to enable the factorization of return distributions.
arXiv Detail & Related papers (2023-06-04T18:26:25Z)
- Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL [154.13105285663656]
A cooperative Multi-Agent Reinforcement Learning (MARL) framework with permutation-invariant agents has achieved tremendous empirical success in real-world applications.
Unfortunately, the theoretical understanding of this MARL problem is lacking, due to the curse of many agents and the limited exploration of relational reasoning in existing works.
We prove that the suboptimality gaps of the model-free and model-based algorithms are independent of and logarithmic in the number of agents respectively, which mitigates the curse of many agents.
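As a rough, assumed illustration of the permutation-invariance property discussed here (not the paper's actual Set Transformer architecture), the sketch below applies a single attention head without positional encodings over a set of entity embeddings and mean-pools the result, so the output is unchanged when the entities are reordered.

```python
import numpy as np

def set_attention_pool(entity_feats, w_q, w_k, w_v):
    """Permutation-invariant summary of a set of entity features.

    entity_feats: (n_entities, d) array; rows may arrive in any order.
    w_q, w_k, w_v: (d, d_h) projection matrices shared across entities.
    """
    q, k, v = entity_feats @ w_q, entity_feats @ w_k, entity_feats @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)             # row-wise softmax
    # Self-attention without positional encodings is permutation-equivariant;
    # mean pooling over entities then makes the summary permutation-invariant.
    return (attn @ v).mean(axis=0)


rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 8))                # 4 entities, 8 features each
proj = [rng.normal(size=(8, 16)) for _ in range(3)]
out_a = set_attention_pool(feats, *proj)
out_b = set_attention_pool(feats[::-1], *proj) # same entities, reordered
print(np.allclose(out_a, out_b))               # True
```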
arXiv Detail & Related papers (2022-09-20T16:42:59Z)
- PAC: Assisted Value Factorisation with Counterfactual Predictions in Multi-Agent Reinforcement Learning [43.862956745961654]
Multi-agent reinforcement learning (MARL) has witnessed significant progress with the development of value function factorization methods.
In this paper, we show that in partially observable MARL problems, an agent's ordering over its own actions could impose concurrent constraints.
We propose PAC, a new framework leveraging information generated from Counterfactual Predictions of optimal joint action selection.
arXiv Detail & Related papers (2022-06-22T23:34:30Z)
- Policy Diagnosis via Measuring Role Diversity in Cooperative Multi-agent RL [107.58821842920393]
We quantify the agents' behavior differences and relate them to policy performance via Role Diversity.
We find that the error bound in MARL can be decomposed into three parts that are strongly related to role diversity.
The decomposed factors can significantly impact policy optimization in three popular directions.
arXiv Detail & Related papers (2022-06-01T04:58:52Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of the individual Q-values and the global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture.
We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence.
In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
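The sketch below shows one standard way, assumed here rather than taken from the paper, to build a permutation-invariant critic input in a mean-field setting: concatenate the focal agent's observation with the empirical mean over the other agents, so the value estimate cannot depend on how those agents are ordered.

```python
import numpy as np

def mean_field_critic_input(own_obs, other_obs):
    """Permutation-invariant critic input for one agent.

    own_obs: (d,) observation of the focal agent.
    other_obs: (n_others, d) observations of the remaining agents; their
        empirical mean is invariant to how those agents are ordered.
    """
    mean_field = np.asarray(other_obs, dtype=float).mean(axis=0)
    return np.concatenate([own_obs, mean_field])


rng = np.random.default_rng(0)
own = rng.normal(size=3)
others = rng.normal(size=(5, 3))
a = mean_field_critic_input(own, others)
b = mean_field_critic_input(own, others[rng.permutation(5)])  # shuffled agent order
print(np.allclose(a, b))                                      # True
```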
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
- Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning [59.62721526353915]
Multi-agent settings in the real world often involve tasks with varying types and quantities of agents and non-agent entities.
Our method aims to leverage these commonalities by asking the question: "What is the expected utility of each agent when only considering a randomly selected sub-group of its observed entities?"
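A minimal sketch of that question in code (the helper name and keep probability are hypothetical): draw a random sub-group of the entities an agent observes and pass only that restricted view to a utility estimator; the full method builds its factorization objective on top of such randomized groupings.

```python
import numpy as np

def random_entity_subgroup(entity_feats, rng, keep_prob=0.5):
    """Randomly select a sub-group of the entities an agent observes.

    entity_feats: (n_entities, d) features of everything the agent observes.
    Returns the selection mask and the restricted view, which a per-agent
    utility network could score alongside the full observation.
    """
    n = entity_feats.shape[0]
    mask = rng.random(n) < keep_prob
    if not mask.any():                 # always keep at least one entity
        mask[rng.integers(n)] = True
    return mask, entity_feats[mask]


rng = np.random.default_rng(0)
obs_entities = rng.normal(size=(6, 4))     # e.g. 6 observed entities, 4 features each
mask, sub_view = random_entity_subgroup(obs_entities, rng)
print(mask, sub_view.shape)                # which entities were kept, restricted view
```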
arXiv Detail & Related papers (2020-06-07T18:28:41Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC), a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
- Generalized Hidden Parameter MDPs: Transferable Model-based RL in a Handful of Trials [13.051708608864539]
Generalized Hidden Parameter MDPs (GHP-MDPs) describe a family of MDPs where both dynamics and reward can change as a function of hidden parameters that vary across tasks.
We experimentally demonstrate state-of-the-art performance and sample-efficiency on a new challenging MuJoCo task using reward and dynamics latent spaces.
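As an assumed, minimal illustration of the hidden-parameter idea (not the authors' model), the sketch below conditions a single transition model on a per-task latent vector, so new tasks are handled by swapping the inferred latent rather than relearning the dynamics; a second latent would condition the reward model analogously.

```python
import numpy as np

def latent_dynamics_step(state, action, z_dyn, w):
    """Predict the next state with a model conditioned on a hidden task parameter.

    state, action: current state and action vectors.
    z_dyn: latent vector standing in for the task's hidden dynamics parameters;
        a separate latent would condition the reward model in the same way.
    w: model weights (a single linear map here, purely for illustration).
    """
    x = np.concatenate([state, action, z_dyn])
    return w @ x


rng = np.random.default_rng(0)
state, action = rng.normal(size=4), rng.normal(size=2)
w = rng.normal(size=(4, 4 + 2 + 3))             # next-state dim 4, latent dim 3
# The same model covers different tasks by swapping the inferred latent:
for task_latent in (np.zeros(3), rng.normal(size=3)):
    print(latent_dynamics_step(state, action, task_latent, w))
```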
arXiv Detail & Related papers (2020-02-08T02:49:33Z)