AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual
Intrinsic Reward Mixing Network
- URL: http://arxiv.org/abs/2302.09531v1
- Date: Sun, 19 Feb 2023 10:25:25 GMT
- Title: AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual
Intrinsic Reward Mixing Network
- Authors: Wei Li, Weiyan Liu, Shitong Shao, and Shiyi Huang
- Abstract summary: Deducing the contribution of each agent and assigning the corresponding reward to each of them is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL).
Previous studies try to resolve the issue by designing an intrinsic reward function, but the intrinsic reward is simply combined with the environment reward by summation.
We propose the Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX) for MARL.
- Score: 2.057898896648108
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deducing the contribution of each agent and assigning the corresponding
reward to each of them is a crucial problem in cooperative Multi-Agent
Reinforcement Learning (MARL). Previous studies attempt to resolve the issue by
designing an intrinsic reward function, but in these studies the intrinsic
reward is simply combined with the environment reward by summation, which
limits the performance of the resulting MARL frameworks. We propose a novel
method named Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX) in
MARL, and the contributions of AIIR-MIX are as follows: (a) we construct a
novel intrinsic reward network based on the attention mechanism to make
teamwork more effective; (b) we propose a mixing network that combines
intrinsic and extrinsic rewards non-linearly and dynamically in response to
changing conditions of the environment. We compare AIIR-MIX with several
state-of-the-art (SOTA) MARL methods on battle games in StarCraft II, and the
results demonstrate that AIIR-MIX performs admirably, surpassing the current
advanced methods in average test win rate. To validate the effectiveness of
AIIR-MIX, we conduct additional ablation studies. The results show that
AIIR-MIX can dynamically assign each agent a real-time intrinsic reward in
accordance with its actual contribution.
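The abstract describes two components: an attention-based individual intrinsic reward network and a mixing network that combines intrinsic and extrinsic rewards non-linearly. Below is a minimal PyTorch sketch of that idea; the layer sizes, the multi-head attention choice, and the concrete wiring are illustrative assumptions, not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class AttentionIntrinsicReward(nn.Module):
    """Produces one intrinsic reward per agent from per-agent observation-action embeddings."""
    def __init__(self, embed_dim: int, n_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Sequential(nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, agent_embeds: torch.Tensor) -> torch.Tensor:
        # agent_embeds: (batch, n_agents, embed_dim); agents attend over each other.
        attended, _ = self.attn(agent_embeds, agent_embeds, agent_embeds)
        return self.head(attended).squeeze(-1)  # (batch, n_agents)

class RewardMixer(nn.Module):
    """Combines per-agent intrinsic rewards with the shared extrinsic reward
    non-linearly, conditioned on the global state."""
    def __init__(self, state_dim: int, n_agents: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + n_agents + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, n_agents),
        )

    def forward(self, state, intrinsic, extrinsic):
        # state: (batch, state_dim), intrinsic: (batch, n_agents), extrinsic: (batch, 1)
        return self.net(torch.cat([state, intrinsic, extrinsic], dim=-1))

# Shape check with dummy data.
B, N, D, S = 8, 3, 32, 48
r_int = AttentionIntrinsicReward(D)(torch.randn(B, N, D))
r_mix = RewardMixer(S, N)(torch.randn(B, S), r_int, torch.randn(B, 1))
print(r_int.shape, r_mix.shape)  # torch.Size([8, 3]) torch.Size([8, 3])
```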
Related papers
- Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven effective in improving the generalization ability of Deep Neural Networks (a minimal sketch of standard mix-up is given after this entry for context).
In this paper, inspired by the superior qualities of each direction over the other, we introduce a novel method that lies at the junction of the two routes.
We name our method R-Mix, following the concept of "Random Mix-up".
In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z)
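For context, here is a minimal sketch of standard mix-up, the technique the entry above builds on; the Beta-distribution parameter and the one-hot label format are conventional choices, and R-Mix's saliency guidance and learned mix-up policy are not shown.

```python
import numpy as np

def mixup_batch(x, y, alpha=0.2, rng=None):
    """Convexly combine a batch with a shuffled copy of itself.
    x: (batch, ...) inputs; y: (batch, n_classes) one-hot labels."""
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha)      # mixing coefficient
    perm = rng.permutation(len(x))    # random pairing within the batch
    return lam * x + (1 - lam) * x[perm], lam * y + (1 - lam) * y[perm]
```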
- Distributional Reward Estimation for Effective Multi-Agent Deep Reinforcement Learning [19.788336796981685]
We propose a novel Distributional Reward Estimation framework for effective Multi-Agent Reinforcement Learning (DRE-MARL).
Our main idea is to design multi-action-branch reward estimation and policy-weighted reward aggregation for stabilized training; a minimal sketch of the aggregation step follows this entry.
The superiority of the DRE-MARL is demonstrated using benchmark multi-agent scenarios, compared with the SOTA baselines in terms of both effectiveness and robustness.
arXiv Detail & Related papers (2022-10-14T08:31:45Z)
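A minimal sketch of the policy-weighted aggregation step mentioned above: per-action-branch reward estimates are averaged under the agent's policy. The tensor shapes and names are illustrative assumptions, not DRE-MARL's exact implementation.

```python
import torch

def policy_weighted_reward(reward_branches: torch.Tensor,
                           policy_probs: torch.Tensor) -> torch.Tensor:
    """reward_branches, policy_probs: (batch, n_agents, n_actions).
    Returns one aggregated reward per agent: (batch, n_agents)."""
    return (reward_branches * policy_probs).sum(dim=-1)
```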
- Learning with MISELBO: The Mixture Cookbook [62.75516608080322]
We present the first mixture of variational approximations for a normalizing flow-based hierarchical variational autoencoder (VAE) with a VampPrior and a PixelCNN decoder network.
We explain the cooperative behavior of the mixture components by drawing a novel connection between variational inference (VI) and adaptive importance sampling.
We obtain state-of-the-art results among VAE architectures in terms of negative log-likelihood on the MNIST and FashionMNIST datasets.
arXiv Detail & Related papers (2022-09-30T15:01:35Z)
- DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state.
Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
- Revisiting QMIX: Discriminative Credit Assignment by Gradient Entropy Regularization [126.87359177547455]
In cooperative multi-agent systems, agents jointly take actions and receive a team reward instead of individual rewards.
In the absence of individual reward signals, credit assignment mechanisms are usually introduced to discriminate the contributions of different agents.
We propose a new perspective on credit assignment measurement and empirically show that QMIX suffers from limited discriminability in the assignment of credits to agents; a gradient-based credit measure is sketched after this entry.
arXiv Detail & Related papers (2022-02-09T12:37:55Z)
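A heavily hedged sketch of a gradient-based credit measure in the spirit of the entry above: each agent's credit is read off the sensitivity of the joint value to that agent's utility, and a peaked (low-entropy) profile indicates discriminative assignment. The toy mixer and the entropy measure are assumptions for illustration, not the paper's exact regularizer.

```python
import torch

def toy_mixer(agent_qs, state):
    # Stand-in monotonic mixer: non-negative, state-dependent weights per agent.
    w = torch.nn.functional.softplus(state[:, : agent_qs.shape[1]])
    return (w * agent_qs).sum(dim=-1)

def credit_profile(mixer, agent_qs, state):
    """agent_qs: (batch, n_agents), state: (batch, state_dim).
    Returns a normalized per-agent credit distribution, shape (batch, n_agents)."""
    agent_qs = agent_qs.detach().requires_grad_(True)
    q_tot = mixer(agent_qs, state).sum()
    grads = torch.autograd.grad(q_tot, agent_qs)[0].abs()
    return grads / grads.sum(dim=-1, keepdim=True)

def credit_entropy(profile):
    # Lower entropy means more discriminative credit assignment.
    return -(profile * (profile + 1e-8).log()).sum(dim=-1).mean()

profile = credit_profile(toy_mixer, torch.randn(4, 3), torch.randn(4, 8))
print(credit_entropy(profile))
```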
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline; a minimal sketch of such a penalty is given after this entry.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
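A minimal sketch of the regularized update described above: the TD objective is augmented with a term that penalizes the chosen joint action-value for deviating from a baseline. Using a softmax-weighted average over sampled joint action-values as the baseline, and the weight of 0.1, are illustrative assumptions rather than the paper's exact recipe.

```python
import torch

def regularized_td_loss(q_chosen, td_target, q_sampled, beta=1.0, weight=0.1):
    """q_chosen, td_target: (batch,); q_sampled: (batch, n_sampled_joint_actions)."""
    td_loss = torch.mean((q_chosen - td_target) ** 2)
    # Softmax-weighted baseline over sampled joint action-values (assumed form).
    baseline = (torch.softmax(beta * q_sampled, dim=-1) * q_sampled).sum(dim=-1)
    penalty = torch.mean((q_chosen - baseline.detach()) ** 2)
    return td_loss + weight * penalty
```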
- Balancing Rational and Other-Regarding Preferences in Cooperative-Competitive Environments [4.705291741591329]
Mixed environments are notorious for conflicts between selfish and social interests.
We propose BAROCCO to balance individual and social incentives.
Our meta-algorithm is compatible with both Q-learning and Actor-Critic frameworks.
arXiv Detail & Related papers (2021-02-24T14:35:32Z)
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features; a minimal sketch of successor-feature value composition is given after this entry.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
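A minimal sketch of the successor-feature value composition the entry above relies on: with Q_w(s, a) = psi(s, a) · w, one shared feature predictor yields Q-values for a family of related tasks defined by reward weights w. The shapes are illustrative, and UneVEn's exploration scheme built on top of this is not shown.

```python
import torch

def q_from_successor_features(psi: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """psi: (batch, n_actions, feat_dim) predicted successor features;
    w: (feat_dim,) task reward weights. Returns Q-values of shape (batch, n_actions)."""
    return psi @ w

# The same successor features give Q-values for two related tasks.
psi = torch.randn(5, 4, 8)
q_task_a = q_from_successor_features(psi, torch.randn(8))
q_task_b = q_from_successor_features(psi, torch.randn(8))
```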
- Energy-based Surprise Minimization for Multi-Agent Value Factorization [2.341806147715478]
We introduce the Energy-based MIXer (EMIX), an algorithm that minimizes surprise utilizing the energy across agents.
Our contributions are threefold: EMIX introduces a novel surprise minimization technique across multiple agents.
Our ablation study highlights the necessity of the energy-based scheme and the need for elimination of overestimation bias in Multi-Agent Reinforcement Learning.
arXiv Detail & Related papers (2020-09-16T19:42:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.