Energy-based Surprise Minimization for Multi-Agent Value Factorization
- URL: http://arxiv.org/abs/2009.09842v4
- Date: Mon, 18 Jan 2021 03:06:32 GMT
- Title: Energy-based Surprise Minimization for Multi-Agent Value Factorization
- Authors: Karush Suri, Xiao Qi Shi, Konstantinos Plataniotis, Yuri Lawryshyn
- Abstract summary: We introduce the Energy-based MIXer (EMIX), an algorithm which minimizes surprise by utilizing the energy across agents.
Our contributions are threefold: EMIX introduces a novel surprise minimization technique across multiple agents.
Our ablation study highlights the necessity of the energy-based scheme and the need to eliminate overestimation bias in Multi-Agent Reinforcement Learning.
- Score: 2.341806147715478
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) has demonstrated significant
success in training decentralised policies in a centralised manner by making
use of value factorization methods. However, surprise across spurious states
and approximation bias remain open problems in multi-agent settings. To
address these problems, we introduce the Energy-based MIXer (EMIX), an
algorithm which minimizes surprise by utilizing the energy across agents. Our
contributions are threefold: (1) EMIX introduces a novel surprise minimization
technique across multiple agents in the case of multi-agent
partially-observable settings. (2) EMIX highlights a practical use of energy
functions in MARL with theoretical guarantees and experimental validation of
the energy operator. Lastly, (3) EMIX extends Maxmin Q-learning to address
overestimation bias across agents in MARL. In a study of challenging StarCraft
II micromanagement scenarios, EMIX demonstrates consistently stable
performance for multi-agent surprise minimization. Moreover, our ablation
study highlights the necessity of the energy-based scheme and the need to
eliminate overestimation bias in MARL. Our implementation of EMIX can be found at
karush17.github.io/emix-web/.
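To make the overestimation-bias point concrete, the sketch below shows the core update of Maxmin Q-learning, which the abstract says EMIX extends to the multi-agent case: the bootstrap target takes a minimum over an ensemble of Q-estimates before the max over actions. This is a minimal single-agent illustration in NumPy, not the authors' implementation; the function name, ensemble size, and array shapes are illustrative assumptions.

```python
import numpy as np

def maxmin_target(q_next_ensemble, reward, done, gamma=0.99):
    """Maxmin Q-learning bootstrap target (illustrative sketch).

    q_next_ensemble: shape (N, num_actions), the N ensemble members'
    Q-values at the next state. Taking the elementwise min across the
    ensemble before the max over actions counteracts the overestimation
    bias that a single max introduces under noisy value estimates.
    """
    q_min = q_next_ensemble.min(axis=0)   # pessimistic value per action
    bootstrap = q_min.max()               # greedy action on the pessimistic Q
    return reward + gamma * (1.0 - done) * bootstrap

# Toy usage: three ensemble members, four actions.
rng = np.random.default_rng(0)
print(maxmin_target(rng.normal(size=(3, 4)), reward=1.0, done=0.0))
```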
Related papers
- PowMix: A Versatile Regularizer for Multimodal Sentiment Analysis [71.8946280170493]
This paper introduces PowMix, a versatile embedding space regularizer that builds upon the strengths of unimodal mixing-based regularization approaches.
PowMix is integrated before the fusion stage of multimodal architectures and facilitates intra-modal mixing, such as mixing text with text, to act as a regularizer.
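As a rough illustration of the intra-modal mixing idea (a generic mixup-style operation, not PowMix itself), the snippet below forms convex combinations of embeddings from a single modality; the function name and Beta-distribution sampling are assumptions for illustration.

```python
import numpy as np

def intra_modal_mix(embeddings, alpha=0.2, rng=None):
    """Mixup-style intra-modal mixing (illustrative, not PowMix itself).

    embeddings: array of shape (batch, dim) from one modality, e.g. text.
    Returns convex combinations of randomly paired embeddings, which can
    act as an embedding-space regularizer before the fusion stage.
    """
    rng = rng or np.random.default_rng()
    lam = rng.beta(alpha, alpha, size=(embeddings.shape[0], 1))
    perm = rng.permutation(embeddings.shape[0])  # random pairing
    return lam * embeddings + (1.0 - lam) * embeddings[perm]
```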
arXiv Detail & Related papers (2023-12-19T17:01:58Z)
- AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual Intrinsic Reward Mixing Network [2.057898896648108]
Deducing the contribution of each agent and assigning the corresponding reward to them is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL).
Previous studies try to resolve the issue by designing an intrinsic reward function, but the intrinsic reward is simply combined with the environment reward by summation.
We propose the Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX) in MARL.
arXiv Detail & Related papers (2023-02-19T10:25:25Z)
- Value Function Factorisation with Hypergraph Convolution for Cooperative Multi-agent Reinforcement Learning [32.768661516953344]
We propose a method that combines hypergraph convolution with value decomposition.
By treating action values as signals, HGCN-Mix aims to explore the relationship between these signals via a self-learning hypergraph.
Experimental results show that HGCN-Mix matches or surpasses state-of-the-art techniques on the StarCraft II multi-agent challenge (SMAC) benchmark.
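For context, a standard hypergraph convolution layer (the textbook HGNN-style operator, not necessarily the exact layer HGCN-Mix learns) propagates node signals through a normalized incidence matrix, as sketched below; the shapes and normalization are the common formulation, assumed here for illustration.

```python
import numpy as np

def hypergraph_conv(X, H, Theta, edge_w=None):
    """One hypergraph convolution: X' = Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta.

    X: node signals, shape (num_nodes, in_dim); here, per-agent action values.
    H: incidence matrix, shape (num_nodes, num_edges); H[v, e] = 1 if node v
       belongs to hyperedge e. Assumes no isolated nodes or empty edges.
    Theta: learnable weights, shape (in_dim, out_dim).
    """
    num_nodes, num_edges = H.shape
    w = edge_w if edge_w is not None else np.ones(num_edges)
    dv = H @ w                       # weighted vertex degrees
    de = H.sum(axis=0)               # hyperedge degrees
    Dv_inv_sqrt = np.diag(1.0 / np.sqrt(dv))
    De_inv = np.diag(1.0 / de)
    return Dv_inv_sqrt @ H @ np.diag(w) @ De_inv @ H.T @ Dv_inv_sqrt @ X @ Theta
```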
arXiv Detail & Related papers (2021-12-09T08:40:38Z)
- MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
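For reference, maximum mean discrepancy compares two sample sets through kernel evaluations; the snippet below computes the standard biased MMD^2 estimate with an RBF kernel, as a generic illustration rather than the MMD-MIX training objective.

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy, RBF kernel.

    X, Y: sample sets of shape (n, d) and (m, d), e.g. samples from two
    return distributions. MMD^2 = E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)];
    it is zero iff the distributions match (for a characteristic kernel).
    """
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise sq. dists
        return np.exp(-d2 / (2.0 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2.0 * k(X, Y).mean()
```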
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
- Energy-Efficient and Federated Meta-Learning via Projected Stochastic Gradient Ascent [79.58680275615752]
We propose an energy-efficient federated meta-learning framework.
We assume each task is owned by a separate agent, so a limited number of tasks is used to train a meta-model.
arXiv Detail & Related papers (2021-05-31T08:15:44Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
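The softmax operator behind this line of work replaces the hard max in the Q-learning target with a temperature-weighted average; a generic Boltzmann-softmax sketch follows, with the inverse temperature beta as an illustrative parameter rather than the paper's exact regularization scheme.

```python
import numpy as np

def softmax_operator(q_values, beta=5.0):
    """Boltzmann softmax operator over action values.

    Returns sum_a softmax(beta * Q)(a) * Q(a), a smooth alternative to
    max_a Q(a). As beta -> inf it recovers the hard max; a finite beta
    damps the overestimation that the hard max induces under noise.
    """
    z = beta * (q_values - q_values.max())   # shift for numerical stability
    p = np.exp(z) / np.exp(z).sum()
    return (p * q_values).sum()
```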
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
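QMIX's key constraint is that the joint value is monotone in each per-agent value, enforced by keeping the mixing-network weights non-negative; the sketch below shows that mechanism in miniature, with a single fixed-weight mixing layer and omitting QMIX's state-conditioned hypernetworks.

```python
import numpy as np

def monotonic_mix(agent_qs, W1, b1, W2, b2):
    """Tiny QMIX-style mixer: Q_tot is monotone in every agent's Q.

    agent_qs: per-agent values, shape (n_agents,); W1: (n_agents, hidden);
    b1: (hidden,); W2: (hidden,); b2: scalar. Taking absolute values of
    the mixing weights guarantees dQ_tot/dQ_i >= 0, so the argmax of
    Q_tot decomposes into per-agent argmaxes. In full QMIX these weights
    come from hypernetworks conditioned on the global state.
    """
    h = np.maximum(0.0, agent_qs @ np.abs(W1) + b1)  # ReLU here; ELU in the paper
    return float(h @ np.abs(W2) + b2)
```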
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
- Risk-Aware Energy Scheduling for Edge Computing with Microgrid: A Multi-Agent Deep Reinforcement Learning Approach [82.6692222294594]
We study a risk-aware energy scheduling problem for a microgrid-powered MEC network.
We derive the solution by applying a multi-agent deep reinforcement learning (MADRL)-based asynchronous advantage actor-critic (A3C) algorithm with shared neural networks.
arXiv Detail & Related papers (2020-02-21T02:14:38Z)
- Multi-Agent Meta-Reinforcement Learning for Self-Powered and Sustainable Edge Computing Systems [87.4519172058185]
An effective energy dispatch mechanism for self-powered wireless networks with edge computing capabilities is studied.
A novel multi-agent meta-reinforcement learning (MAMRL) framework is proposed to solve the formulated problem.
Experimental results show that the proposed MAMRL model can reduce non-renewable energy usage by up to 11% and energy cost by 22.4%.
arXiv Detail & Related papers (2020-02-20T04:58:07Z)