Maximum Correntropy Value Decomposition for Multi-agent Deep
Reinforcement Learning
- URL: http://arxiv.org/abs/2208.03663v1
- Date: Sun, 7 Aug 2022 08:06:21 GMT
- Title: Maximum Correntropy Value Decomposition for Multi-agent Deep
Reinforcement Learning
- Authors: Kai Liu, Tianxian Zhang, Lingjiang Kong
- Abstract summary: We introduce the Maximum Correntropy Criterion (MCC) as a cost function that dynamically adapts the weight to eliminate the effects of minima in the reward distributions.
A preliminary experiment conducted on OMG shows that MCVD can deal with non-monotonic value decomposition problems with a large tolerance for kernel bandwidth selection.
- Score: 4.743243072814404
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore value decomposition solutions for multi-agent deep reinforcement
learning in the popular paradigm of centralized training with decentralized
execution (CTDE). As the recognized best solution under CTDE, Weighted QMIX is
cutting-edge on the StarCraft Multi-agent Challenge (SMAC), with a weighting scheme
implemented on QMIX to place more emphasis on the optimal joint actions.
However, the fixed weight requires manual tuning according to the application
scenario, which severely limits the use of Weighted QMIX in broader
engineering applications. In this paper, we first demonstrate the flaw of
Weighted QMIX using an ordinary One-Step Matrix Game (OMG): no matter how
the weight is chosen, Weighted QMIX struggles to deal with non-monotonic value
decomposition problems when the reward distributions have large variance. We then
characterize the value decomposition problem as an Underfitting One-edged
Robust Regression problem and make the first attempt to solve it from the
perspective of information-theoretic learning. We introduce the Maximum
Correntropy Criterion (MCC) as a cost function that dynamically adapts the
weight to eliminate the effects of minima in the reward distributions. We
simplify the implementation and propose a new algorithm called MCVD. A
preliminary experiment on OMG shows that MCVD can handle non-monotonic value
decomposition problems with a large tolerance for kernel bandwidth selection.
Further experiments on Cooperative-Navigation and multiple SMAC scenarios show
that MCVD exhibits unprecedented ease of implementation, broad applicability,
and stability.
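
As background for the MCC mentioned above, the sketch below shows one common way a correntropy-based cost is realized in practice: a Gaussian kernel of the regression residual acts as an adaptive per-sample weight on the squared TD error, so outlying targets (e.g. occasional very low rewards) are damped without a manually tuned fixed weight. The function names, the PyTorch framing, and the exact loss form are illustrative assumptions and are not taken from the MCVD implementation.

```python
import torch

def correntropy_weight(residual: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    # Gaussian-kernel correntropy weight: close to 1 for small residuals and
    # decaying toward 0 for large ones, so extreme targets contribute little.
    return torch.exp(-residual.pow(2) / (2.0 * sigma ** 2))

def mcc_td_loss(q_joint: torch.Tensor, td_target: torch.Tensor,
                sigma: float = 1.0) -> torch.Tensor:
    # Per-sample residual between the mixed joint action-value and its TD target.
    residual = td_target.detach() - q_joint
    # Detaching the weight gives the half-quadratic view of the MCC: a standard
    # squared-error regression whose per-sample weight adapts to the residual.
    w = correntropy_weight(residual.detach(), sigma)
    return (w * residual.pow(2)).mean()

# Toy usage: three sampled transitions, the last with a spuriously low target.
q_joint = torch.tensor([4.8, 5.1, 5.0], requires_grad=True)
td_target = torch.tensor([5.0, 5.2, -7.0])
loss = mcc_td_loss(q_joint, td_target, sigma=2.0)
loss.backward()
print(loss.item(), q_joint.grad)  # the outlier's gradient contribution is heavily damped
```

In this reading, the kernel bandwidth sigma takes over the role played by the fixed weight in Weighted QMIX, and the abstract reports that MCVD tolerates a wide range of bandwidth choices.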
Related papers
- MG-Net: Learn to Customize QAOA with Circuit Depth Awareness [51.78425545377329]
Quantum Approximate Optimization Algorithm (QAOA) and its variants exhibit immense potential in tackling optimization challenges.
The requisite circuit depth for satisfactory performance is problem-specific and often exceeds the maximum capability of current quantum devices.
We introduce the Mixer Generator Network (MG-Net), a unified deep learning framework adept at dynamically formulating optimal mixer Hamiltonians.
arXiv Detail & Related papers (2024-09-27T12:28:18Z) - POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning [17.644279061872442]
Value function factorization methods are commonly used in cooperative multi-agent reinforcement learning.
We propose the Potentially Optimal Joint Actions Weighted QMIX (POWQMIX) algorithm, which recognizes the potentially optimal joint actions and assigns higher weights to the corresponding losses during training.
Experiments in matrix games, difficulty-enhanced predator-prey, and StarCraft II Multi-Agent Challenge environments demonstrate that our algorithm outperforms the state-of-the-art value-based multi-agent reinforcement learning methods.
arXiv Detail & Related papers (2024-05-13T03:27:35Z) - Fast Semisupervised Unmixing Using Nonconvex Optimization [80.11512905623417]
We introduce a novel convex model for semi/library-based unmixing.
We demonstrate the efficacy of alternating methods for sparse unsupervised unmixing.
arXiv Detail & Related papers (2024-01-23T10:07:41Z) - Gaussian Mixture Solvers for Diffusion Models [84.83349474361204]
We introduce a novel class of SDE-based solvers called GMS for diffusion models.
Our solver outperforms numerous SDE-based solvers in terms of sample quality in image generation and stroke-based synthesis.
arXiv Detail & Related papers (2023-11-02T02:05:38Z) - Interfacing Finite Elements with Deep Neural Operators for Fast
Multiscale Modeling of Mechanics Problems [4.280301926296439]
In this work, we explore the idea of multiscale modeling with machine learning and employ DeepONet, a neural operator, as an efficient surrogate of the expensive solver.
DeepONet is trained offline using data acquired from the fine solver for learning the underlying and possibly unknown fine-scale dynamics.
We present various benchmarks to assess accuracy and speedup, and in particular we develop a coupling algorithm for a time-dependent problem.
arXiv Detail & Related papers (2022-02-25T20:46:08Z) - MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for
Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-MIX is a method that combines distributional reinforcement learning and value decomposition.
The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z) - Softmax with Regularization: Better Value Estimation in Multi-Agent
Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2021-03-22T14:18:39Z) - QR-MIX: Distributional Value Function Factorisation for Cooperative
Multi-Agent Reinforcement Learning [5.564793925574797]
In cooperative Multi-Agent Reinforcement Learning (MARL), agents observe and interact with their environment locally and independently.
With local observation and random sampling, the randomness in rewards and observations leads to randomness in long-term returns.
Existing methods such as Value Decomposition Network (VDN) and QMIX estimate the value of long-term returns as a scalar that does not contain the information of randomness.
arXiv Detail & Related papers (2020-09-09T10:28:44Z) - Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep
Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL.
We show that it can recover the optimal policy even with access to $Q^{*}$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z) - Monotonic Value Function Factorisation for Deep Multi-Agent
Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion.
We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)