Related papers: Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization

URL: http://arxiv.org/abs/2406.13930v1
Date: Thu, 20 Jun 2024 01:55:08 GMT
Title: Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization
Authors: Wentse Chen, Shiyu Huang, Jeff Schneider,
Abstract summary: We propose an enhancement to QMIX by incorporating an additional local Qvalue learning method within the maximum entropy RL framework. Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions. We theoretically prove the monotonic improvement and convergence of our method to an optimal solution.
Score: 5.54284350152423
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Multi-agent reinforcement learning (MARL) tasks often utilize a centralized training with decentralized execution (CTDE) framework. QMIX is a successful CTDE method that learns a credit assignment function to derive local value functions from a global value function, defining a deterministic local policy. However, QMIX is hindered by its poor exploration strategy. While maximum entropy reinforcement learning (RL) promotes better exploration through stochastic policies, QMIX's process of credit assignment conflicts with the maximum entropy objective and the decentralized execution requirement, making it unsuitable for maximum entropy RL. In this paper, we propose an enhancement to QMIX by incorporating an additional local Q-value learning method within the maximum entropy RL framework. Our approach constrains the local Q-value estimates to maintain the correct ordering of all actions. Due to the monotonicity of the QMIX value function, these updates ensure that locally optimal actions align with globally optimal actions. We theoretically prove the monotonic improvement and convergence of our method to an optimal solution. Experimentally, we validate our algorithm in matrix games, Multi-Agent Particle Environment and demonstrate state-of-the-art performance in SMAC-v2.

Related papers

Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning [49.31650627835956]
Partial agent failure becomes inevitable when systems scale up, making it crucial to identify the subset of agents whose compromise would most severely degrade overall performance.<n>In this paper, we study this Vulnerable Agent Identification (VAI) problem in large-scale multi-agent reinforcement learning (MARL)<n> Experiments show our method effectively identifies more vulnerable agents in large-scale MARL and the rule-based system, fooling system into worse failures, and learning a value function that reveals the vulnerability of each agent.
arXiv Detail & Related papers (2025-09-18T16:03:50Z)
Hyper-modal Imputation Diffusion Embedding with Dual-Distillation for Federated Multimodal Knowledge Graph Completion [59.54067771781552]
We propose a framework named MMFeD3-HidE for addressing multimodal uncertain unavailability and multimodal client heterogeneity challenges of FedMKGC.<n>We propose a FedMKGC benchmark for a comprehensive evaluation, consisting of a general FedMKGC backbone named MMFedE, datasets with heterogeneous multimodal information, and three groups of constructed baselines.
arXiv Detail & Related papers (2025-06-27T09:32:58Z)
Fixing Incomplete Value Function Decomposition for Multi-Agent Reinforcement Learning [14.664083077278002]
We present a novel family of value function decomposition models that expand the representation capabilities of prior models by means of a thin "fixing" layer.<n>We derive multiple variants of QFIX, and implement three variants in two well-known multi-agent frameworks.
arXiv Detail & Related papers (2025-05-15T16:36:18Z)
Stochastic Q-learning for Large Discrete Action Spaces [79.1700188160944]
In complex environments with discrete action spaces, effective decision-making is critical in reinforcement learning (RL) We present value-based RL approaches which, as opposed to optimizing over the entire set of $n$ actions, only consider a variable set of actions, possibly as small as $mathcalO(log(n)$)$. The presented value-based RL methods include, among others, Q-learning, StochDQN, StochDDQN, all of which integrate this approach for both value-function updates and action selection.
arXiv Detail & Related papers (2024-05-16T17:58:44Z)
QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning. We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
Maximum Entropy Heterogeneous-Agent Reinforcement Learning [45.377385280485065]
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.<n>We propose a unified framework for learning policies to resolve issues related to sample complexity, training instability, and the risk of converging to a suboptimal Nash Equilibrium.<n>Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.<n>We evaluate HASAC on six benchmarks: Bi-DexHands, Multi-Agent MuJoCo, StarCraft Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game.
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
Ensemble Value Functions for Efficient Exploration in Multi-Agent Reinforcement Learning [18.762198598488066]
Multi-agent reinforcement learning (MARL) requires agents to explore within a vast joint action space.<n>EMAX is a framework to seamlessly extend value-based MARL algorithms.
arXiv Detail & Related papers (2023-02-07T12:51:20Z)
Dual Self-Awareness Value Decomposition Framework without Individual Global Max for Cooperative Multi-Agent Reinforcement Learning [12.74348597962689]
We propose a dual self-awareness value decomposition framework, inspired by the notion of dual self-awareness in psychology. As the first fully IGM-free value decomposition method, our proposed framework achieves desirable performance in various cooperative tasks.
arXiv Detail & Related papers (2023-02-04T15:13:20Z)
Expeditious Saliency-guided Mix-up through Random Gradient Thresholding [89.59134648542042]
Mix-up training approaches have proven to be effective in improving the generalization ability of Deep Neural Networks. In this paper, inspired by the superior qualities of each direction over one another, we introduce a novel method that lies at the junction of the two routes. We name our method R-Mix following the concept of "Random Mix-up" In order to address the question of whether there exists a better decision protocol, we train a Reinforcement Learning agent that decides the mix-up policies.
arXiv Detail & Related papers (2022-12-09T14:29:57Z)
Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning [0.0]
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on conventional Reinforcement Learning (RL) This thesis focuses on what factors influence the frequency with which value-based MORL Q-learning algorithms learn the optimal policy for an environment.
arXiv Detail & Related papers (2022-11-16T04:56:42Z)
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games. We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method. Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
DQMIX: A Distributional Perspective on Multi-Agent Reinforcement Learning [122.47938710284784]
In cooperative multi-agent tasks, a team of agents jointly interact with an environment by taking actions, receiving a reward and observing the next state. Most of the existing value-based multi-agent reinforcement learning methods only model the expectations of individual Q-values and global Q-value.
arXiv Detail & Related papers (2022-02-21T11:28:00Z)
Value Functions Factorization with Latent State Information Sharing in Decentralized Multi-Agent Policy Gradients [43.862956745961654]
LSF-SAC is a novel framework that features a variational inference-based information-sharing mechanism as extra state information. We evaluate LSF-SAC on the StarCraft II micromanagement challenge and demonstrate that it outperforms several state-of-the-art methods in challenging collaborative tasks.
arXiv Detail & Related papers (2022-01-04T17:05:07Z)
MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning [15.972363414919279]
MMD-mix is a method that combines distributional reinforcement learning and value decomposition. The experiments demonstrate that MMD-mix outperforms prior baselines in the Star Multi-Agent Challenge (SMAC) environment.
arXiv Detail & Related papers (2021-06-22T10:21:00Z)
Permutation Invariant Policy Optimization for Mean-Field Multi-Agent Reinforcement Learning: A Principled Approach [128.62787284435007]
We propose the mean-field proximal policy optimization (MF-PPO) algorithm, at the core of which is a permutation-invariant actor-critic neural architecture. We prove that MF-PPO attains the globally optimal policy at a sublinear rate of convergence. In particular, we show that the inductive bias introduced by the permutation-invariant neural architecture enables MF-PPO to outperform existing competitors.
arXiv Detail & Related papers (2021-05-18T04:35:41Z)
Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of a popular $Q$-learning algorithm for MARL. We show that it can recover the optimal policy even with access to $Q*$. We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [55.20040781688844]
QMIX is a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. We propose the StarCraft Multi-Agent Challenge (SMAC) as a new benchmark for deep multi-agent reinforcement learning.
arXiv Detail & Related papers (2020-03-19T16:51:51Z)
FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC) It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces. We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
arXiv Detail & Related papers (2020-03-14T21:29:09Z)

This list is automatically generated from the titles and abstracts of the papers in this site.