SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2105.15013v1
- Date: Mon, 31 May 2021 14:50:52 GMT
- Title: SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent
Reinforcement Learning
- Authors: Jianhong Wang, Jinxin Wang, Yuan Zhang, Yunjie Gu, Tae-Kyun Kim
- Abstract summary: We generalise the Shapley value in coalitional game theory to a Markov convex game (MCG).
We show that the generalised Shapley value possesses several features such as (1) accurate estimation of the maximum global value, (2) fairness in the factorisation of the global value, and (3) being sensitive to dummy agents.
The proposed theory yields a new learning algorithm called Shapley Q-learning (SHAQ), which inherits the important merits of ordinary Q-learning but extends it to MARL.
- Score: 40.882696266783505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value factorisation proves to be a very useful technique in multi-agent
reinforcement learning (MARL), but the underlying mechanism is not yet fully
understood. This paper explores a theoretic basis for value factorisation. We
generalise the Shapley value in coalitional game theory to a Markov convex
game (MCG) and use it to guide value factorisation in MARL. We show that the
generalised Shapley value possesses several features such as (1) accurate
estimation of the maximum global value, (2) fairness in the factorisation of
the global value, and (3) being sensitive to dummy agents. The proposed theory
yields a new learning algorithm called Shapley Q-learning (SHAQ), which
inherits the important merits of ordinary Q-learning but extends it to MARL. In
comparison with prior arts, SHAQ rests on a much weaker assumption (MCG) that
is more compatible with real-world problems, yet offers superior explainability
and performance in many cases. We demonstrate SHAQ and verify the theoretical
claims on Predator-Prey and the StarCraft Multi-Agent Challenge (SMAC).
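The three properties above can be illustrated with the classical (stateless) Shapley value, which the paper generalises to the Markov setting. The sketch below is a hypothetical toy, not the paper's MCG estimator: it computes exact Shapley values for a small characteristic function and shows that a dummy agent receives zero credit while the global value is factorised fairly.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value_fn):
    """Exact Shapley value of each agent.

    value_fn maps a frozenset of agents to that coalition's value.
    Toy stand-in for the paper's learned Markov Shapley value.
    """
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(S | {i}) - value_fn(S))
        phi[i] = total
    return phi

# Toy characteristic function: agents 1 and 2 each contribute 2; agent 3 is a dummy.
v = lambda S: 2.0 * len(S & {1, 2})
print(shapley_values([1, 2, 3], v))  # agents 1 and 2 split the value; the dummy gets zero
```

The credits sum to the grand-coalition value (efficiency, property (1)/(2)) and the dummy agent's credit is zero (property (3)).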
Related papers
- QuXAI: Explainers for Hybrid Quantum Machine Learning Models [1.0225653612678713]
This work introduces QuXAI, an explainer for feature importance in hybrid quantum machine learning (HQML) systems. Our approach entails the creation of HQML models incorporating quantum feature maps and the use of Q-MEDLEY, which combines feature-based inferences, preserves the quantum transformation stage, and visualizes the resulting attributions. Our results show that Q-MEDLEY delineates influential classical aspects in HQML models, separates their noise, and competes well against established XAI techniques.
arXiv Detail & Related papers (2025-05-15T10:51:34Z)
- RSQ: Learning from Important Tokens Leads to Better Quantized LLMs [65.5558181902098]
Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining.
We propose RSQ (Rotate, Scale, then Quantize), which applies rotations to the model to mitigate outliers.
We demonstrate that RSQ consistently outperforms baseline methods across multiple downstream tasks and three model families.
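The rotation step of an RSQ-style pipeline can be sketched in isolation (the token-importance scaling is omitted here, and the function names are illustrative, not the paper's API): multiplying the weight matrix by a random orthogonal matrix spreads an outlier column's mass across all columns, which shrinks the per-tensor quantization scale and reduces round-off error; the rotation is folded back after quantization.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_symmetric(w, bits=8):
    # Per-tensor symmetric uniform quantization to signed integers.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def rotate_then_quantize(W, bits=8):
    # A random orthogonal rotation (QR of a Gaussian matrix) spreads the
    # outlier column across all columns, shrinking the per-tensor scale;
    # the rotation is undone after quantization.
    R, _ = np.linalg.qr(rng.standard_normal((W.shape[1], W.shape[1])))
    return quantize_symmetric(W @ R, bits) @ R.T

# Weight matrix with one large-magnitude (outlier) column.
W = rng.standard_normal((64, 64))
W[:, 0] *= 50.0
err_plain = np.abs(W - quantize_symmetric(W)).mean()
err_rotated = np.abs(W - rotate_then_quantize(W)).mean()
print(err_rotated < err_plain)  # rotation reduces the quantization error
```

Because the rotation is orthogonal, folding it back preserves the quantization error's magnitude while the smaller scale makes that error smaller to begin with.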
arXiv Detail & Related papers (2025-03-03T18:46:33Z)
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data [35.03888101803088]
This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification.
We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees.
We devise novel and general learning algorithms, IMMAX, which incorporate confidence margins and are applicable to various hypothesis sets.
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention.
We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of online global Q-network to prevent overestimation accumulation.
arXiv Detail & Related papers (2025-02-04T05:14:58Z)
- Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding [58.364933651703524]
We show that concentrated massive values consistently emerge in specific regions of attention queries.
These massive values play a critical role in interpreting contextual knowledge.
We trace the emergence of massive values and find that such concentration is caused by Rotary Positional Embedding (RoPE).
arXiv Detail & Related papers (2025-02-03T17:47:03Z)
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Shapley Value Based Multi-Agent Reinforcement Learning: Theory, Method and Its Application to Energy Network [7.50196317304035]
This thesis investigates the foundation of credit assignment in multi-agent reinforcement learning via cooperative game theory.
We first extend a game model called the convex game and a payoff distribution scheme called the Shapley value from cooperative game theory.
Based on the Markov Shapley value, we propose three multi-agent reinforcement learning algorithms: SHAQ, SQDDPG, and SPO.
arXiv Detail & Related papers (2024-02-23T13:43:15Z)
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- Maximum Entropy Heterogeneous-Agent Reinforcement Learning [47.652866966384586]
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
- MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite such minimal changes.
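The alternating scheme can be sketched on a toy stateless common-payoff game (an assumption for illustration only; MA2QL itself operates on Markov games): agents take turns running Q-learning against the other agent's frozen greedy policy, and the joint greedy actions settle into a Nash equilibrium.

```python
import numpy as np

# Common-payoff matrix game: both agents receive R[a0, a1].
R = np.array([[4.0, 0.0],
              [0.0, 3.0]])

def ma2ql(R, turns=20, iters=50, alpha=0.5):
    # Each agent keeps an independent Q-table over its own actions.
    Q = [np.zeros(R.shape[0]), np.zeros(R.shape[1])]
    a = [0, 0]
    for t in range(turns):
        i, j = t % 2, 1 - t % 2      # agents take turns updating
        a[j] = int(np.argmax(Q[j]))  # the other agent's policy is frozen
        for _ in range(iters):       # ordinary Q-learning updates
            for ai in range(len(Q[i])):
                r = R[ai, a[j]] if i == 0 else R[a[j], ai]
                Q[i][ai] += alpha * (r - Q[i][ai])
        a[i] = int(np.argmax(Q[i]))
    return a

print(ma2ql(R))  # the joint greedy policy settles on a Nash equilibrium
```

Because only one agent updates at a time against a stationary opponent, each turn is a single-agent learning problem, which is the minimal change MA2QL makes to independent Q-learning (IQL).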
arXiv Detail & Related papers (2022-09-17T04:54:32Z)
- Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values.
Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints.
arXiv Detail & Related papers (2021-10-04T10:28:57Z)
- Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization [28.89692989420673]
We formalize a multi-agent fitted Q-iteration framework for analyzing factorized multi-agent Q-learning.
Through further analysis, we find that on-policy training or richer joint value function classes can improve its local or global convergence properties.
arXiv Detail & Related papers (2020-05-31T19:14:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.