SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2105.15013v1
- Date: Mon, 31 May 2021 14:50:52 GMT
- Title: SHAQ: Incorporating Shapley Value Theory into Q-Learning for Multi-Agent
Reinforcement Learning
- Authors: Jianhong Wang, Jinxin Wang, Yuan Zhang, Yunjie Gu, Tae-Kyun Kim
- Abstract summary: We generalise the Shapley value in coalitional game theory to a Markov convex game (MCG).
We show that the generalised Shapley value possesses several features such as (1) accurate estimation of the maximum global value, (2) fairness in the factorisation of the global value, and (3) being sensitive to dummy agents.
The proposed theory yields a new learning algorithm called Shapley Q-learning (SHAQ), which inherits the important merits of ordinary Q-learning but extends it to MARL.
- Score: 40.882696266783505
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Value factorisation proves to be a very useful technique in multi-agent
reinforcement learning (MARL), but the underlying mechanism is not yet fully
understood. This paper explores a theoretic basis for value factorisation. We
generalise the Shapley value in coalitional game theory to a Markov convex
game (MCG) and use it to guide value factorisation in MARL. We show that the
generalised Shapley value possesses several features such as (1) accurate
estimation of the maximum global value, (2) fairness in the factorisation of
the global value, and (3) being sensitive to dummy agents. The proposed theory
yields a new learning algorithm called Shapley Q-learning (SHAQ), which
inherits the important merits of ordinary Q-learning but extends it to MARL. In
comparison with prior arts, SHAQ rests on a much weaker assumption (MCG) that
is more compatible with real-world problems, yet offers superior explainability
and performance in many cases. We demonstrate SHAQ and verify the theoretical
claims on Predator-Prey and the StarCraft Multi-Agent Challenge (SMAC).
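The three properties above can be illustrated with the classical (stateless) Shapley value, which the paper generalises to the Markov setting. The sketch below is a hypothetical toy, not the paper's MCG estimator: it computes exact Shapley values for a small characteristic function and shows that a dummy agent receives zero credit while the global value is factorised fairly.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value_fn):
    """Exact Shapley value of each agent.

    value_fn maps a frozenset of agents to that coalition's value.
    Toy stand-in for the paper's learned Markov Shapley value.
    """
    n = len(agents)
    phi = {}
    for i in agents:
        others = [a for a in agents if a != i]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                S = frozenset(S)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (value_fn(S | {i}) - value_fn(S))
        phi[i] = total
    return phi

# Toy characteristic function: agents 1 and 2 each contribute 2; agent 3 is a dummy.
v = lambda S: 2.0 * len(S & {1, 2})
print(shapley_values([1, 2, 3], v))  # agents 1 and 2 split the value; the dummy gets zero
```

The credits sum to the grand-coalition value (efficiency, property (1)/(2)) and the dummy agent's credit is zero (property (3)).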
Related papers
- QuXAI: Explainers for Hybrid Quantum Machine Learning Models [1.0225653612678713]
This work introduces QuXAI, an explainer for feature importance in hybrid quantum machine learning (HQML) systems. Our approach entails the creation of HQML models incorporating quantum feature maps and the use of Q-MEDLEY, which combines feature-based inferences, preserves the quantum transformation stage, and visualizes the resulting attributions. Our results show that Q-MEDLEY delineates influential classical aspects in HQML models, separates their noise, and competes well against established XAI techniques.
arXiv Detail & Related papers (2025-05-15T10:51:34Z)
- RSQ: Learning from Important Tokens Leads to Better Quantized LLMs [65.5558181902098]
Layer-wise quantization is a key technique for efficiently compressing large models without expensive retraining.
We propose RSQ (Rotate, Scale, then Quantize), which applies rotations to the model to mitigate outliers.
We demonstrate that RSQ consistently outperforms baseline methods across multiple downstream tasks and three model families.
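The rotation step of an RSQ-style pipeline can be sketched in isolation (the token-importance scaling is omitted here, and the function names are illustrative, not the paper's API): multiplying the weight matrix by a random orthogonal matrix spreads an outlier column's mass across all columns, which shrinks the per-tensor quantization scale and reduces round-off error; the rotation is folded back after quantization.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_symmetric(w, bits=8):
    # Per-tensor symmetric uniform quantization to signed integers.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax - 1, qmax) * scale

def rotate_then_quantize(W, bits=8):
    # A random orthogonal rotation (QR of a Gaussian matrix) spreads the
    # outlier column across all columns, shrinking the per-tensor scale;
    # the rotation is undone after quantization.
    R, _ = np.linalg.qr(rng.standard_normal((W.shape[1], W.shape[1])))
    return quantize_symmetric(W @ R, bits) @ R.T

# Weight matrix with one large-magnitude (outlier) column.
W = rng.standard_normal((64, 64))
W[:, 0] *= 50.0
err_plain = np.abs(W - quantize_symmetric(W)).mean()
err_rotated = np.abs(W - rotate_then_quantize(W)).mean()
print(err_rotated < err_plain)  # rotation reduces the quantization error
```

Because the rotation is orthogonal, folding it back preserves the quantization error's magnitude while the smaller scale makes that error smaller to begin with.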
arXiv Detail & Related papers (2025-03-03T18:46:33Z)
- Balancing the Scales: A Theoretical and Algorithmic Framework for Learning from Imbalanced Data [35.03888101803088]
This paper introduces a novel theoretical framework for analyzing generalization in imbalanced classification.
We propose a new class-imbalanced margin loss function for both binary and multi-class settings, prove its strong $H$-consistency, and derive corresponding learning guarantees.
We devise novel and general learning algorithms, IMMAX, which incorporate confidence margins and are applicable to various hypothesis sets.
arXiv Detail & Related papers (2025-02-14T18:57:16Z)
- Dual Ensembled Multiagent Q-Learning with Hypernet Regularizer [62.01554688056335]
Overestimation in the multiagent setting has received comparatively little attention.
We propose a novel hypernet regularizer on hypernetwork weights and biases to constrain the optimization of online global Q-network to prevent overestimation accumulation.
arXiv Detail & Related papers (2025-02-04T05:14:58Z)
- Massive Values in Self-Attention Modules are the Key to Contextual Knowledge Understanding [58.364933651703524]
We show that concentrated massive values consistently emerge in specific regions of attention queries.
These massive values play a critical role in interpreting contextual knowledge.
We trace the emergence of massive values and find that such concentration is caused by Rotary Positional Embedding (RoPE).
arXiv Detail & Related papers (2025-02-03T17:47:03Z)
- Multi-Agent Reinforcement Learning from Human Feedback: Data Coverage and Algorithmic Techniques [65.55451717632317]
We study Multi-Agent Reinforcement Learning from Human Feedback (MARLHF), exploring both theoretical foundations and empirical validations.
We define the task as identifying Nash equilibrium from a preference-only offline dataset in general-sum games.
Our findings underscore the multifaceted approach required for MARLHF, paving the way for effective preference-based multi-agent systems.
arXiv Detail & Related papers (2024-09-01T13:14:41Z)
- Shapley Value Based Multi-Agent Reinforcement Learning: Theory, Method and Its Application to Energy Network [7.50196317304035]
This thesis investigates the foundation of credit assignment in multi-agent reinforcement learning via cooperative game theory.
We first extend a game model called the convex game and a payoff distribution scheme called the Shapley value from cooperative game theory.
Based on the Markov Shapley value, we propose three multi-agent reinforcement learning algorithms: SHAQ, SQDDPG, and SPO.
arXiv Detail & Related papers (2024-02-23T13:43:15Z)
- Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z)
- QFree: A Universal Value Function Factorization for Multi-Agent Reinforcement Learning [2.287186762346021]
We propose QFree, a universal value function factorization method for multi-agent reinforcement learning.
We show that QFree achieves the state-of-the-art performance in a general-purpose complex MARL benchmark environment.
arXiv Detail & Related papers (2023-11-01T08:07:16Z)
- Maximum Entropy Heterogeneous-Agent Reinforcement Learning [47.652866966384586]
Multi-agent reinforcement learning (MARL) has been shown effective for cooperative games in recent years.
We propose a unified framework for learning stochastic policies to resolve these issues.
Based on the MaxEnt framework, we propose Heterogeneous-Agent Soft Actor-Critic (HASAC) algorithm.
arXiv Detail & Related papers (2023-06-19T06:22:02Z)
- MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent Reinforcement Learning [63.46052494151171]
We propose multi-agent alternate Q-learning (MA2QL), where agents take turns updating their Q-functions by Q-learning.
We prove that when each agent guarantees $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL, despite such minimal changes.
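The alternating scheme can be sketched on a toy stateless common-payoff game (an assumption for illustration only; MA2QL itself operates on Markov games): agents take turns running Q-learning against the other agent's frozen greedy policy, and the joint greedy actions settle into a Nash equilibrium.

```python
import numpy as np

# Common-payoff matrix game: both agents receive R[a0, a1].
R = np.array([[4.0, 0.0],
              [0.0, 3.0]])

def ma2ql(R, turns=20, iters=50, alpha=0.5):
    # Each agent keeps an independent Q-table over its own actions.
    Q = [np.zeros(R.shape[0]), np.zeros(R.shape[1])]
    a = [0, 0]
    for t in range(turns):
        i, j = t % 2, 1 - t % 2      # agents take turns updating
        a[j] = int(np.argmax(Q[j]))  # the other agent's policy is frozen
        for _ in range(iters):       # ordinary Q-learning updates
            for ai in range(len(Q[i])):
                r = R[ai, a[j]] if i == 0 else R[a[j], ai]
                Q[i][ai] += alpha * (r - Q[i][ai])
        a[i] = int(np.argmax(Q[i]))
    return a

print(ma2ql(R))  # the joint greedy policy settles on a Nash equilibrium
```

Because only one agent updates at a time against a stationary opponent, each turn is a single-agent learning problem, which is the minimal change MA2QL makes to independent Q-learning (IQL).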
arXiv Detail & Related papers (2022-09-17T04:54:32Z)
- Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values.
Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints.
arXiv Detail & Related papers (2021-10-04T10:28:57Z)
- Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization [28.89692989420673]
We formalize a multi-agent fitted Q-iteration framework for analyzing factorized multi-agent Q-learning.
Through further analysis, we find that on-policy training or richer joint value function classes can improve its local or global convergence properties.
arXiv Detail & Related papers (2020-05-31T19:14:03Z)
This list is automatically generated from the titles and abstracts of the papers on this site.