MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2209.08244v1
- Date: Sat, 17 Sep 2022 04:54:32 GMT
- Title: MA2QL: A Minimalist Approach to Fully Decentralized Multi-Agent
Reinforcement Learning
- Authors: Kefan Su, Siyuan Zhou, Chuang Gan, Xiangjun Wang, Zongqing Lu
- Abstract summary: We propose \textit{multi-agent alternate Q-learning} (MA2QL), where agents take turns to update their Q-functions by Q-learning.
We prove that when each agent guarantees an $\varepsilon$-convergence at each turn, their joint policy converges to a Nash equilibrium.
Results show that MA2QL consistently outperforms IQL, which verifies the effectiveness of MA2QL despite such minimal changes.
- Score: 63.46052494151171
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decentralized learning has shown great promise for cooperative multi-agent
reinforcement learning (MARL). However, non-stationarity remains a significant
challenge in decentralized learning. In this paper, we tackle the
non-stationarity problem in the simplest and most fundamental way and propose
\textit{multi-agent alternate Q-learning} (MA2QL), where agents take turns to
update their Q-functions by Q-learning. MA2QL is a \textit{minimalist} approach
to fully decentralized cooperative MARL but is theoretically grounded. We prove
that when each agent guarantees an $\varepsilon$-convergence at each turn, their
joint policy converges to a Nash equilibrium. In practice, MA2QL only requires
minimal changes to independent Q-learning (IQL). We empirically evaluate MA2QL
on a variety of cooperative multi-agent tasks. Results show that MA2QL
consistently outperforms IQL, which verifies the effectiveness of MA2QL despite
such minimal changes.
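To make the "minimal changes to IQL" concrete, here is a toy tabular sketch of the alternating scheme. Everything below (the two-agent setting, an environment assumed to expose reset() and step() with a shared reward, the turn length, and all hyperparameters) is an illustrative assumption, not the paper's experimental setup.

```python
import numpy as np

# Minimal sketch of multi-agent alternate Q-learning (MA2QL).
# Unlike independent Q-learning (IQL), where all agents update
# simultaneously, agents take turns: one agent updates its Q-table
# for a block of episodes while the others keep their policies
# fixed, which stabilizes the learner's target.

N_AGENTS, N_STATES, N_ACTIONS = 2, 10, 4   # toy sizes (assumption)
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1
TURN_LENGTH = 100                          # episodes per turn (assumption)

Q = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_AGENTS)]

def act(i, s, explore):
    """Epsilon-greedy for the learning agent, greedy for the rest."""
    if explore and np.random.rand() < EPS:
        return np.random.randint(N_ACTIONS)
    return int(np.argmax(Q[i][s]))

def run_episode(env, learner):
    s = env.reset()
    done = False
    while not done:
        # Only the current learner explores; the others act greedily
        # with their frozen Q-tables.
        actions = [act(i, s, explore=(i == learner)) for i in range(N_AGENTS)]
        s2, r, done = env.step(actions)    # shared cooperative reward
        a = actions[learner]
        target = r + (0.0 if done else GAMMA * Q[learner][s2].max())
        Q[learner][s, a] += ALPHA * (target - Q[learner][s, a])
        s = s2

def train(env, n_turns):
    for turn in range(n_turns):
        learner = turn % N_AGENTS          # agents take turns
        for _ in range(TURN_LENGTH):
            run_episode(env, learner)
```

The only difference from IQL is the turn schedule in train(); under IQL, every agent would run the same backup in every episode simultaneously.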
Related papers
- MFC-EQ: Mean-Field Control with Envelope Q-Learning for Moving Decentralized Agents in Formation [1.770056709115081]
Moving Agents in Formation (MAiF) is a variant of Multi-Agent Path Finding.
MFC-EQ is a scalable and adaptable learning framework for this bi-objective multi-agent problem.
arXiv Detail & Related papers (2024-10-15T20:59:47Z)
- Multi-agent Reinforcement Learning with Deep Networks for Diverse Q-Vectors
This paper proposes a deep Q-network (DQN) algorithm capable of learning various Q-vectors using Max, Nash, and Maximin strategies.
The effectiveness of this approach is demonstrated in an environment where dual robotic arms collaborate to lift a pot.
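As a rough illustration of what these three operators compute, the sketch below evaluates Max, Maximin, and pure-strategy Nash selections on hypothetical two-agent payoff matrices; the matrix representation and function names are assumptions for illustration, not the paper's implementation.

```python
import itertools
import numpy as np

# Hypothetical two-agent illustration of Max / Nash / Maximin action
# selection. Q1[a1, a2] and Q2[a1, a2] hold each agent's value for the
# joint action (a1, a2).

def max_joint(Q1):
    """Optimistic (cooperative) choice: the joint action maximizing Q1."""
    return np.unravel_index(np.argmax(Q1), Q1.shape)

def maximin_action(Q1):
    """Agent 1's robust choice: best action under the worst-case response."""
    return int(np.argmax(Q1.min(axis=1)))

def pure_nash(Q1, Q2):
    """All pure-strategy Nash equilibria: no agent gains by deviating alone."""
    eqs = []
    for a1, a2 in itertools.product(range(Q1.shape[0]), range(Q1.shape[1])):
        if Q1[a1, a2] >= Q1[:, a2].max() and Q2[a1, a2] >= Q2[a1, :].max():
            eqs.append((a1, a2))
    return eqs
```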
arXiv Detail & Related papers (2024-06-12T03:30:10Z)
- TransfQMix: Transformers for Leveraging the Graph Structure of Multi-Agent Reinforcement Learning Problems [0.0]
We present TransfQMix, a new approach that uses transformers to leverage a latent graph structure and learn better coordination policies.
Our transformer Q-mixer learns a monotonic mixing-function from a larger graph that includes the internal and external states of the agents.
We report TransfQMix's performance in the Spread and StarCraft II environments.
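For context, the sketch below shows a minimal QMIX-style monotonic mixer of the kind TransfQMix generalizes; the transformer components that operate on the agents' graph are omitted here, and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Minimal QMIX-style monotonic mixer (a sketch; TransfQMix replaces the
# state-conditioned hypernetworks below with transformers over the agent
# graph). Monotonicity in each agent's Q-value is enforced by making the
# mixing weights non-negative via abs().

class MonotonicMixer(nn.Module):
    def __init__(self, n_agents, state_dim, embed_dim=32):
        super().__init__()
        self.n_agents, self.embed_dim = n_agents, embed_dim
        # Hypernetworks: the global state generates the mixing weights.
        self.w1 = nn.Linear(state_dim, n_agents * embed_dim)
        self.b1 = nn.Linear(state_dim, embed_dim)
        self.w2 = nn.Linear(state_dim, embed_dim)
        self.b2 = nn.Linear(state_dim, 1)

    def forward(self, agent_qs, state):
        # agent_qs: (batch, n_agents), state: (batch, state_dim)
        w1 = torch.abs(self.w1(state)).view(-1, self.n_agents, self.embed_dim)
        hidden = torch.relu(torch.bmm(agent_qs.unsqueeze(1), w1)
                            + self.b1(state).unsqueeze(1))
        w2 = torch.abs(self.w2(state)).view(-1, self.embed_dim, 1)
        # q_tot: (batch, 1), monotone in every entry of agent_qs
        return torch.bmm(hidden, w2).squeeze(-1) + self.b2(state)
```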
arXiv Detail & Related papers (2023-01-13T00:07:08Z)
- Efficient Model-based Multi-agent Reinforcement Learning via Optimistic Equilibrium Computation [93.52573037053449]
H-MARL (Hallucinated Multi-Agent Reinforcement Learning) learns successful equilibrium policies after a few interactions with the environment.
We demonstrate our approach experimentally on an autonomous driving simulation benchmark.
arXiv Detail & Related papers (2022-03-14T17:24:03Z)
- Q-Learning Lagrange Policies for Multi-Action Restless Bandits [35.022322303796216]
Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed.
We design the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning.
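As a rough sketch of the general recipe (not the paper's specific algorithms), the snippet below runs per-arm Q-learning on a Lagrangian reward and adjusts the multiplier by dual ascent; all constants and the update interface are illustrative assumptions.

```python
import numpy as np

# Lagrangian-relaxed Q-learning for a multi-action restless bandit:
# each arm learns its own Q-table on the relaxed reward r - lam * cost(a),
# and a single multiplier lam prices the shared per-step budget.

N_ARMS, N_STATES, N_ACTIONS = 5, 3, 3
COSTS = np.array([0.0, 1.0, 2.0])   # resource cost of each action
BUDGET = 4.0                        # total cost allowed per step
ALPHA, GAMMA, LR_LAM = 0.1, 0.95, 0.01

Q = np.zeros((N_ARMS, N_STATES, N_ACTIONS))  # Q under the Lagrangian reward
lam = 1.0

def choose_actions(states):
    """Decoupled greedy choice: each arm maximizes its own Lagrangian Q."""
    return [int(np.argmax(Q[i, s])) for i, s in enumerate(states)]

def update(states, actions, rewards, next_states):
    global lam
    for i, (s, a, r, s2) in enumerate(zip(states, actions, rewards, next_states)):
        # Standard Q-learning backup on the relaxed reward r - lam * cost(a).
        target = (r - lam * COSTS[a]) + GAMMA * Q[i, s2].max()
        Q[i, s, a] += ALPHA * (target - Q[i, s, a])
    # Dual ascent on lam: raise the price when the budget is exceeded.
    spent = sum(COSTS[a] for a in actions)
    lam = max(0.0, lam + LR_LAM * (spent - BUDGET))
```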
arXiv Detail & Related papers (2021-06-22T19:20:09Z)
- Softmax with Regularization: Better Value Estimation in Multi-Agent Reinforcement Learning [72.28520951105207]
Overestimation in $Q$-learning is an important problem that has been extensively studied in single-agent reinforcement learning.
We propose a novel regularization-based update scheme that penalizes large joint action-values deviating from a baseline.
We show that our method provides a consistent performance improvement on a set of challenging StarCraft II micromanagement tasks.
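The sketch below illustrates the general idea under stated assumptions: a softmax-weighted backup in place of the hard max, plus a penalty on joint action-values that drift far from a baseline. The temperature, coefficient, and choice of baseline are assumptions, not the paper's exact scheme.

```python
import torch

def softmax_target(q_next, reward, gamma, tau=1.0):
    """Replace max_a Q(s', a) with a softmax-weighted average over
    actions, which curbs the overestimation bias of the hard max."""
    w = torch.softmax(q_next / tau, dim=-1)
    return reward + gamma * (w * q_next).sum(dim=-1)

def regularized_loss(q_taken, q_next, reward, baseline, gamma, coef=0.1):
    """TD loss plus a penalty on joint values deviating from a baseline."""
    target = softmax_target(q_next, reward, gamma).detach()
    td_loss = ((q_taken - target) ** 2).mean()
    reg = ((q_taken - baseline.detach()) ** 2).mean()
    return td_loss + coef * reg
```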
arXiv Detail & Related papers (2021-03-22T14:18:39Z)
- Single-partition adaptive Q-learning [0.0]
Single-partition adaptive Q-learning (SPAQL) is an algorithm for model-free episodic reinforcement learning.
Tests on episodes with a large number of time steps show that SPAQL has no problems scaling, unlike adaptive Q-learning (AQL).
We claim that SPAQL may have a higher sample efficiency than AQL, thus being a relevant contribution to the field of efficient model-free RL methods.
arXiv Detail & Related papers (2020-07-14T00:03:25Z)
- QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning [70.382101956278]
QTRAN is a reinforcement learning algorithm capable of learning the largest class of joint-action value functions.
Despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments.
We propose a substantially improved version, coined QTRAN++.
arXiv Detail & Related papers (2020-06-22T05:08:36Z)
- Weighted QMIX: Expanding Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning [66.94149388181343]
We present a new version of the popular QMIX algorithm for MARL.
We show that its monotonic projection can fail to recover the optimal policy even with access to $Q^*$.
We also demonstrate improved performance on predator-prey and challenging multi-agent StarCraft benchmark tasks.
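A minimal sketch in the spirit of the weighting idea: transitions where the monotonic estimate underestimates the target joint value get full weight and all others a small one, so the factorisation prioritises fitting the best joint actions. The weight value and the source of the target are illustrative assumptions, not the paper's exact weighting operators.

```python
import torch

def weighted_projection_loss(q_tot, target, w=0.1):
    """Weighted L2 projection of `target` (e.g. an unrestricted joint
    value estimate, detached) onto the monotonic estimate q_tot."""
    target = target.detach()
    under = (q_tot < target).float()       # 1 where q_tot underestimates
    weight = under + (1.0 - under) * w     # full weight there, w elsewhere
    return (weight * (q_tot - target) ** 2).mean()
```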
arXiv Detail & Related papers (2020-06-18T18:34:50Z)
- FACMAC: Factored Multi-Agent Centralised Policy Gradients [103.30380537282517]
We propose FACtored Multi-Agent Centralised policy gradients (FACMAC).
It is a new method for cooperative multi-agent reinforcement learning in both discrete and continuous action spaces.
We evaluate FACMAC on variants of the multi-agent particle environments, a novel multi-agent MuJoCo benchmark, and a challenging set of StarCraft II micromanagement tasks.
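A rough sketch of the centralised-but-factored actor update, assuming differentiable actions and externally defined actor, critic, and mixer networks; this is illustrative, not the paper's implementation.

```python
import torch

def facmac_actor_loss(actors, critics, mixer, obs, state):
    """Centralised-but-factored policy gradient: per-agent critic values
    are combined by a mixer, and one joint gradient (sampling every
    agent's current action) updates all actors at once."""
    actions = [pi(o) for pi, o in zip(actors, obs)]
    agent_qs = torch.stack(
        [q(o, a) for q, o, a in zip(critics, obs, actions)], dim=-1)
    q_tot = mixer(agent_qs, state)   # (batch, 1) joint value
    return -q_tot.mean()             # minimize: ascend the joint value
```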
arXiv Detail & Related papers (2020-03-14T21:29:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.