Mean-Field Controls with Q-learning for Cooperative MARL: Convergence
and Complexity Analysis
- URL: http://arxiv.org/abs/2002.04131v6
- Date: Fri, 1 Oct 2021 17:29:49 GMT
- Title: Mean-Field Controls with Q-learning for Cooperative MARL: Convergence
and Complexity Analysis
- Authors: Haotian Gu, Xin Guo, Xiaoli Wei, Renyuan Xu
- Abstract summary: This paper builds the mathematical framework to approximate cooperative MARL by a mean-field control (MFC) approach.
It proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is shown to have a linear convergence rate for the MFC problem, the first of its kind in the MARL literature.
- Score: 7.800126150380472
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-agent reinforcement learning (MARL), despite its popularity and
empirical success, suffers from the curse of dimensionality. This paper builds
the mathematical framework to approximate cooperative MARL by a mean-field
control (MFC) approach, and shows that the approximation error is of
$\mathcal{O}(\frac{1}{\sqrt{N}})$. By establishing an appropriate form of the
dynamic programming principle for both the value function and the Q function,
it proposes a model-free kernel-based Q-learning algorithm (MFC-K-Q), which is
shown to have a linear convergence rate for the MFC problem, the first of its
kind in the MARL literature. It further establishes that the convergence rate
and the sample complexity of MFC-K-Q are independent of the number of agents
$N$, which provides an $\mathcal{O}(\frac{1}{\sqrt{N}})$ approximation to the
MARL problem with $N$ agents in the learning environment. Empirical studies for
the network traffic congestion problem demonstrate that MFC-K-Q outperforms
existing MARL algorithms when $N$ is large, for instance when $N>50$.
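The abstract describes MFC-K-Q only at a high level. The sketch below is not the authors' algorithm; it is a minimal illustration, under an assumed toy environment, of the underlying idea of lifting Q-learning to the space of state distributions (the mean field) and using kernel regression to generalize across the probability simplex. Every name, constant, and the congestion-style reward is an illustrative assumption.

```python
# Minimal sketch (not the authors' MFC-K-Q implementation): Q-learning lifted to the
# space of state distributions, with kernel regression over sampled anchor distributions.
# Every name, constant, and the toy environment below is an illustrative assumption.
import numpy as np

n_states, n_actions = 3, 2            # finite state/action space of a representative agent
gamma, lr, bandwidth = 0.95, 0.5, 0.3
rng = np.random.default_rng(0)

# Anchor points on the probability simplex and the "lifted" actions
# (a feedback rule h: S -> A, i.e. one local action per state).
anchors = rng.dirichlet(np.ones(n_states), size=50)            # (50, n_states)
lifted_actions = list(np.ndindex(*(n_actions,) * n_states))    # all feedback rules
Q = np.zeros((len(anchors), len(lifted_actions)))              # tabular Q on the anchors
P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))  # toy transitions P[a, s, :]

def kernel_weights(mu):
    """Gaussian-kernel weights of a distribution mu against the anchor points."""
    d2 = np.sum((anchors - mu) ** 2, axis=1)
    w = np.exp(-d2 / (2 * bandwidth ** 2))
    return w / w.sum()

def q_value(mu, a_idx):
    """Kernel-regressed Q(mu, h): interpolate the tabular values stored at the anchors."""
    return kernel_weights(mu) @ Q[:, a_idx]

def step(mu, h):
    """Toy mean-field dynamics and reward; stands in for the unknown environment."""
    mu_next = np.zeros(n_states)
    for s in range(n_states):
        mu_next += mu[s] * P[h[s], s]                           # population flow under h
    reward = -np.sum(mu ** 2) - 0.05 * np.sum(h)                # congestion + action cost
    return mu_next, reward

# Q-learning on the lifted problem: act greedily on the current mean field, then spread
# a TD update across the anchor points with the kernel weights.
mu = rng.dirichlet(np.ones(n_states))
for _ in range(200):
    a_idx = int(np.argmax([q_value(mu, j) for j in range(len(lifted_actions))]))
    mu_next, r = step(mu, lifted_actions[a_idx])
    target = r + gamma * max(q_value(mu_next, j) for j in range(len(lifted_actions)))
    Q[:, a_idx] += lr * kernel_weights(mu) * (target - q_value(mu, a_idx))
    mu = mu_next
```

Because the update acts on the distribution $\mu$ rather than on each of the $N$ agents individually, its per-step cost does not grow with $N$, which is the intuition behind the $N$-independent convergence rate and sample complexity claimed above.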
Related papers
- DFedADMM: Dual Constraints Controlled Model Inconsistency for
Decentralized Federated Learning [52.83811558753284]
Decentralized federated learning (DFL) discards the central server and establishes a decentralized communication network.
Existing DFL methods still suffer from two major challenges: local inconsistency and local overfitting.
arXiv Detail & Related papers (2023-08-16T11:22:36Z) - Breaking the Curse of Multiagency: Provably Efficient Decentralized
Multi-Agent RL with Function Approximation [44.051717720483595]
This paper presents the first line of MARL algorithms that provably resolve the curse of multiagency under function approximation.
In exchange for learning a weaker version of coarse correlated equilibria (CCEs), the algorithm applies to a wider range of problems under generic function approximation.
Our algorithm always outputs Markov CCEs, and achieves an optimal rate of $\widetilde{\mathcal{O}}(\epsilon^{-2})$ for finding $\epsilon$-optimal solutions.
arXiv Detail & Related papers (2023-02-13T18:59:25Z) - Mean-Field Approximation of Cooperative Constrained Multi-Agent Reinforcement Learning (CMARL) [35.18639326270473]
We show that one can use the MFC approach to approximate the MARL problem even in the presence of constraints.
Also, we provide a Natural Policy Gradient based algorithm and prove that it can solve the constrained MARL problem within an error of $\mathcal{O}(e)$ with a sample complexity of $\mathcal{O}(e^{-6})$.
arXiv Detail & Related papers (2022-09-15T16:33:38Z) - Can Mean Field Control (MFC) Approximate Cooperative Multi Agent
Reinforcement Learning (MARL) with Non-Uniform Interaction? [33.484960394599455]
Mean-Field Control (MFC) is a powerful tool to solve Multi-Agent Reinforcement Learning (MARL) problems.
In this article, we relax the assumption of exchangeability and model the interaction between agents via an arbitrary doubly stochastic matrix.
We prove that, if the reward of each agent is an affine function of the mean-field seen by that agent, then one can approximate such a non-uniform MARL problem via an associated MFC problem.
arXiv Detail & Related papers (2022-02-28T19:03:09Z) - Permutation Compressors for Provably Faster Distributed Nonconvex
Optimization [68.8204255655161]
We show that the MARINA method of Gorbunov et al. (2021) can be considered a state-of-the-art method in terms of theoretical communication complexity.
The theory of MARINA is extended to support potentially correlated compressors, going beyond the classical setting of independent compressors.
arXiv Detail & Related papers (2021-10-07T09:38:15Z) - Efficient Model-Based Multi-Agent Mean-Field Reinforcement Learning [89.31889875864599]
We propose an efficient model-based reinforcement learning algorithm for learning in multi-agent systems.
Our main theoretical contributions are the first general regret bounds for model-based reinforcement learning for MFC.
We provide a practical parametrization of the core optimization problem.
arXiv Detail & Related papers (2021-07-08T18:01:02Z) - Reinforcement Learning for Mean Field Games, with Applications to
Economics [0.0]
Mean field games (MFG) and mean field control problems (MFC) are frameworks to study Nash equilibria or social optima in games with a continuum of agents.
We present a two timescale approach with RL for MFG and MFC, which relies on a unified Q-learning algorithm.
arXiv Detail & Related papers (2021-06-25T16:45:04Z) - Tightening the Dependence on Horizon in the Sample Complexity of
Q-Learning [59.71676469100807]
This work sharpens the sample complexity of synchronous Q-learning to an order of $\frac{|\mathcal{S}||\mathcal{A}|}{(1-\gamma)^4\varepsilon^2}$ for any $0 < \varepsilon < 1$.
Our finding unveils the effectiveness of vanilla Q-learning, which matches that of speedy Q-learning without requiring extra computation and storage.
arXiv Detail & Related papers (2021-02-12T14:22:05Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z) - Unified Reinforcement Q-Learning for Mean Field Game and Control
Problems [0.0]
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon Mean Field Game (MFG) and Mean Field Control (MFC) problems.
Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm can learn either the MFG or the MFC solution by simply tuning the ratio of two learning parameters.
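As a schematic interpretation of this two-timescale idea (not the authors' exact algorithm), the sketch below updates the Q function and the population distribution with two different learning rates; the environment, rewards, and rate values are assumed placeholders.

```python
# Schematic two-timescale mean-field Q-learning (not the authors' exact algorithm):
# the Q function and the population distribution are updated with two different
# learning rates; per the paper, the ratio of the rates selects the MFG or MFC solution.
# The environment, rewards, and rate values below are assumed placeholders.
import numpy as np

n_states, n_actions, gamma = 4, 3, 0.9
rng = np.random.default_rng(1)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # toy transitions P[s, a, :]
R = rng.normal(size=(n_states, n_actions))                          # toy base rewards

def reward(s, a, mu):
    # Mean-field coupling: the agent's reward also depends on the population distribution.
    return R[s, a] - mu[s]          # crowd-aversion term, purely illustrative

Q = np.zeros((n_states, n_actions))
mu = np.ones(n_states) / n_states
rho_q, rho_mu = 0.5, 0.01           # two learning rates; their ratio is the tuning knob
s = 0
for t in range(5000):
    a = rng.integers(n_actions) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s_next = rng.choice(n_states, p=P[s, a])
    # Timescale 1: Q-learning update with the mean-field-dependent reward.
    Q[s, a] += rho_q * (reward(s, a, mu) + gamma * Q[s_next].max() - Q[s, a])
    # Timescale 2: move the population estimate toward the visited state.
    mu = (1 - rho_mu) * mu + rho_mu * np.eye(n_states)[s_next]
    s = s_next
```

Intuitively, one choice of the rate ratio amounts to learning a best response against an almost frozen population (a fixed-point, game-theoretic view), while the other pushes the pair toward optimizing the population objective; the paper makes precise which regime yields the MFG and which the MFC solution.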
arXiv Detail & Related papers (2020-06-24T17:45:44Z) - Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and
Variance Reduction [63.41789556777387]
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP).
We show that the number of samples needed to yield an entrywise $\varepsilon$-accurate estimate of the Q-function is at most on the order of $\frac{1}{\mu_{\min}(1-\gamma)^5\varepsilon^2} + \frac{t_{\mathrm{mix}}}{\mu_{\min}(1-\gamma)}$ up to some logarithmic factor.
arXiv Detail & Related papers (2020-06-04T17:51:00Z)
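For contrast with the synchronous setting two entries above, asynchronous Q-learning updates only the (state, action) pair visited along a single behaviour-policy trajectory, which is why the bound involves the minimum state-action occupancy $\mu_{\min}$ and the mixing time $t_{\mathrm{mix}}$. The following is a minimal sketch with an assumed toy environment, not code from the paper.

```python
# Minimal asynchronous Q-learning sketch: along one behaviour-policy trajectory,
# only the visited (s, a) entry of Q is updated per step. The environment is an
# assumed toy placeholder, not taken from the paper.
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))   # transition kernel P[s, a, :]
R = rng.uniform(size=(n_states, n_actions))                         # reward table

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(1, 20001):
    a = rng.integers(n_actions)                  # fixed uniform behaviour policy
    s_next = rng.choice(n_states, p=P[s, a])
    eta = 1.0 / (1.0 + 0.01 * t)                 # diminishing step size
    # Asynchronous update: a single Q entry changes per observed transition.
    Q[s, a] += eta * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```

A synchronous variant would instead update every $(s, a)$ entry at each iteration using fresh samples from a generative model, which is the setting of the horizon-dependence result quoted earlier.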