Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
- URL: http://arxiv.org/abs/2402.07082v2
- Date: Tue, 11 Jun 2024 12:12:59 GMT
- Title: Refined Sample Complexity for Markov Games with Independent Linear Function Approximation
- Authors: Yan Dai, Qiwen Cui, Simon S. Du
- Abstract summary: Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL).
This paper first refines the AVLPR framework by Wang et al. (2023), with the insight of designing data-dependent (i.e., stochastic) pessimistic estimates of the sub-optimality gap.
We give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
- Score: 49.5660193419984
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov Games (MG) is an important model for Multi-Agent Reinforcement Learning (MARL). It was long believed that the "curse of multi-agents" (i.e., the algorithmic performance drops exponentially with the number of agents) is unavoidable until several recent works (Daskalakis et al., 2023; Cui et al., 2023; Wang et al., 2023). While these works resolved the curse of multi-agents, when the state spaces are prohibitively large and (linear) function approximations are deployed, they either had a slower convergence rate of $O(T^{-1/4})$ or brought a polynomial dependency on the number of actions $A_{\max}$ -- which is avoidable in single-agent cases even when the loss functions can arbitrarily vary with time. This paper first refines the AVLPR framework by Wang et al. (2023), with an insight of designing *data-dependent* (i.e., stochastic) pessimistic estimation of the sub-optimality gap, allowing a broader choice of plug-in algorithms. When specialized to MGs with independent linear function approximations, we propose novel *action-dependent bonuses* to cover occasionally extreme estimation errors. With the help of state-of-the-art techniques from the single-agent RL literature, we give the first algorithm that tackles the curse of multi-agents, attains the optimal $O(T^{-1/2})$ convergence rate, and avoids $\text{poly}(A_{\max})$ dependency simultaneously.
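As a rough illustration of the two ingredients highlighted in the abstract, namely a data-dependent (stochastic) pessimistic estimate of the sub-optimality gap and per-action bonuses under linear function approximation, the Python sketch below assembles such quantities for a single agent at a single state. This is not the paper's algorithm: the elliptical bonus form, the confidence radius `beta`, and all function and variable names are illustrative assumptions.
```python
import numpy as np

# Illustrative sketch only (not the paper's AVLPR-based algorithm).
# Assumptions: a linear feature map phi(s, a), an elliptical bonus
# beta * ||phi(s, a)||_{Lambda^{-1}}, and a pessimistic gap estimate that
# compares optimistic and pessimistic per-action value estimates.

def elliptical_bonus(phi_sa: np.ndarray, Lambda: np.ndarray, beta: float) -> float:
    """Per-action bonus beta * ||phi(s,a)||_{Lambda^{-1}} from an empirical covariance."""
    return beta * float(np.sqrt(phi_sa @ np.linalg.solve(Lambda, phi_sa)))

def pessimistic_gap_estimate(q_hat: np.ndarray, bonuses: np.ndarray, played: int) -> float:
    """Data-dependent pessimistic estimate of the sub-optimality gap of the played action:
    optimistic value of the best action minus pessimistic value of the played action."""
    upper = q_hat + bonuses   # optimistic per-action values
    lower = q_hat - bonuses   # pessimistic per-action values
    return float(np.max(upper) - lower[played])

# Tiny usage example with random features and a regularized covariance matrix.
rng = np.random.default_rng(0)
d, num_actions = 4, 3
A = rng.standard_normal((d, d))
Lambda = np.eye(d) + 0.01 * (A @ A.T)             # positive-definite covariance estimate
features = rng.standard_normal((num_actions, d))  # phi(s, a) for each action a
q_hat = rng.standard_normal(num_actions)          # estimated Q-values at state s
bonuses = np.array([elliptical_bonus(f, Lambda, beta=0.1) for f in features])
print(pessimistic_gap_estimate(q_hat, bonuses, played=0))
```
The "action-dependent" aspect is reflected in computing a separate bonus for every action from the empirical covariance, so rarely explored actions with occasionally extreme estimation errors receive larger bonuses; how such bonuses are aggregated across agents and episodes is what the paper's refined framework addresses.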
Related papers
- MGDA Converges under Generalized Smoothness, Provably [27.87166415148172]
Multi-objective optimization (MOO) is receiving more attention in various fields such as multi-task learning.
Recent works provide some effective algorithms with theoretical analysis but they are limited by the standard $L$-smooth or bounded-gradient assumptions.
We study a more general and realistic class of generalized $\ell$-smooth loss functions, where $\ell$ is a general non-decreasing function of the gradient norm.
arXiv Detail & Related papers (2024-05-29T18:36:59Z) - Sample-Efficient Multi-Agent RL: An Optimization Perspective [103.35353196535544]
We study multi-agent reinforcement learning (MARL) for general-sum Markov Games (MGs) under general function approximation.
We introduce a novel complexity measure called the Multi-Agent Decoupling Coefficient (MADC) for general-sum MGs.
We show that our algorithm achieves sublinear regret comparable to existing works.
arXiv Detail & Related papers (2023-10-10T01:39:04Z) - Cooperative Thresholded Lasso for Sparse Linear Bandit [6.52540785559241]
We present a novel approach to address the multi-agent sparse contextual linear bandit problem.
It is the first algorithm that tackles row-wise distributed data in sparse linear bandits.
It is widely applicable to high-dimensional multi-agent problems where efficient feature extraction is critical for minimizing regret.
arXiv Detail & Related papers (2023-05-30T16:05:44Z) - Scalable Primal-Dual Actor-Critic Method for Safe Multi-Agent RL with
General Utilities [12.104551746465932]
We investigate safe multi-agent reinforcement learning, where agents seek to collectively maximize an aggregate sum of local objectives while satisfying their own safety constraints.
Our algorithm converges to a first-order stationary point (FOSP) at the rate of $\mathcal{O}\left(T^{-2/3}\right)$.
In the sample-based setting, we demonstrate that, with high probability, our algorithm requires $\widetilde{\mathcal{O}}\left(\epsilon^{-3.5}\right)$ samples to achieve an $\epsilon$-FOSP.
arXiv Detail & Related papers (2023-05-27T20:08:35Z) - Factorization of Multi-Agent Sampling-Based Motion Planning [72.42734061131569]
Modern robotics often involves multiple embodied agents operating within a shared environment.
Standard sampling-based algorithms can be used to search for solutions in the robots' joint space.
We integrate the concept of factorization into sampling-based algorithms, which requires only minimal modifications to existing methods.
We present a general implementation of a factorized SBA, derive an analytical gain in terms of sample complexity for PRM*, and showcase empirical results for RRG.
arXiv Detail & Related papers (2023-04-01T15:50:18Z) - Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement
Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise.
Our algorithm is adaptive to the unknown variance of the noise and achieves an $\tilde{O}(d\sqrt{\sum_{k=1}^{K} \sigma_k^2} + d)$ regret.
We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z) - Refined Regret for Adversarial MDPs with Linear Function Approximation [50.00022394876222]
We consider learning in an adversarial Markov Decision Process (MDP) where the loss functions can change arbitrarily over $K$ episodes.
This paper provides two algorithms that improve upon the best known $\tilde{\mathcal{O}}(K^{2/3})$ regret in the same setting.
arXiv Detail & Related papers (2023-01-30T14:37:21Z) - A Law of Iterated Logarithm for Multi-Agent Reinforcement Learning [3.655021726150368]
In Multi-Agent Reinforcement Learning (MARL), multiple agents interact with a common environment, as well as with each other, to solve a shared sequential decision-making problem.
We derive a novel law of iterated logarithm for a family of distributed nonlinear approximation schemes that is useful in MARL.
arXiv Detail & Related papers (2021-10-27T08:01:17Z) - The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces [36.097537237660234]
We propose an algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples.
A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness (a minimal self-play sketch of this idea appears after this list).
Our theoretical framework is generic, which applies to a wide range of models including but not limited to MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
arXiv Detail & Related papers (2021-06-07T05:39:09Z) - Model-Based Multi-Agent RL in Zero-Sum Markov Games with Near-Optimal
Sample Complexity [67.02490430380415]
We show that model-based MARL achieves a sample complexity of $\tilde{O}(|S||A||B|(1-\gamma)^{-3}\epsilon^{-2})$ for finding the Nash equilibrium (NE) value up to some $\epsilon$ error.
We also show that such a sample bound is minimax-optimal (up to logarithmic factors) if the algorithm is reward-agnostic, where the algorithm queries state transition samples without reward knowledge.
arXiv Detail & Related papers (2020-07-15T03:25:24Z)
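As an aside on the "exploiter" mechanism described in The Power of Exploiter entry above, here is a minimal, hypothetical self-play sketch in a zero-sum matrix game: the exploiter best-responds to the main player's current mixed strategy, and the main player improves via multiplicative weights. The payoff matrix, learning rate, and round count are arbitrary illustrative choices, not the cited paper's construction (which targets Markov games, including large state spaces and function approximation).
```python
import numpy as np

# Hypothetical illustration of an "exploiter" in a zero-sum matrix game
# (not the cited paper's algorithm, which concerns Markov games).

def exploiter_self_play(G: np.ndarray, rounds: int = 2000, lr: float = 0.05) -> np.ndarray:
    """Main player (rows, maximizer) vs. exploiter (columns, minimizer)."""
    n_rows, _ = G.shape
    weights = np.ones(n_rows)
    avg_policy = np.zeros(n_rows)
    for _ in range(rounds):
        policy = weights / weights.sum()
        # The exploiter targets the main player's weakness: it best-responds by
        # picking the column that minimizes the main player's expected payoff.
        exploit_col = int(np.argmin(policy @ G))
        # The main player improves against the exploiter via multiplicative weights.
        weights *= np.exp(lr * G[:, exploit_col])
        avg_policy += policy
    return avg_policy / rounds  # averaged policy approximates a max-min strategy

# Usage on a small random zero-sum game.
rng = np.random.default_rng(1)
G = rng.uniform(-1.0, 1.0, size=(3, 4))
print(exploiter_self_play(G))
```
Because multiplicative weights is a no-regret update, the averaged main-player policy converges toward a max-min (Nash) strategy even though the exploiter plays adversarially in every round.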
This list is automatically generated from the titles and abstracts of the papers on this site.