Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
- URL: http://arxiv.org/abs/2401.15240v2
- Date: Wed, 1 May 2024 21:52:12 GMT
- Title: Near-Optimal Policy Optimization for Correlated Equilibrium in General-Sum Markov Games
- Authors: Yang Cai, Haipeng Luo, Chen-Yu Wei, Weiqiang Zheng,
- Abstract summary: We provide an uncoupled policy optimization algorithm that attains a near-optimal $tildeO(T-1)$ convergence rate for computing a correlated equilibrium.
Our algorithm is constructed by combining two main elements (i.e. smooth value updates and (ii. the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer)
- Score: 44.95137108337898
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study policy optimization algorithms for computing correlated equilibria in multi-player general-sum Markov Games. Previous results achieve $O(T^{-1/2})$ convergence rate to a correlated equilibrium and an accelerated $O(T^{-3/4})$ convergence rate to the weaker notion of coarse correlated equilibrium. In this paper, we improve both results significantly by providing an uncoupled policy optimization algorithm that attains a near-optimal $\tilde{O}(T^{-1})$ convergence rate for computing a correlated equilibrium. Our algorithm is constructed by combining two main elements (i) smooth value updates and (ii) the optimistic-follow-the-regularized-leader algorithm with the log barrier regularizer.
Related papers
- Distributed Optimization via Energy Conservation Laws in Dilated Coordinates [5.35599092568615]
This paper introduces an energy conservation approach for analyzing continuous-time dynamical systems in dilated coordinates.
convergence rates can be explicitly expressed in terms of the inverse time-dilation factor.
Its accelerated convergence behavior is benchmarked against various state-of-the-art distributed optimization algorithms on practical, large-scale problems.
arXiv Detail & Related papers (2024-09-28T08:02:43Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, textsfAdaOGD, that does not require a priori knowledge of these parameters.
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Policy Optimization for Markov Games: Unified Framework and Faster
Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general Markov Games and show a $mathcalwidetildeO(T-1/2)$ convergence rate to Correlated Equilibria (CCE)
arXiv Detail & Related papers (2022-06-06T14:23:13Z) - Optimal Correlated Equilibria in General-Sum Extensive-Form Games: Fixed-Parameter Algorithms, Hardness, and Two-Sided Column-Generation [78.48747645545944]
We study the problem of finding optimal correlated equilibria of various sorts in extensive-form games.
We introduce a new algorithm for computing optimal equilibria in all three notions.
arXiv Detail & Related papers (2022-03-14T15:21:18Z) - Momentum Accelerates the Convergence of Stochastic AUPRC Maximization [80.8226518642952]
We study optimization of areas under precision-recall curves (AUPRC), which is widely used for imbalanced tasks.
We develop novel momentum methods with a better iteration of $O (1/epsilon4)$ for finding an $epsilon$stationary solution.
We also design a novel family of adaptive methods with the same complexity of $O (1/epsilon4)$, which enjoy faster convergence in practice.
arXiv Detail & Related papers (2021-07-02T16:21:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.