Solving Zero-Sum Convex Markov Games
- URL: http://arxiv.org/abs/2506.16120v1
- Date: Thu, 19 Jun 2025 08:12:02 GMT
- Title: Solving Zero-Sum Convex Markov Games
- Authors: Fivos Kalogiannis, Emmanouil-Vasileios Vlatakis-Gkaragkounis, Ian Gemp, Georgios Piliouras
- Abstract summary: We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. We address general constrained min-max problems under NC-pPL and two-sided pPL conditions.
- Score: 38.68653814645517
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) by using independent policy gradient methods. Convex Markov games, recently defined by Gemp et al. (2024), extend Markov decision processes to multi-agent settings with preferences that are convex over occupancy measures, offering a broad framework for modeling generic strategic interactions. However, even the fundamental min-max case of cMGs presents significant challenges, including inherent nonconvexity, the absence of Bellman consistency, and the complexity of the infinite horizon. We follow a two-step approach. First, leveraging properties of hidden-convex–hidden-concave functions, we show that a simple nonconvex regularization transforms the min-max optimization problem into a nonconvex-proximal Polyak-Łojasiewicz (NC-pPL) objective. Crucially, this regularization can stabilize the iterates of independent policy gradient methods and ultimately lead them to converge to equilibria. Second, building on this reduction, we address the general constrained min-max problems under NC-pPL and two-sided pPL conditions, providing the first global convergence guarantees for stochastic nested and alternating gradient descent-ascent methods, which we believe may be of independent interest.
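As a rough illustration of the two-step recipe in the abstract (regularize the min-max objective, then run independent gradient descent-ascent), here is a minimal sketch; the quadratic regularizer, step sizes, and bilinear toy game are illustrative assumptions, not the paper's construction.

```python
# Illustrative sketch (not the paper's exact algorithm): regularize a
# min-max objective f(x, y) and run alternating gradient descent-ascent.
# The regularization weight `tau` and the step sizes are hypothetical choices.
import numpy as np

def alternating_gda(grad_x, grad_y, x0, y0, tau=0.1,
                    eta_x=0.01, eta_y=0.01, iters=1000):
    """min_x max_y f(x, y) + (tau/2)||x||^2 - (tau/2)||y||^2."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        # Descent step on the regularized objective in x.
        x -= eta_x * (grad_x(x, y) + tau * x)
        # Ascent step in y, using the freshly updated x (alternating order).
        y += eta_y * (grad_y(x, y) - tau * y)
    return x, y

# Toy bilinear game f(x, y) = x^T A y, whose unregularized GDA iterates cycle;
# the regularizer pulls the iterates toward the equilibrium at the origin.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
gx = lambda x, y: A @ y       # df/dx
gy = lambda x, y: A.T @ x     # df/dy
x_star, y_star = alternating_gda(gx, gy, np.ones(2), np.ones(2))
```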
Related papers
- Stability of Primal-Dual Gradient Flow Dynamics for Multi-Block Convex Optimization Problems [2.66854711376491]
The proposed dynamics are based on the proximal augmented Lagrangian and provide a viable alternative to ADMM. We leverage various structural properties to establish global (exponential) convergence guarantees for the proposed dynamics. Our assumptions are much weaker than those required to prove (exponential) stability of primal-dual dynamics. (A discretized sketch follows the link below.)
arXiv Detail & Related papers (2024-08-28T17:43:18Z)
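A minimal discretized sketch of primal-dual gradient flow, assuming a plain (non-proximal) augmented Lagrangian, a toy equality-constrained quadratic program, and forward-Euler steps; the paper's proximal variant and multi-block structure are not reproduced here.

```python
# Sketch: primal-dual gradient flow on an augmented Lagrangian,
# discretized by forward Euler. The quadratic objective, step size `dt`,
# and penalty `rho` are illustrative assumptions, not the paper's setup.
import numpy as np

def primal_dual_flow(Q, c, A, b, rho=1.0, dt=0.01, steps=5000):
    """min 0.5 x^T Q x + c^T x  subject to  A x = b."""
    x = np.zeros(Q.shape[0])
    lam = np.zeros(A.shape[0])
    for _ in range(steps):
        r = A @ x - b                              # constraint residual
        # x-dot = -grad_x L(x, lam), L the augmented Lagrangian.
        x -= dt * (Q @ x + c + A.T @ (lam + rho * r))
        # lam-dot = +grad_lam L(x, lam) = residual (ascent on the dual).
        lam += dt * r
    return x, lam

Q = np.diag([2.0, 4.0]); c = np.array([-1.0, -2.0])
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
x_opt, lam_opt = primal_dual_flow(Q, c, A, b)
```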
- Double Duality: Variational Primal-Dual Policy Optimization for Constrained Reinforcement Learning [132.7040981721302]
We study the constrained convex Markov decision process (MDP), where the goal is to minimize a convex functional of the visitation measure.
Designing algorithms for a constrained convex MDP faces several challenges, including handling the large state space.
arXiv Detail & Related papers (2024-02-16T16:35:18Z)
- PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the problem of computing mixed Nash equilibria of two-player zero-sum continuous games with nonconvex-nonconcave objective functions.
We present novel insights into particle-based algorithms for continuous distribution strategies.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
- Stability and Generalization for Markov Chain Stochastic Gradient Methods [49.981789906200035]
We provide a comprehensive generalization analysis of MC-SGMs for both minimization and minimax problems.
We establish the optimal excess population risk bounds for both smooth and non-smooth cases.
We develop the first nearly optimal convergence rates for convex-concave problems both in expectation and with high probability.
arXiv Detail & Related papers (2022-09-16T15:42:51Z)
- Convergence and sample complexity of natural policy gradient primal-dual methods for constrained MDPs [21.347689976296834]
We employ the natural policy gradient method to solve the discounted optimal control problem for constrained MDPs.
We also provide convergence and finite-sample guarantees for two sample-based NPG-PD algorithms.
arXiv Detail & Related papers (2022-06-06T04:28:04Z)
- Policy-based Primal-Dual Methods for Concave CMDP with Variance Reduction [18.95829896746939]
We study Concave CMDPs where both the objective and constraints are defined as concave functions of the state-action occupancy measure.
We propose the Variance-Reduced Primal-Dual Policy Gradient, which updates the primal variable via policy gradient ascent and the dual variable via projected sub-gradient descent (a bare-bones sketch of this primal-dual pattern follows the link below).
arXiv Detail & Related papers (2022-05-22T02:50:16Z)
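The primal-ascent/dual-descent pattern named in the summary above can be made concrete in a few lines. This is a bare-bones sketch under assumed tabular shapes and a hypothetical constraint threshold `b`; it omits the paper's variance reduction entirely.

```python
# Hedged sketch of one Lagrangian-based primal-dual update: policy-gradient
# ascent in the primal, projected sub-gradient descent in the dual.
# Shapes, estimators, and the threshold `b` are illustrative assumptions.
import numpy as np

def primal_dual_step(theta, lam, grad_reward, grad_utility, utility_value,
                     b=0.0, eta=0.05, xi=0.05, lam_max=10.0):
    """Lagrangian: L(theta, lam) = R(theta) + lam * (U(theta) - b)."""
    # Primal: policy-gradient ascent on the Lagrangian.
    theta = theta + eta * (grad_reward + lam * grad_utility)
    # Dual: projected sub-gradient descent, keeping lam in [0, lam_max].
    lam = float(np.clip(lam - xi * (utility_value - b), 0.0, lam_max))
    return theta, lam

# Toy call with made-up gradient estimates for a 4-parameter policy.
theta, lam = primal_dual_step(np.zeros(4), 1.0,
                              grad_reward=np.ones(4),
                              grad_utility=0.1 * np.ones(4),
                              utility_value=-0.2)
```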
- Faster Algorithm and Sharper Analysis for Constrained Markov Decision Process [56.55075925645864]
The problem of the constrained Markov decision process (CMDP) is investigated, where an agent aims to maximize the expected accumulated discounted reward subject to multiple constraints.
A new primal-dual convex approach is proposed with a novel integration of three ingredients: a regularized policy, a dual regularizer, and Nesterov's accelerated gradient descent on the dual (sketched below).
This is the first demonstration that nonconcave CMDP problems can attain the complexity lower bound of $\mathcal{O}(1/\epsilon)$ for convex optimization subject to convex constraints.
arXiv Detail & Related papers (2021-10-20T02:57:21Z)
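One of the three ingredients above, Nesterov's accelerated gradient descent, is standard enough to sketch on its own; the test function and the 1/L step size below are illustrative assumptions, and this is not the paper's dual optimizer as such.

```python
# Sketch of Nesterov's accelerated gradient descent with the standard
# momentum schedule t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2.
import numpy as np

def nesterov_agd(grad, x0, L, iters=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_next = y - grad(y) / L   # gradient step from the lookahead point
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        # Momentum extrapolation toward the direction of progress.
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x

# Example: minimize 0.5 * ||x - 1||^2, whose gradient is x - 1 (L = 1).
sol = nesterov_agd(lambda v: v - 1.0, np.zeros(3), L=1.0)
```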
- Stochastic Gradient Descent-Ascent and Consensus Optimization for Smooth Games: Convergence Analysis under Expected Co-coercivity [49.66890309455787]
We introduce the expected co-coercivity condition, explain its benefits, and provide the first last-iterate convergence guarantees of SGDA and SCO.
We prove linear convergence of both methods to a neighborhood of the solution when they use a constant step-size (illustrated in the sketch below).
Our convergence guarantees hold under the arbitrary sampling paradigm, and we give insights into the complexity of minibatching.
arXiv Detail & Related papers (2021-06-30T18:32:46Z)
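To make the constant step-size behavior concrete, here is a minimal SGDA sketch on an assumed strongly-convex-strongly-concave quadratic game with additive gradient noise; with a constant step size the iterates settle into a noise-dominated neighborhood of the solution rather than the exact point.

```python
# Illustrative SGDA: simultaneous stochastic gradient descent-ascent with a
# constant step size. The quadratic game and noise model are assumptions
# for demonstration, not the paper's setting.
import numpy as np

rng = np.random.default_rng(0)

def sgda(x, y, eta=0.05, iters=2000, noise=0.1):
    # f(x, y) = 0.5||x||^2 + x^T y - 0.5||y||^2, saddle point at (0, 0).
    for _ in range(iters):
        gx = x + y + noise * rng.standard_normal(x.shape)  # stochastic grad in x
        gy = x - y + noise * rng.standard_normal(y.shape)  # stochastic grad in y
        x, y = x - eta * gx, y + eta * gy                  # simultaneous updates
    return x, y  # ends in a noise-dominated neighborhood of (0, 0)

x_T, y_T = sgda(np.ones(3), np.ones(3))
```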
This list is automatically generated from the titles and abstracts of the papers on this site.