Convex Markov Games and Beyond: New Proof of Existence, Characterization and Learning Algorithms for Nash Equilibria
- URL: http://arxiv.org/abs/2602.12181v1
- Date: Thu, 12 Feb 2026 17:11:20 GMT
- Title: Convex Markov Games and Beyond: New Proof of Existence, Characterization and Learning Algorithms for Nash Equilibria
- Authors: Anas Barakat, Ioannis Panageas, Antonios Varvitsiotis
- Abstract summary: General Utility Markov Games (GUMGs) capture new applications requiring coupling between agents' occupancy measures. We prove that Nash equilibria coincide with the fixed points of projected pseudo-gradient dynamics (i.e., first-order stationary points), enabled by a novel agent-wise gradient domination property. Building on this characterization, we establish a policy gradient theorem for GUMGs and design a model-free policy gradient algorithm.
- Score: 20.875347023588652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Convex Markov Games (cMGs) were recently introduced as a broad class of multi-agent learning problems that generalize Markov games to settings where strategic agents optimize general utilities beyond additive rewards. While cMGs expand the modeling frontier, their theoretical foundations, particularly the structure of Nash equilibria (NE) and guarantees for learning algorithms, are not yet well understood. In this work, we address these gaps for an extension of cMGs, which we term General Utility Markov Games (GUMGs), capturing new applications requiring coupling between agents' occupancy measures. We prove that in GUMGs, Nash equilibria coincide with the fixed points of projected pseudo-gradient dynamics (i.e., first-order stationary points), enabled by a novel agent-wise gradient domination property. This insight also yields a simple proof of NE existence using Brouwer's fixed-point theorem. We further show the existence of Markov perfect equilibria. Building on this characterization, we establish a policy gradient theorem for GUMGs and design a model-free policy gradient algorithm. For potential GUMGs, we establish iteration complexity guarantees for computing approximate-NE under exact gradients and provide sample complexity bounds in both the generative model and on-policy settings. Our results extend beyond prior work restricted to zero-sum cMGs, providing the first theoretical analysis of common-interest cMGs.
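To make the fixed-point characterization concrete, below is a minimal illustrative sketch, not code from the paper: projected pseudo-gradient dynamics on the agents' probability simplices, whose fixed points are first-order stationary points and hence, by the paper's agent-wise gradient domination property, Nash equilibria. The simplex-projection routine, step size, stopping rule, and the toy common-payoff game are all assumptions chosen for illustration.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex
    (standard sort-based routine, cf. Duchi et al., 2008)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u + (1.0 - css) / idx > 0)[0][-1]
    tau = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + tau, 0.0)

def projected_pseudo_gradient(grads, x0, eta=0.1, iters=10_000, tol=1e-10):
    """Iterate x_i <- Proj_simplex(x_i + eta * grad_i(x)) for each agent i.
    A point this map leaves unmoved is a fixed point, i.e., a
    first-order stationary point of the game."""
    x = [xi.astype(float).copy() for xi in x0]
    for _ in range(iters):
        x_new = [project_simplex(xi + eta * g(x)) for xi, g in zip(x, grads)]
        if max(np.linalg.norm(a - b) for a, b in zip(x_new, x)) < tol:
            break  # (approximate) fixed point reached
        x = x_new
    return x

# Hypothetical toy common-interest game: both agents share the payoff
# u(x) = x1^T A x2, so the pseudo-gradients are A x2 and A^T x1.
# From the uniform start, the dynamics settle at the pure equilibrium
# ([1, 0], [1, 0]), a fixed point of the projected update.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
grads = [lambda x: A @ x[1], lambda x: A.T @ x[0]]
x_star = projected_pseudo_gradient(grads, [np.ones(2) / 2, np.ones(2) / 2])
print([np.round(xi, 3) for xi in x_star])
```

In a GUMG, the `grads` entries would instead be estimates of each agent's policy gradient of its general utility, obtained model-free via the paper's policy gradient algorithm; the toy matrix game here only serves to show the dynamics settling at a fixed point.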
Related papers
- Provably Convergent Actor-Critic in Risk-averse MARL [15.77454427706097]
We study Risk-averse Quantal response Equilibria (RQE), a solution concept rooted in behavioral game theory that incorporates risk aversion and bounded rationality. We propose a novel two-timescale Actor-Critic algorithm characterized by a fast-timescale actor and a slow-timescale critic.
arXiv Detail & Related papers (2026-02-12T20:29:41Z)
- Achieving Logarithmic Regret in KL-Regularized Zero-Sum Markov Games [53.447182734351]
We develop and analyze algorithms that provably achieve improved sample efficiency under reverse Kullback-Leibler (KL) regularization. We study both two-player zero-sum matrix games and Markov games: for matrix games, we propose OMG, an algorithm based on best-response sampling with optimistic bonuses, and extend this idea to Markov games through the algorithm SOMG. Both algorithms achieve a logarithmic regret in $T$ that scales inversely with the KL regularization strength $\beta$, in addition to the standard $\widetilde{\mathcal{O}}(\sqrt{T})$ regret.
arXiv Detail & Related papers (2025-10-15T01:00:54Z) - Solving Zero-Sum Convex Markov Games [38.68653814645517]
We contribute the first provable guarantees of global convergence to Nash equilibria (NE) in two-player zero-sum convex Markov games (cMGs) using independent policy gradient methods. We address general constrained min-max problems under nonconvex-PŁ (NC-PŁ) and two-sided PŁ conditions.
arXiv Detail & Related papers (2025-06-19T08:12:02Z) - Breaking the Curse of Multiagency in Robust Multi-Agent Reinforcement Learning [37.80275600302316]
Distributionally robust Markov games (RMGs) have been proposed to enhance robustness in MARL. Two notorious open challenges are the formulation of the uncertainty set and whether the corresponding RMGs can overcome the curse of multiagency. In this work, we propose a natural class of RMGs inspired by behavioral economics, where each agent's uncertainty set is shaped by both the environment and the integrated behavior of other agents.
arXiv Detail & Related papers (2024-09-30T08:09:41Z) - Convergence of Decentralized Actor-Critic Algorithm in General-sum Markov Games [3.8779763612314633]
We study the properties of learning algorithms in general-sum Markov games. In particular, we focus on a decentralized algorithm where each agent adopts an actor-critic learning dynamic.
arXiv Detail & Related papers (2024-09-06T20:49:11Z) - Provably Efficient Information-Directed Sampling Algorithms for Multi-Agent Reinforcement Learning [50.92957910121088]
This work designs and analyzes a novel set of algorithms for multi-agent reinforcement learning (MARL) based on the principle of information-directed sampling (IDS).
For episodic two-player zero-sum MGs, we present three sample-efficient algorithms for learning Nash equilibria.
We extend Reg-MAIDS to multi-player general-sum MGs and prove that it can learn either Nash equilibria or coarse correlated equilibria in a sample-efficient manner.
arXiv Detail & Related papers (2024-04-30T06:48:56Z) - Improving Sample Efficiency of Model-Free Algorithms for Zero-Sum Markov Games [66.2085181793014]
We show that a model-free stage-based Q-learning algorithm can enjoy the same optimality in the $H$ dependence as model-based algorithms.
Our algorithm features a key novel design of updating the reference value functions as the pair of optimistic and pessimistic value functions.
arXiv Detail & Related papers (2023-08-17T08:34:58Z) - Multi-Player Zero-Sum Markov Games with Networked Separable Interactions [23.270589136657254]
We study a new class of Markov games, emph(multi-player) zero-sum Markov Games with emphNetworked separable interactions (zero-sum NMGs)<n>We show that finding approximate Markov emphstationary CCE in infinite-horizon discounted zero-sum NMGs is textttPPAD-hard, unless the underlying network has a star topology''
arXiv Detail & Related papers (2023-07-13T19:05:11Z)
- Provably Learning Nash Policies in Constrained Markov Potential Games [90.87573337770293]
Multi-agent reinforcement learning (MARL) addresses sequential decision-making problems with multiple agents.
Constrained Markov Games (CMGs) are a natural formalism for safe MARL problems, though generally intractable.
arXiv Detail & Related papers (2023-06-13T13:08:31Z)
- Breaking the Curse of Multiagents in a Large State Space: RL in Markov Games with Independent Linear Function Approximation [56.715186432566576]
We propose a new model, independent linear Markov game, for reinforcement learning with a large state space and a large number of agents.
We design new algorithms for learning Markov coarse correlated equilibria (CCE) and Markov correlated equilibria (CE) with sample complexity bounds that scale only with each agent's own function class complexity.
Our algorithms rely on two key technical innovations: (1) utilizing policy replay to tackle non-stationarity incurred by multiple agents and the use of function approximation; and (2) separating learning Markov equilibria and exploration in the Markov games.
arXiv Detail & Related papers (2023-02-07T18:47:48Z)
- On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games [18.48133964089095]
Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions.
We show that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted SG within exponential precision is PPAD-complete.
Our results indicate that finding an MPE in SGs is highly unlikely to be NP-hard unless NP = co-NP.
arXiv Detail & Related papers (2021-09-04T05:47:59Z)
- The Power of Exploiter: Provable Multi-Agent RL in Large State Spaces [36.097537237660234]
We propose an algorithm that can provably find the Nash equilibrium policy using a polynomial number of samples.
A key component of our new algorithm is the exploiter, which facilitates the learning of the main player by deliberately exploiting her weakness.
Our theoretical framework is generic and applies to a wide range of models, including but not limited to tabular MGs, MGs with linear or kernel function approximation, and MGs with rich observations.
arXiv Detail & Related papers (2021-06-07T05:39:09Z)
- Cauchy-Schwarz Regularized Autoencoder [68.80569889599434]
Variational autoencoders (VAE) are a powerful and widely-used class of generative models.
We introduce a new constrained objective based on the Cauchy-Schwarz divergence, which can be computed analytically for GMMs.
Our objective improves upon variational auto-encoding models in density estimation, unsupervised clustering, semi-supervised learning, and face analysis.
arXiv Detail & Related papers (2021-01-06T17:36:26Z)