Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
- URL: http://arxiv.org/abs/2312.08008v2
- Date: Fri, 24 May 2024 09:57:54 GMT
- Title: Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
- Authors: Reda Ouhamma, Maryam Kamgarpour,
- Abstract summary: We consider decentralized learning for zero-sum games, where players only see their information and are to actions and payoffs of the opponent.
Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions.
Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based algorithm.
- Score: 11.793922711718645
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We consider decentralized learning for zero-sum games, where players only see their payoff information and are agnostic to actions and payoffs of the opponent. Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. We address the open problem of achieving an approximate Nash equilibrium efficiently with an uncoupled and single time-scale algorithm under weaker conditions. Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based approach. The algorithm learns an approximate Nash equilibrium in polynomial time, requiring only the existence of a policy pair that induces an irreducible and aperiodic Markov chain, thus considerably weakening past assumptions. Our analysis leverages negative drift inequalities and introduces novel properties of Tsallis entropy that are of independent interest.
Related papers
- The equivalence of dynamic and strategic stability under regularized
learning in games [33.74394172275373]
We examine the long-run behavior of regularized, no-regret learning in finite games.
We obtain an equivalence between strategic and dynamic stability.
We show that methods based on entropic regularization converge at a geometric rate.
arXiv Detail & Related papers (2023-11-04T14:07:33Z) - Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods.
We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z) - PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the non-AL equilibrium nonconptotic objective function in two-player zero-sum continuous games.
Our novel insights into the particle-based algorithms for continuous distribution strategies are presented.
arXiv Detail & Related papers (2023-03-02T05:08:15Z) - Asynchronous Gradient Play in Zero-Sum Multi-agent Games [25.690033495071923]
We study asynchronous gradient plays in zero-sum polymatrix games under delayed feedbacks.
To the best of our knowledge, this work is the first that aims to understand asynchronous gradient play in zero-sum polymatrix games.
arXiv Detail & Related papers (2022-11-16T15:37:23Z) - Faster Last-iterate Convergence of Policy Optimization in Zero-Sum
Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z) - Regret Minimization and Convergence to Equilibria in General-sum Markov
Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents.
Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z) - A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite)
The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z) - Fast Policy Extragradient Methods for Competitive Games with Entropy
Regularization [40.21627891283402]
This paper investigates the problem of computing the equilibrium of competitive games.
Motivated by the algorithmic role of entropy regularization, we develop provably efficient extragradient methods.
arXiv Detail & Related papers (2021-05-31T17:51:15Z) - Survival of the strictest: Stable and unstable equilibria under
regularized learning with partial information [32.384868685390906]
We examine the Nash equilibrium convergence properties of no-regret learning in general N-player games.
We establish a comprehensive equivalence between the stability of a Nash equilibrium and its support.
It provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.
arXiv Detail & Related papers (2021-01-12T18:55:11Z) - Learning from History for Byzantine Robust Optimization [52.68913869776858]
Byzantine robustness has received significant attention recently given its importance for distributed learning.
We show that most existing robust aggregation rules may not converge even in the absence of any Byzantine attackers.
arXiv Detail & Related papers (2020-12-18T16:22:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.