Related papers: Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability

Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability

URL: http://arxiv.org/abs/2312.08008v2
Date: Fri, 24 May 2024 09:57:54 GMT
Title: Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability
Authors: Reda Ouhamma, Maryam Kamgarpour,
Abstract summary: We consider decentralized learning for zero-sum games, where players only see their information and are to actions and payoffs of the opponent. Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based algorithm.
Score: 11.793922711718645
License: http://creativecommons.org/licenses/by-nc-sa/4.0/
Abstract: We consider decentralized learning for zero-sum games, where players only see their payoff information and are agnostic to actions and payoffs of the opponent. Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions. We address the open problem of achieving an approximate Nash equilibrium efficiently with an uncoupled and single time-scale algorithm under weaker conditions. Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based approach. The algorithm learns an approximate Nash equilibrium in polynomial time, requiring only the existence of a policy pair that induces an irreducible and aperiodic Markov chain, thus considerably weakening past assumptions. Our analysis leverages negative drift inequalities and introduces novel properties of Tsallis entropy that are of independent interest.

Related papers

Finite-Time Information-Theoretic Bounds in Queueing Control [54.11376591632282]
We derive new policies that achieve them-for the total queue length in scheduling problems over processing networks with both adversarial and arrivals.<n>These findings reveal a fundamental limitation on "drift-only" methods and point the way toward principled, non-asymptotic optimality in queueing control.
arXiv Detail & Related papers (2025-06-23T04:14:40Z)
Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE. In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z)
The equivalence of dynamic and strategic stability under regularized learning in games [33.74394172275373]
We examine the long-run behavior of regularized, no-regret learning in finite games. We obtain an equivalence between strategic and dynamic stability. We show that methods based on entropic regularization converge at a geometric rate.
arXiv Detail & Related papers (2023-11-04T14:07:33Z)
Game-Theoretic Robust Reinforcement Learning Handles Temporally-Coupled Perturbations [98.5802673062712]
We introduce temporally-coupled perturbations, presenting a novel challenge for existing robust reinforcement learning methods. We propose GRAD, a novel game-theoretic approach that treats the temporally-coupled robust RL problem as a partially observable two-player zero-sum game.
arXiv Detail & Related papers (2023-07-22T12:10:04Z)
PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the non-AL equilibrium nonconptotic objective function in two-player zero-sum continuous games. Our novel insights into the particle-based algorithms for continuous distribution strategies are presented.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
Asynchronous Gradient Play in Zero-Sum Multi-agent Games [25.690033495071923]
We study asynchronous gradient plays in zero-sum polymatrix games under delayed feedbacks. To the best of our knowledge, this work is the first that aims to understand asynchronous gradient play in zero-sum polymatrix games.
arXiv Detail & Related papers (2022-11-16T15:37:23Z)
Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games. We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method. Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
Learning Two-Player Mixture Markov Games: Kernel Function Approximation and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation. We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z)
Regret Minimization and Convergence to Equilibria in General-sum Markov Games [57.568118148036376]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
A unified stochastic approximation framework for learning in games [82.74514886461257]
We develop a flexible approximation framework for analyzing the long-run behavior of learning in games (both continuous and finite) The proposed analysis template incorporates a wide array of popular learning algorithms, including gradient-based methods, exponential/multiplicative weights for learning in finite games, optimistic and bandit variants of the above, etc.
arXiv Detail & Related papers (2022-06-08T14:30:38Z)
Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization [40.21627891283402]
This paper investigates the problem of computing the equilibrium of competitive games. Motivated by the algorithmic role of entropy regularization, we develop provably efficient extragradient methods.
arXiv Detail & Related papers (2021-05-31T17:51:15Z)
Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games [78.65798135008419]
It remains vastly open how to learn the Stackelberg equilibrium in general-sum games efficiently from samples. This paper initiates the theoretical study of sample-efficient learning of the Stackelberg equilibrium in two-player turn-based general-sum games.
arXiv Detail & Related papers (2021-02-23T05:11:07Z)
Last-iterate Convergence of Decentralized Optimistic Gradient Descent/Ascent in Infinite-horizon Competitive Markov Games [37.70703888365849]
We study infinite-horizon discounted two-player zero-sum Markov games. We develop a decentralized algorithm that converges to the set of Nash equilibria under self-play.
arXiv Detail & Related papers (2021-02-08T21:45:56Z)
Survival of the strictest: Stable and unstable equilibria under regularized learning with partial information [32.384868685390906]
We examine the Nash equilibrium convergence properties of no-regret learning in general N-player games. We establish a comprehensive equivalence between the stability of a Nash equilibrium and its support. It provides a clear refinement criterion for the prediction of the day-to-day behavior of no-regret learning in games.
arXiv Detail & Related papers (2021-01-12T18:55:11Z)
Learning from History for Byzantine Robust Optimization [52.68913869776858]
Byzantine robustness has received significant attention recently given its importance for distributed learning. We show that most existing robust aggregation rules may not converge even in the absence of any Byzantine attackers.
arXiv Detail & Related papers (2020-12-18T16:22:32Z)
Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation [18.35524179586723]
We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum games. We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy. We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium.
arXiv Detail & Related papers (2020-09-01T01:03:44Z)
Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games. In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap. In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)

This list is automatically generated from the titles and abstracts of the papers in this site.