Fast Policy Extragradient Methods for Competitive Games with Entropy
Regularization
- URL: http://arxiv.org/abs/2105.15186v1
- Date: Mon, 31 May 2021 17:51:15 GMT
- Title: Fast Policy Extragradient Methods for Competitive Games with Entropy
Regularization
- Authors: Shicong Cen, Yuting Wei, Yuejie Chi
- Abstract summary: This paper investigates the problem of computing the equilibrium of competitive games.
Motivated by the algorithmic role of entropy regularization, we develop provably efficient extragradient methods.
- Score: 40.21627891283402
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates the problem of computing the equilibrium of
competitive games, which is often modeled as a constrained saddle-point
optimization problem with probability simplex constraints. Despite recent
efforts in understanding the last-iterate convergence of extragradient methods
in the unconstrained setting, the theoretical underpinnings of these methods in
the constrained settings, especially those using multiplicative updates, remain
highly inadequate, even when the objective function is bilinear. Motivated by
the algorithmic role of entropy regularization in single-agent reinforcement
learning and game theory, we develop provably efficient extragradient methods
to find the quantal response equilibrium (QRE) -- the solution concept for
zero-sum two-player matrix games with entropy regularization -- at a linear
rate. The proposed algorithms can be implemented in a decentralized manner,
where each player executes symmetric and multiplicative updates iteratively
using its own payoff without observing the opponent's actions directly. In
addition, by controlling the knob of entropy regularization, the proposed
algorithms can locate an approximate Nash equilibrium of the unregularized
matrix game at a sublinear rate without assuming the Nash equilibrium to be
unique. Our methods also lead to efficient policy extragradient algorithms for
solving entropy-regularized zero-sum Markov games at a linear rate. All of our
convergence rates are nearly dimension-free: they are independent of the size
of the state and action spaces up to logarithmic factors, highlighting the
positive role of entropy regularization in accelerating convergence.
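To make the update pattern concrete, the following is a minimal sketch (Python/NumPy) of symmetric, multiplicative extragradient updates for the entropy-regularized matrix game min_x max_y x^T A y - tau*H(x) + tau*H(y), where each player touches only its own payoff vector. The step size eta, the exponent (1 - eta*tau), and the iteration budget are illustrative choices, not the paper's exact prescription.

```python
import numpy as np

def normalize_from_logits(logits):
    # Numerically stable projection of log-weights onto the probability simplex.
    z = logits - logits.max()
    p = np.exp(z)
    return p / p.sum()

def entropy_regularized_extragradient(A, tau=0.1, eta=0.05, iters=2000):
    """Sketch: multiplicative extragradient updates for the entropy-regularized
    zero-sum matrix game  min_x max_y  x^T A y - tau*H(x) + tau*H(y).
    Hyperparameters are illustrative, not the paper's exact choices."""
    m, n = A.shape
    x = np.ones(m) / m  # row (minimizing) player's mixed strategy
    y = np.ones(n) / n  # column (maximizing) player's mixed strategy
    for _ in range(iters):
        # Extrapolation step: each player reacts to the opponent's current
        # strategy using only its own payoff vector (A @ y or A.T @ x).
        x_mid = normalize_from_logits((1 - eta * tau) * np.log(x) - eta * (A @ y))
        y_mid = normalize_from_logits((1 - eta * tau) * np.log(y) + eta * (A.T @ x))
        # Update step: re-apply the multiplicative rule with the midpoint payoffs.
        x_new = normalize_from_logits((1 - eta * tau) * np.log(x) - eta * (A @ y_mid))
        y_new = normalize_from_logits((1 - eta * tau) * np.log(y) + eta * (A.T @ x_mid))
        x, y = x_new, y_new
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 4))
    x, y = entropy_regularized_extragradient(A)
    print("row strategy:", np.round(x, 3))
    print("col strategy:", np.round(y, 3))
```

Annealing tau toward zero then recovers an approximate Nash equilibrium of the unregularized game, which is the mechanism the abstract refers to when "controlling the knob of entropy regularization".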
Related papers
- Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability [11.793922711718645]
We consider decentralized learning for zero-sum games, where players only see their own information and are agnostic to the actions and payoffs of the opponent.
Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions.
Our contribution is a rational and convergent algorithm, utilizing Tsallis-entropy regularization in a value-iteration-based algorithm.
arXiv Detail & Related papers (2023-12-13T09:31:30Z)
- Hybrid algorithm simulating non-equilibrium steady states of an open quantum system [10.752869788647802]
Non-equilibrium steady states are a focal point of research in the study of open quantum systems.
Previous variational algorithms for searching these steady states have suffered from resource-intensive implementations.
We present a novel variational quantum algorithm that efficiently searches for non-equilibrium steady states by simulating the operator-sum form of the Lindblad equation.
arXiv Detail & Related papers (2023-09-13T01:57:27Z)
- PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [62.51015395213579]
We consider the problem of computing a mixed Nash equilibrium of two-player zero-sum continuous games with nonconvex-nonconcave objective functions.
The proposed algorithm employs the movements of particles to represent the updates of randomized strategies for the $\epsilon$-mixed Nash equilibrium.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
- Regularized Gradient Descent Ascent for Two-Player Zero-Sum Markov Games [16.09467599829253]
We study the problem of finding the Nash equilibrium in a two-player zero-sum game.
Our main contribution is to show that, under proper choices of the regularization parameter, regularized gradient descent ascent converges to the Nash equilibrium of the original unregularized problem.
arXiv Detail & Related papers (2022-05-27T03:24:12Z)
- Independent Natural Policy Gradient Methods for Potential Games: Finite-time Global Convergence with Entropy Regularization [28.401280095467015]
We study the finite-time convergence of independent entropy-regularized natural policy gradient (NPG) methods for potential games.
We show that the proposed method converges to the quantal response equilibrium (QRE) at a sublinear rate, which is independent of the size of the action space.
arXiv Detail & Related papers (2022-04-12T01:34:02Z)
- First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs).
Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method.
We provide nonasymptotic theoretical guarantees for these algorithms.
arXiv Detail & Related papers (2022-04-07T00:11:05Z)
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We analyze the stochastic ExtraGradient (SEG) method with constant step size and present variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG converges to the Nash equilibrium, and that this rate is further accelerated by incorporating a scheduled restarting procedure (a sketch of this scheme appears after this list).
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
- On dissipative symplectic integration with applications to gradient-based optimization [77.34726150561087]
We propose a geometric framework in which discretizations can be realized systematically.
We show that a generalization of symplectic integrators to nonconservative and, in particular, dissipative Hamiltonian systems is able to preserve rates of convergence up to a controlled error.
arXiv Detail & Related papers (2020-04-15T00:36:49Z)
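As a companion illustration of the stochastic extragradient method with restarted iteration averaging mentioned above, here is a minimal sketch on an unconstrained bilinear game min_x max_y x^T A y. The restart schedule, step size, and noise model are illustrative assumptions; the cited paper's exact scheduling and guarantees differ in their details.

```python
import numpy as np

def seg_restarted_averaging(A, rounds=6, window=500, eta=0.05, noise=0.1, seed=0):
    """Sketch: stochastic extragradient (SEG) with iterate averaging and restarts
    on the bilinear game  min_x max_y  x^T A y.  Hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x, y = rng.standard_normal(m), rng.standard_normal(n)
    for _ in range(rounds):
        x_avg, y_avg = np.zeros(m), np.zeros(n)
        for t in range(window):
            # Stochastic gradients: exact bilinear gradients plus additive noise.
            gx = A @ y + noise * rng.standard_normal(m)
            gy = A.T @ x + noise * rng.standard_normal(n)
            x_mid, y_mid = x - eta * gx, y + eta * gy        # extrapolation step
            gx_mid = A @ y_mid + noise * rng.standard_normal(m)
            gy_mid = A.T @ x_mid + noise * rng.standard_normal(n)
            x, y = x - eta * gx_mid, y + eta * gy_mid        # update step
            x_avg += (x - x_avg) / (t + 1)                   # running iterate averages
            y_avg += (y - y_avg) / (t + 1)
        x, y = x_avg, y_avg  # restart the next round from the averaged iterate
    return x, y

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))
    x, y = seg_restarted_averaging(A)
    # For a generic square A, the unique saddle point of x^T A y is (0, 0).
    print("||x||, ||y|| after restarted averaging:", np.linalg.norm(x), np.linalg.norm(y))
```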