Learning Nash Equilibria in Zero-Sum Stochastic Games via
Entropy-Regularized Policy Approximation
- URL: http://arxiv.org/abs/2009.00162v2
- Date: Sun, 27 Jun 2021 04:26:01 GMT
- Title: Learning Nash Equilibria in Zero-Sum Stochastic Games via
Entropy-Regularized Policy Approximation
- Authors: Yue Guan, Qifan Zhang, Panagiotis Tsiotras
- Abstract summary: We explore the use of policy approximations to reduce the computational cost of learning Nash equilibria in zero-sum games.
We propose a new Q-learning type algorithm that uses a sequence of entropy-regularized soft policies to approximate the Nash policy.
We prove that under certain conditions, by updating the regularized Q-function, the algorithm converges to a Nash equilibrium.
- Score: 18.35524179586723
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore the use of policy approximations to reduce the computational cost
of learning Nash equilibria in zero-sum stochastic games. We propose a new
Q-learning type algorithm that uses a sequence of entropy-regularized soft
policies to approximate the Nash policy during the Q-function updates. We prove
that under certain conditions, by updating the regularized Q-function, the
algorithm converges to a Nash equilibrium. We also demonstrate the proposed
algorithm's ability to transfer previous training experiences, enabling the
agents to adapt quickly to new environments. We provide a dynamic
hyper-parameter scheduling scheme to further expedite convergence. Empirical
results on a number of stochastic games verify that the proposed algorithm
converges to the Nash equilibrium while exhibiting a major speed-up over
existing algorithms.
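To make the scheme concrete, here is a minimal tabular sketch in Python of an entropy-regularized Nash-Q update of this flavor; the inner fixed-point loop, the function names, and all hyper-parameter values are illustrative assumptions rather than the authors' exact algorithm.

    import numpy as np

    def softmax(z):
        z = z - z.max()                      # stabilize the exponentials
        e = np.exp(z)
        return e / e.sum()

    def entropy(p):
        return -np.sum(p * np.log(p + 1e-12))

    def soft_nash_value(Q_s, tau=0.1, iters=50):
        # Approximate the entropy-regularized equilibrium of the stage game
        # Q_s (rows: maximizer's actions, columns: minimizer's actions) by
        # iterating the two players' softmax best responses.
        n_a, n_b = Q_s.shape
        x = np.full(n_a, 1.0 / n_a)          # maximizer's soft policy
        y = np.full(n_b, 1.0 / n_b)          # minimizer's soft policy
        for _ in range(iters):
            x = softmax(Q_s @ y / tau)
            y = softmax(-Q_s.T @ x / tau)
        # regularized value: expected payoff plus the two entropy terms
        return x @ Q_s @ y + tau * entropy(x) - tau * entropy(y)

    def q_update(Q, s, a, b, r, s_next, alpha=0.1, gamma=0.95, tau=0.1):
        # One temporal-difference step that bootstraps from the soft
        # equilibrium value of the next stage game instead of an exact
        # (and expensive) Nash value.
        target = r + gamma * soft_nash_value(Q[s_next], tau)
        Q[s][a, b] += alpha * (target - Q[s][a, b])

Annealing tau toward zero over training is one natural reading of the dynamic hyper-parameter scheduling mentioned in the abstract: it drives the soft policies toward the unregularized Nash policy.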
Related papers
- Independent Learning in Constrained Markov Potential Games [19.083595175045073]
Constrained Markov games offer a formal framework for modeling multi-agent reinforcement learning problems.
We propose an independent policy gradient algorithm for learning approximate constrained Nash equilibria.
arXiv Detail & Related papers (2024-02-27T20:57:35Z)
- Learning Nash Equilibria in Zero-Sum Markov Games: A Single Time-scale Algorithm Under Weak Reachability [11.793922711718645]
We consider decentralized learning for zero-sum games, where each player observes only its own information and is oblivious to the actions and payoffs of the opponent.
Previous works demonstrated convergence to a Nash equilibrium in this setting using double time-scale algorithms under strong reachability assumptions.
Our contribution is a rational and convergent algorithm that employs Tsallis-entropy regularization within a value-iteration scheme (see the note below).
arXiv Detail & Related papers (2023-12-13T09:31:30Z)
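For context, the Tsallis entropy that drives this kind of regularization generalizes the Shannon entropy through an entropic index $q > 0$ (a standard definition, not specific to this paper):

    $S_q(\pi) = \frac{1}{q-1}\Bigl(1 - \sum_{a} \pi(a)^q\Bigr),$

which recovers the Shannon entropy $-\sum_a \pi(a)\log\pi(a)$ in the limit $q \to 1$.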
- PAPAL: A Provable PArticle-based Primal-Dual ALgorithm for Mixed Nash Equilibrium [58.26573117273626]
We consider the problem of finding a mixed Nash equilibrium of two-player zero-sum continuous games with nonconvex-nonconcave objective functions.
We present novel insights into particle-based algorithms for continuous distribution strategies.
arXiv Detail & Related papers (2023-03-02T05:08:15Z)
- Decentralized Policy Gradient for Nash Equilibria Learning of General-sum Stochastic Games [8.780797886160402]
We study Nash equilibrium learning in general-sum stochastic games with an unknown transition probability density function.
For the case with exact pseudo-gradients, we design a two-loop algorithm that exploits the equivalence between Nash equilibria and solutions of variational inequality problems (recalled below).
arXiv Detail & Related papers (2022-10-14T09:09:56Z)
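For reference, the equivalence invoked in the entry above is the standard one: under suitable convexity assumptions, a profile $\pi^*$ is a Nash equilibrium exactly when it solves the variational inequality defined by the pseudo-gradient $F$ (the stacked partial gradients of the players' objectives):

    $\langle F(\pi^*),\, \pi - \pi^* \rangle \ge 0 \quad \text{for all } \pi \in \Pi,$

where $\Pi$ is the product of the players' strategy sets.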
- Maximum-Likelihood Inverse Reinforcement Learning with Finite-Time Guarantees [56.848265937921354]
Inverse reinforcement learning (IRL) aims to recover the reward function and the associated optimal policy.
Many algorithms for IRL have an inherently nested structure.
We develop a novel single-loop algorithm for IRL that does not compromise reward estimation accuracy; the bilevel objective behind this nesting is sketched below.
arXiv Detail & Related papers (2022-10-04T17:13:45Z)
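Schematically, the nesting arises because maximum-likelihood IRL is a bilevel problem: the reward parameters $\theta$ are chosen to maximize the likelihood of the expert data under the policy that is soft-optimal for the current reward (a standard formulation, stated here with entropy regularization):

    $\max_{\theta}\; \mathbb{E}_{(s,a)\sim \mathcal{D}_{\mathrm{expert}}}\bigl[\log \pi_{r_\theta}(a \mid s)\bigr] \quad \text{s.t.} \quad \pi_{r_\theta} \in \arg\max_{\pi}\; \mathbb{E}_{\pi}\Bigl[\sum_t \gamma^t \bigl(r_\theta(s_t, a_t) + \mathcal{H}(\pi(\cdot \mid s_t))\bigr)\Bigr].$

A single-loop method interleaves one policy improvement step with one reward update instead of solving the inner problem to completion at every iteration.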
- Faster Last-iterate Convergence of Policy Optimization in Zero-Sum Markov Games [63.60117916422867]
This paper focuses on the most basic setting of competitive multi-agent RL, namely two-player zero-sum Markov games.
We propose a single-loop policy optimization method with symmetric updates from both agents, where the policy is updated via the entropy-regularized optimistic multiplicative weights update (OMWU) method; its schematic form is recalled below.
Our convergence results improve upon the best known complexities, and lead to a better understanding of policy optimization in competitive Markov games.
arXiv Detail & Related papers (2022-10-03T16:05:43Z)
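Schematically, for the entropy-regularized matrix game $\max_x \min_y\; x^\top A y + \tau \mathcal{H}(x) - \tau \mathcal{H}(y)$, updates in this multiplicative-weights family take the form

    $x_{t+1}(a) \propto x_t(a)^{1-\eta\tau} \exp\bigl(\eta\, [A y_t]_a\bigr), \qquad y_{t+1}(b) \propto y_t(b)^{1-\eta\tau} \exp\bigl(-\eta\, [A^\top x_t]_b\bigr),$

with the optimistic variant replacing $x_t$, $y_t$ on the right-hand side by one-step predictions; the exact correction and step sizes follow the paper.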
- First-Order Algorithms for Nonlinear Generalized Nash Equilibrium Problems [88.58409977434269]
We consider the problem of computing an equilibrium in a class of nonlinear generalized Nash equilibrium problems (NGNEPs).
Our contribution is to provide two simple first-order algorithmic frameworks based on the quadratic penalty method and the augmented Lagrangian method (recalled in generic form below).
We provide nonasymptotic theoretical guarantees for these algorithms.
arXiv Detail & Related papers (2022-04-07T00:11:05Z)
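In generic form, for $\min_x f(x)$ subject to $g(x) = 0$, the augmented Lagrangian adds a quadratic penalty on top of the multiplier term (the quadratic penalty method keeps only the penalty, i.e., $\lambda = 0$):

    $\mathcal{L}_\rho(x, \lambda) = f(x) + \lambda^\top g(x) + \tfrac{\rho}{2}\, \lVert g(x) \rVert^2.$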
- On the Convergence of Stochastic Extragradient for Bilinear Games with Restarted Iteration Averaging [96.13485146617322]
We present an analysis of the stochastic ExtraGradient (SEG) method with a constant step size, and propose variations of the method that yield favorable convergence.
We prove that, when augmented with iteration averaging, SEG converges to the Nash equilibrium at a provable rate, and that this rate is accelerated by incorporating a scheduled restarting procedure (sketched below).
arXiv Detail & Related papers (2021-06-30T17:51:36Z)
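A minimal Python sketch of SEG with iterate averaging on the bilinear game $\min_x \max_y\, x^\top A y$; the noise model, the restart rule, and all constants are illustrative assumptions, not the paper's exact scheme.

    import numpy as np

    def seg_restarted_avg(A, steps=2000, eta=0.05, sigma=0.1,
                          restart_every=500, seed=0):
        # Stochastic extragradient with iterate averaging; periodically
        # restart from the running average (illustrative restart rule).
        rng = np.random.default_rng(seed)
        n, m = A.shape
        x, y = np.zeros(n), np.zeros(m)
        x_avg, y_avg, count = np.zeros(n), np.zeros(m), 0
        for t in range(1, steps + 1):
            noise = lambda k: sigma * rng.standard_normal(k)
            # extrapolation step with noisy gradients
            x_half = x - eta * (A @ y + noise(n))
            y_half = y + eta * (A.T @ x + noise(m))
            # update step evaluated at the extrapolated point
            x = x - eta * (A @ y_half + noise(n))
            y = y + eta * (A.T @ x_half + noise(m))
            x_avg += x; y_avg += y; count += 1
            if t % restart_every == 0:
                # restart from the averaged iterate, reset the average
                x, y = x_avg / count, y_avg / count
                x_avg, y_avg, count = np.zeros(n), np.zeros(m), 0
        return x, y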
- Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization [40.21627891283402]
This paper investigates the problem of computing the equilibrium of competitive games.
Motivated by the algorithmic role of entropy regularization, we develop provably efficient extragradient methods.
arXiv Detail & Related papers (2021-05-31T17:51:15Z)
- Momentum Q-learning with Finite-Sample Convergence Guarantee [49.38471009162477]
This paper analyzes a class of momentum-based Q-learning algorithms with finite-sample guarantee.
We establish the convergence guarantee for MomentumQ with linear function approximations and Markovian sampling.
We demonstrate through various experiments that the proposed MomentumQ outperforms other momentum-based Q-learning algorithms (a generic momentum-style update is sketched below).
arXiv Detail & Related papers (2020-07-30T12:27:03Z)
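As a rough illustration of the momentum idea, here is a generic heavy-ball-style Q-learning step in Python; the momentum buffer V and all constants are illustrative assumptions and not necessarily the paper's exact MomentumQ iteration.

    import numpy as np

    def momentum_q_step(Q, V, s, a, r, s_next,
                        alpha=0.1, beta=0.9, gamma=0.99):
        # One Q-learning step where the TD update direction is smoothed
        # by a momentum buffer V of the same shape as Q.
        td = r + gamma * np.max(Q[s_next]) - Q[s, a]   # standard TD error
        V[s, a] = beta * V[s, a] + (1 - beta) * td     # momentum average
        Q[s, a] += alpha * V[s, a]
        return Q, V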