Can We Find Nash Equilibria at a Linear Rate in Markov Games?
- URL: http://arxiv.org/abs/2303.03095v1
- Date: Fri, 3 Mar 2023 02:40:26 GMT
- Title: Can We Find Nash Equilibria at a Linear Rate in Markov Games?
- Authors: Zhuoqing Song, Jason D. Lee, Zhuoran Yang
- Abstract summary: We study decentralized learning in two-player zero-sum discounted Markov games.
The goal is to design a policy optimization algorithm for either agent satisfying two properties.
- Score: 95.10091348976779
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study decentralized learning in two-player zero-sum discounted Markov
games where the goal is to design a policy optimization algorithm for either
agent satisfying two properties. First, the player does not need to know the
policy of the opponent to update its policy. Second, when both players adopt
the algorithm, their joint policy converges to a Nash equilibrium of the game.
To this end, we construct a meta-algorithm, dubbed $\texttt{Homotopy-PO}$,
which provably finds a Nash equilibrium at a global linear rate. In particular,
$\texttt{Homotopy-PO}$ interweaves two base algorithms $\texttt{Local-Fast}$
and $\texttt{Global-Slow}$ via homotopy continuation. $\texttt{Local-Fast}$ is
an algorithm that enjoys local linear convergence while $\texttt{Global-Slow}$
is an algorithm that converges globally but at a slower sublinear rate. By
switching between these two base algorithms, $\texttt{Global-Slow}$ essentially
serves as a ``guide'' which identifies a benign neighborhood where
$\texttt{Local-Fast}$ enjoys fast convergence. However, since the exact size of
such a neighborhood is unknown, we apply a doubling trick to switch between
these two base algorithms. The switching scheme is delicately designed so that
the aggregated performance of the algorithm is driven by $\texttt{Local-Fast}$.
Furthermore, we prove that $\texttt{Local-Fast}$ and $\texttt{Global-Slow}$ can
both be instantiated by variants of optimistic gradient descent/ascent (OGDA)
method, which is of independent interest.
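- Illustrative sketch: to make the switching scheme concrete, below is a minimal sketch of a doubling-trick loop that alternates between a globally convergent but slow base solver and a fast local one. The interfaces (`global_slow`, `local_fast`) and the gap-based acceptance test are hypothetical placeholders for exposition, not the paper's actual $\texttt{Homotopy-PO}$ procedure or its switching certificates.

```python
# Minimal sketch of a doubling-trick switching scheme (illustrative only).
#   global_slow(policy) -> policy : one step of a globally convergent, sublinear method
#   local_fast(policy)  -> policy : one step of a locally linearly convergent method
#   gap(policy)         -> float  : a hypothetical Nash-gap surrogate used to accept/reject
def homotopy_switching(policy, global_slow, local_fast, gap, num_epochs=20):
    budget = 1  # per-phase iteration budget, doubled each epoch
    for _ in range(num_epochs):
        # Phase 1: run the slow-but-global method to move toward a better region.
        for _ in range(budget):
            policy = global_slow(policy)
        baseline_gap = gap(policy)

        # Phase 2: try the fast local method with the same budget.
        candidate = policy
        for _ in range(budget):
            candidate = local_fast(candidate)

        # Keep the fast iterate only if it actually reduced the gap;
        # otherwise the benign neighborhood has not been reached yet.
        if gap(candidate) < baseline_gap:
            policy = candidate

        budget *= 2  # doubling trick: the size of the benign neighborhood is unknown
    return policy
```

Because the per-phase budget doubles, once the iterate enters the benign neighborhood the fast phases dominate the total iteration count, which is the intuition behind the aggregated linear rate. The abstract further notes that both base algorithms can be instantiated by variants of optimistic gradient descent/ascent; for reference, the generic single-call OGDA update for a smooth payoff $f(x,y)$ with step size $\eta$ (a textbook form, not the paper's specific Markov-game variant) is
$$x_{t+1} = x_t - \eta\big(2\nabla_x f(x_t,y_t) - \nabla_x f(x_{t-1},y_{t-1})\big), \qquad y_{t+1} = y_t + \eta\big(2\nabla_y f(x_t,y_t) - \nabla_y f(x_{t-1},y_{t-1})\big).$$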
Related papers
- Fully Dynamic Adversarially Robust Correlation Clustering in Polylogarithmic Update Time [19.25942907402098]
We study the dynamic correlation clustering problem with $\textit{adaptive}$ edge label flips.
In correlation clustering, we are given an $n$-vertex complete graph whose edges are labeled either $(+)$ or $(-)$.
We consider the dynamic setting with adversarial robustness, in which the $\textit{adaptive}$ adversary could flip the label of an edge based on the current output of the algorithm.
arXiv Detail & Related papers (2024-11-15T06:26:37Z) - Efficient Algorithms for Generalized Linear Bandits with Heavy-tailed
Rewards [40.99322897009357]
We propose two novel algorithms based on truncation and mean of medians.
Our truncation-based algorithm supports online learning, distinguishing it from existing truncation-based approaches.
Our algorithms improve the regret bounds by a logarithmic factor compared to existing algorithms when $\epsilon=1$.
arXiv Detail & Related papers (2023-10-28T13:01:10Z) - Doubly Optimal No-Regret Learning in Monotone Games [10.760195409078591]
We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium.
arXiv Detail & Related papers (2023-01-30T17:55:53Z) - Policy Optimization for Markov Games: Unified Framework and Faster
Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game.
We extend this algorithm to multi-player general-sum Markov games and show a $\widetilde{\mathcal{O}}(T^{-1/2})$ convergence rate to coarse correlated equilibria (CCE).
arXiv Detail & Related papers (2022-06-06T14:23:13Z) - Nearly Optimal Policy Optimization with Stable at Any Time Guarantee [53.155554415415445]
The best regret bound for policy-based methods (Shani et al., 2020) is only $\tilde{O}(\sqrt{S^2AH^4K})$, where $S$ is the number of states, $A$ is the number of actions, $H$ is the horizon, and $K$ is the number of episodes, leaving a $\sqrt{SH}$ gap from the information-theoretic lower bound $\tilde{\Omega}(\sqrt{SAH^3K})$; the proposed algorithm improves this to $\tilde{O}(\sqrt{SAH^3K} + \sqrt{AH^4K})$.
arXiv Detail & Related papers (2021-12-21T01:54:17Z) - Private Stochastic Convex Optimization: Optimal Rates in $\ell_1$
Geometry [69.24618367447101]
Up to logarithmic factors, the optimal excess population loss of any $(\varepsilon,\delta)$-differentially private algorithm is $\sqrt{\log(d)/n} + \sqrt{d}/(\varepsilon n)$.
We show that when the loss functions satisfy additional smoothness assumptions, the excess loss is upper bounded (up to logarithmic factors) by $\sqrt{\log(d)/n} + (\log(d)/(\varepsilon n))^{2/3}$.
arXiv Detail & Related papers (2021-03-02T06:53:44Z) - Almost Optimal Algorithms for Two-player Markov Games with Linear
Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm Nash-UCRL-VTR based on the principle "Optimism-in-Face-of-Uncertainty"
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z) - Streaming Complexity of SVMs [110.63976030971106]
We study the space complexity of solving the bias-regularized SVM problem in the streaming model.
We show that, for both problems, in low dimensions one can obtain streaming algorithms with space polynomially smaller than $\frac{1}{\lambda\epsilon}$.
arXiv Detail & Related papers (2020-07-07T17:10:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.