Tight last-iterate convergence rates for no-regret learning in
multi-player games
- URL: http://arxiv.org/abs/2010.13724v1
- Date: Mon, 26 Oct 2020 17:06:19 GMT
- Title: Tight last-iterate convergence rates for no-regret learning in
multi-player games
- Authors: Noah Golowich, Sarath Pattathil, Constantinos Daskalakis
- Abstract summary: We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ in smooth monotone games.
We also show that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case.
- Score: 31.604950465209335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the question of obtaining last-iterate convergence rates for
no-regret learning algorithms in multi-player games. We show that the
optimistic gradient (OG) algorithm with a constant step-size, which is
no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the
gap function in smooth monotone games. This result addresses a question of
Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such
as OG) can be applied to achieve improved guarantees in the multi-agent
learning setting. The proof of our upper bound uses a new technique centered
around an adaptive choice of potential function at each iteration. We also show
that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which
includes OG as a special case. As a byproduct of our lower bound analysis we
additionally present a proof of a conjecture of Arjevani et al. (2015) which is
more direct than previous approaches.
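For intuition, here is a minimal Python sketch of the OG update $z_{t+1} = z_t - 2\eta F(z_t) + \eta F(z_{t-1})$ on an unconstrained bilinear game, one of the simplest smooth monotone instances; the matrix, step size, and horizon below are illustrative and not taken from the paper.

```python
import numpy as np

# Minimal sketch of the optimistic gradient (OG) update
#   z_{t+1} = z_t - 2*eta*F(z_t) + eta*F(z_{t-1})
# on the unconstrained bilinear game min_x max_y x^T A y, whose joint operator
# F(x, y) = (A y, -A^T x) is monotone.  A, eta, and T are illustrative.

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
eta = 0.05                              # constant step size, as in the paper's setting
T = 2000

def F(z):
    """Joint gradient operator of the bilinear game at z = (x, y)."""
    x, y = z[:n], z[n:]
    return np.concatenate([A @ y, -A.T @ x])

z = rng.standard_normal(2 * n)
g_prev = F(z)
for t in range(T):
    g = F(z)
    z = z - 2 * eta * g + eta * g_prev  # gradient step plus optimistic correction
    g_prev = g

# ||F(z)|| measures how far the last iterate is from the equilibrium of this instance.
print("operator norm at the last iterate:", np.linalg.norm(F(z)))
```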
Related papers
- On the SAGA algorithm with decreasing step [0.0]
We introduce a new $\lambda$-SAGA algorithm which interpolates between the Gradient Descent and SAGA algorithms.
We establish a central limit theorem for the $lambda$-SAGA algorithm.
We provide the non-asymptotic $\mathbb{L}^p$ rates of convergence.
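As a reminder of the base method being interpolated, the sketch below shows a vanilla SAGA step on a least-squares finite sum; the $\lambda$-interpolation itself is defined in the paper and is not reproduced here, and all data and constants are illustrative.

```python
import numpy as np

# Minimal sketch of vanilla SAGA on a least-squares finite sum; the paper's
# lambda-interpolation between gradient descent and SAGA is not reproduced here.
# Data, step size, and iteration count are illustrative.

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
gamma = 0.01                           # step size

def grad_i(w, i):
    """Gradient of the i-th loss 0.5*(x_i^T w - y_i)^2."""
    return (X[i] @ w - y[i]) * X[i]

w = np.zeros(d)
table = np.array([grad_i(w, i) for i in range(n)])   # stored past gradients
table_mean = table.mean(axis=0)

for _ in range(5000):
    i = rng.integers(n)
    g_new = grad_i(w, i)
    # SAGA estimate: fresh gradient minus stored gradient plus table average.
    w -= gamma * (g_new - table[i] + table_mean)
    table_mean += (g_new - table[i]) / n             # keep the running mean exact
    table[i] = g_new

print("training loss:", 0.5 * np.mean((X @ w - y) ** 2))
```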
arXiv Detail & Related papers (2024-10-02T12:04:36Z) - Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
Shortest Path [80.60592344361073]
We study the Stochastic Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with an environment and seeks to reach a certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the cost function or an upper bound of the expected length of the optimal policy.
arXiv Detail & Related papers (2024-02-14T07:52:00Z) - Last-Iterate Convergence Properties of Regret-Matching Algorithms in
Games [72.43065138581589]
We study the last-iterate convergence properties of various popular variants of RM$^+$.
We show numerically that several practical variants such as simultaneous RM$^+$, alternating RM$^+$, and predictive RM$^+$ all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
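For reference, here is a minimal sketch of plain simultaneous RM$^+$ self-play on a small zero-sum matrix game (one of the variants noted above as lacking last-iterate guarantees); the game matrix and horizon are illustrative.

```python
import numpy as np

# Minimal sketch of simultaneous regret matching+ (RM+) self-play on a small
# zero-sum matrix game (rock-paper-scissors).  Game matrix and horizon are
# illustrative.

G = np.array([[ 0., -1.,  1.],
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])        # player 1's payoff matrix
T = 10_000

def strategy(Q):
    """Play proportionally to the positive cumulative regrets (uniform if all zero)."""
    s = Q.sum()
    return Q / s if s > 0 else np.full(len(Q), 1.0 / len(Q))

Qx = np.zeros(3)                       # clipped cumulative regrets, player 1
Qy = np.zeros(3)                       # clipped cumulative regrets, player 2
for t in range(T):
    x, y = strategy(Qx), strategy(Qy)
    ux = G @ y                         # player 1's utility per pure action
    uy = -G.T @ x                      # player 2's utility per pure action
    # RM+ update: add instantaneous regrets, then clip at zero.
    Qx = np.maximum(Qx + (ux - x @ ux), 0.0)
    Qy = np.maximum(Qy + (uy - y @ uy), 0.0)

print("last-iterate strategies:", np.round(strategy(Qx), 3), np.round(strategy(Qy), 3))
```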
arXiv Detail & Related papers (2023-11-01T17:34:58Z) - Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback [75.29048190099523]
Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, \textsf{AdaOGD}, that does not require a priori knowledge of these parameters.
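To make the baseline concrete, below is a minimal sketch of non-adaptive OGD with the classical step size $1/(\mu t)$ for $\mu$-strongly convex losses, the schedule that requires knowing $\mu$ in advance; AdaOGD's actual adaptive rule is specified in the paper, and the losses and constants here are illustrative.

```python
import numpy as np

# Minimal sketch of non-adaptive online gradient descent (OGD) with the
# classical step size eta_t = 1/(mu*t) for mu-strongly convex losses
# loss_t(x) = (mu/2)*||x - a_t||^2.  This baseline needs mu in advance,
# which is what AdaOGD avoids; losses, mu, and T are illustrative.

rng = np.random.default_rng(2)
d, mu, T = 3, 0.5, 2000
targets = rng.standard_normal((T, d))          # the a_t defining each round's loss

x = np.zeros(d)
ogd_loss = 0.0
for t in range(1, T + 1):
    a = targets[t - 1]
    ogd_loss += 0.5 * mu * np.linalg.norm(x - a) ** 2   # incur loss, then update
    x = x - (mu * (x - a)) / (mu * t)                   # eta_t = 1/(mu*t)

best_fixed = targets.mean(axis=0)              # best fixed point in hindsight
best_loss = 0.5 * mu * np.sum((targets - best_fixed) ** 2)
print("OGD regret vs best fixed point:", ogd_loss - best_loss)
```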
arXiv Detail & Related papers (2023-10-21T18:38:13Z) - Doubly Optimal No-Regret Learning in Monotone Games [10.760195409078591]
We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium.
arXiv Detail & Related papers (2023-01-30T17:55:53Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with Bandit Feedback [29.553652241608997]
We study the class of \textit{smooth and strongly monotone} games and optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $\tilde{\Theta}(n\sqrt{T})$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
arXiv Detail & Related papers (2021-12-06T08:27:54Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
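As a reminder of the dynamics involved, below is a minimal sketch of simultaneous OMWU self-play on a small general-sum bimatrix game, tracking one player's external regret; the payoff matrices, step size, and horizon are illustrative and not from the paper.

```python
import numpy as np

# Minimal sketch of simultaneous optimistic multiplicative weights update (OMWU)
# self-play on a small general-sum bimatrix game, tracking player 1's external
# regret.  Payoff matrices, step size, and horizon are illustrative.

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3))        # player 1's payoff matrix
B = rng.standard_normal((3, 3))        # player 2's payoff matrix (general-sum)
eta, T = 0.1, 5000

def omwu_step(p, loss, prev_loss):
    """Multiplicative update with the optimistic term 2*loss_t - loss_{t-1}."""
    w = p * np.exp(-eta * (2 * loss - prev_loss))
    return w / w.sum()

x = np.full(3, 1 / 3)
y = np.full(3, 1 / 3)
lx_prev, ly_prev = np.zeros(3), np.zeros(3)
cum_lx = np.zeros(3)                   # cumulative loss of each pure action (player 1)
loss_p1 = 0.0                          # cumulative loss actually incurred by player 1

for t in range(T):
    lx = -(A @ y)                      # player 1's loss vector against y
    ly = -(B.T @ x)                    # player 2's loss vector against x
    loss_p1 += x @ lx
    cum_lx += lx
    x, y = omwu_step(x, lx, lx_prev), omwu_step(y, ly, ly_prev)
    lx_prev, ly_prev = lx, ly

print("player 1 external regret after T rounds:", loss_p1 - cum_lx.min())
```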
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - Randomized Exploration for Reinforcement Learning with General Value
Function Approximation [122.70803181751135]
We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
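The core mechanism described above, perturbing regression targets with i.i.d. Gaussian noise before a ridge least-squares fit, can be sketched as follows; the feature matrix, noise scale, and regularizer are illustrative, and the paper's actual noise schedule is more involved.

```python
import numpy as np

# Minimal sketch of the core exploration idea: perturb regression targets with
# i.i.d. Gaussian noise before a ridge least-squares fit, so that acting
# greedily on the fitted values is randomized.  Features, noise scale, and
# regularizer are illustrative; the paper's noise schedule is more involved.

rng = np.random.default_rng(4)
n, d, sigma, reg = 200, 8, 0.5, 1.0
Phi = rng.standard_normal((n, d))              # state-action features
targets = Phi @ rng.standard_normal(d)         # e.g. r + max_a Q_hat(s', a)

noisy_targets = targets + sigma * rng.standard_normal(n)     # perturb the data
w = np.linalg.solve(Phi.T @ Phi + reg * np.eye(d), Phi.T @ noisy_targets)

# The randomized estimate Phi @ w drives exploration in the spirit of RLSVI.
print("perturbed value estimate for the first state-action pair:", Phi[0] @ w)
```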
arXiv Detail & Related papers (2021-06-15T02:23:07Z)