Tight last-iterate convergence rates for no-regret learning in
  multi-player games
        - URL: http://arxiv.org/abs/2010.13724v1
- Date: Mon, 26 Oct 2020 17:06:19 GMT
- Title: Tight last-iterate convergence rates for no-regret learning in
  multi-player games
- Authors: Noah Golowich, Sarath Pattathil, Constantinos Daskalakis
- Abstract summary: We show that the optimistic gradient (OG) algorithm with a constant step-size, which is no-regret, achieves a last-iterate rate of $O (1/sqrtT)$ in smooth monotone games.
We also show that the $O (1/sqrtT)$ rate is tight for all $p$-SCLI algorithms, which includes OG as a special case.
- Score: 31.604950465209335
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract:   We study the question of obtaining last-iterate convergence rates for
no-regret learning algorithms in multi-player games. We show that the
optimistic gradient (OG) algorithm with a constant step-size, which is
no-regret, achieves a last-iterate rate of $O(1/\sqrt{T})$ with respect to the
gap function in smooth monotone games. This result addresses a question of
Mertikopoulos & Zhou (2018), who asked whether extra-gradient approaches (such
as OG) can be applied to achieve improved guarantees in the multi-agent
learning setting. The proof of our upper bound uses a new technique centered
around an adaptive choice of potential function at each iteration. We also show
that the $O(1/\sqrt{T})$ rate is tight for all $p$-SCLI algorithms, which
includes OG as a special case. As a byproduct of our lower bound analysis we
additionally present a proof of a conjecture of Arjevani et al. (2015) which is
more direct than previous approaches.
 
      
        Related papers
        - Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games   with Bandit Feedback [60.610120215789976]
 We show that when a pure strategy Nash equilibrium exists, $c$ becomes zero, leading to an optimal instance-dependent regret bound.
Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample.
 arXiv  Detail & Related papers  (2025-02-24T20:20:06Z)
- On the SAGA algorithm with decreasing step [0.0]
 We introduce a new $lambda$-SAGA algorithm which interpolates between the Gradient Descent and SAGA algorithms.
We establish a central limit theorem for the $lambda$-SAGA algorithm.
We provide the non-asymptotic $mathbbLp$ rates of convergence.
 arXiv  Detail & Related papers  (2024-10-02T12:04:36Z)
- Nearly Minimax Optimal Regret for Learning Linear Mixture Stochastic
  Shortest Path [80.60592344361073]
 We study the Shortest Path (SSP) problem with a linear mixture transition kernel.
An agent repeatedly interacts with a environment and seeks to reach certain goal state while minimizing the cumulative cost.
Existing works often assume a strictly positive lower bound of the iteration cost function or an upper bound of the expected length for the optimal policy.
 arXiv  Detail & Related papers  (2024-02-14T07:52:00Z)
- Last-Iterate Convergence Properties of Regret-Matching Algorithms in
  Games [72.43065138581589]
 We study the last-iterate convergence properties of various popular variants of RM$+$.
We show numerically that several practical variants such as simultaneous RM$+$, alternating RM$+$, and predictive RM$+$, all lack last-iterate convergence guarantees.
We then prove that recent variants of these algorithms based on a smoothing technique do enjoy last-iterate convergence.
 arXiv  Detail & Related papers  (2023-11-01T17:34:58Z)
- Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and   Exp-Concave Games with Gradient Feedback [75.29048190099523]
 Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or monotonicity assumptions.
In this paper, we design a fully adaptive OGD algorithm, textsfAdaOGD, that does not require a priori knowledge of these parameters.
 arXiv  Detail & Related papers  (2023-10-21T18:38:13Z)
- Horizon-free Reinforcement Learning in Adversarial Linear Mixture MDPs [72.40181882916089]
 We show that our algorithm achieves an $tildeObig((d+log (|mathcalS|2 |mathcalA|))sqrtKbig)$ regret with full-information feedback, where $d$ is the dimension of a known feature mapping linearly parametrizing the unknown transition kernel of the MDP, $K$ is the number of episodes, $|mathcalS|$ and $|mathcalA|$ are the cardinalities of the state and action spaces
 arXiv  Detail & Related papers  (2023-05-15T05:37:32Z)
- Doubly Optimal No-Regret Learning in Monotone Games [10.760195409078591]
 We propose the first doubly optimal no-regret learning algorithm for smooth monotone games.
Our algorithm achieves both (i) the optimal $O(sqrtT)$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(frac1T)$ last-iterate convergence rate to a Nash equilibrium.
 arXiv  Detail & Related papers  (2023-01-30T17:55:53Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
 We show that regret can be obtained for general convex and compact strategy sets.
Our dynamics are on an instantiation of optimistic follow-the-regularized-bounds over an appropriately emphlifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
 arXiv  Detail & Related papers  (2022-06-17T12:58:58Z)
- Doubly Optimal No-Regret Online Learning in Strongly Monotone Games with   Bandit Feedback [29.553652241608997]
 We study the class of textitsmooth and strongly monotone games and study optimal no-regret learning therein.
We first construct a new bandit learning algorithm and show that it achieves the single-agent optimal regret of $tildeTheta(nsqrtT)$.
Our results thus settle this open problem and contribute to the broad landscape of bandit game-theoretical learning.
 arXiv  Detail & Related papers  (2021-12-06T08:27:54Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in
  Multi-Player General-Sum Games [104.74734408204749]
 We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(textrmpolylog(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
 arXiv  Detail & Related papers  (2021-11-11T01:19:53Z)
- Randomized Exploration for Reinforcement Learning with General Value
  Function Approximation [122.70803181751135]
 We propose a model-free reinforcement learning algorithm inspired by the popular randomized least squares value iteration (RLSVI) algorithm.
Our algorithm drives exploration by simply perturbing the training data with judiciously chosen i.i.d. scalar noises.
We complement the theory with an empirical evaluation across known difficult exploration tasks.
 arXiv  Detail & Related papers  (2021-06-15T02:23:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
       
     
           This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.