Near-Optimal No-Regret Learning for General Convex Games
- URL: http://arxiv.org/abs/2206.08742v2
- Date: Mon, 20 Jun 2022 06:58:54 GMT
- Title: Near-Optimal No-Regret Learning for General Convex Games
- Authors: Gabriele Farina, Ioannis Anagnostides, Haipeng Luo, Chung-Wei Lee,
Christian Kroer, Tuomas Sandholm
- Abstract summary: We show that $O(\log T)$ regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds.
- Score: 121.50979258049135
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A recent line of work has established uncoupled learning dynamics such that,
when employed by all players in a game, each player's \emph{regret} after $T$
repetitions grows polylogarithmically in $T$, an exponential improvement over
the traditional guarantees within the no-regret framework. However, so far
these results have only been limited to certain classes of games with
structured strategy spaces -- such as normal-form and extensive-form games.
Whether $O(\text{polylog} T)$ regret bounds can be obtained for
general convex and compact strategy sets -- which occur in many fundamental
models in economics and multiagent systems -- while retaining efficient
strategy updates is an important question. In this paper, we answer this in the
positive by establishing the first uncoupled learning algorithm with $O(\log
T)$ per-player regret in general \emph{convex games}, that is, games with
concave utility functions supported on arbitrary convex and compact strategy
sets. Our learning dynamics are based on an instantiation of optimistic
follow-the-regularized-leader over an appropriately \emph{lifted} space using a
\emph{self-concordant regularizer} that is, peculiarly, not a barrier for the
feasible region. Further, our learning dynamics are efficiently implementable
given access to a proximal oracle for the convex strategy set, leading to
$O(\log\log T)$ per-iteration complexity; we also give extensions when access
to only a \emph{linear} optimization oracle is assumed. Finally, we adapt our
dynamics to guarantee $O(\sqrt{T})$ regret in the adversarial regime. Even in
those special cases where prior results apply, our algorithm improves over the
state-of-the-art regret bounds either in terms of the dependence on the number
of iterations or on the dimension of the strategy sets.
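To make the flavor of these dynamics concrete, the sketch below runs optimistic gradient play in a two-player convex game, using Euclidean projection onto the simplex as a stand-in for a proximal oracle. This is an illustration under simplifying assumptions, not the paper's algorithm: the actual dynamics use optimistic follow-the-regularized-leader over a lifted space with a self-concordant regularizer, and the payoff matrices, step size, and horizon below are made up for the example.

```python
# Minimal sketch: optimistic gradient play in a two-player convex game.
# Assumptions (not from the paper): bilinear utilities, simplex strategy sets,
# Euclidean projection as the proximal oracle, a fixed step size.
import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex (a simple proximal oracle)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    return np.maximum(v + (1.0 - css[rho]) / (rho + 1.0), 0.0)

rng = np.random.default_rng(0)
n, m = 4, 5
A = rng.standard_normal((n, m))   # player 1 utility: u1(x, y) = x^T A y (concave in x)
B = rng.standard_normal((n, m))   # player 2 utility: u2(x, y) = x^T B y (concave in y)

eta = 0.1                         # time-invariant learning rate
x, y = np.ones(n) / n, np.ones(m) / m
gx_prev, gy_prev = np.zeros(n), np.zeros(m)   # last round's gradients, used as predictions

for t in range(1000):
    gx, gy = A @ y, B.T @ x       # each player's utility gradient at the current profile
    # Optimistic step: follow the current gradient plus the correction (g_t - g_{t-1}),
    # then project back onto the feasible set via the proximal oracle.
    x = project_simplex(x + eta * (2 * gx - gx_prev))
    y = project_simplex(y + eta * (2 * gy - gy_prev))
    gx_prev, gy_prev = gx, gy
```

Here the correction term $2 g_t - g_{t-1}$ uses the most recent gradient as a prediction of the next one; the paper's lifted self-concordant construction plays an analogous predictive role within optimistic FTRL, which is what yields the $O(\log T)$ regret and, given a proximal oracle, the $O(\log\log T)$ per-iteration complexity.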
Related papers
- Adversarial Contextual Bandits Go Kernelized [21.007410990554522]
We study a generalization of the problem of online learning in adversarial linear contextual bandits by incorporating loss functions that belong to a reproducing kernel Hilbert space.
We propose a new optimistically biased estimator for the loss functions and obtain near-optimal regret guarantees.
arXiv Detail & Related papers (2023-10-02T19:59:39Z) - Near-Optimal $\Phi$-Regret Learning in Extensive-Form Games [85.78272987312343]
We establish efficient and uncoupled learning dynamics so that the trigger regret of each player grows as $O(\log T)$ after $T$ repetitions of play.
This improves exponentially over the prior best known trigger-regret bound of $O(T^{1/4})$.
arXiv Detail & Related papers (2022-08-20T20:48:58Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer
Games [121.50979258049135]
We show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time $T$ are bounded by $O(\log T)$.
Our proposed learning dynamics combine in a novel way optimistic regularized learning with the use of self-concordant barriers.
arXiv Detail & Related papers (2022-04-25T03:20:16Z) - The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret
and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL)
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\text{polylog}(T))$ after $T$ repetitions of the game.
We extend their result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium (a minimal OMWU sketch is given after this list of related papers).
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - Making Non-Stochastic Control (Almost) as Easy as Stochastic [27.736345095024276]
We show that the same regret rate is attainable even in the considerably more general non-stochastic control model.
We attain the optimal $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the dynamics are unknown to the learner.
arXiv Detail & Related papers (2020-06-10T16:00:14Z)
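As a companion illustration for the OMWU dynamics referenced in the correlated-equilibria entry above, the following sketch runs Optimistic Multiplicative Weights Update in a two-player normal-form game. It is an illustration only: the payoff matrices, learning rate, and horizon are assumptions, and the cited work additionally handles multi-player general-sum games and internal/swap regret.

```python
# Minimal sketch: Optimistic Multiplicative Weights Update (OMWU) in a
# two-player normal-form game. Payoffs, learning rate, and horizon are
# illustrative assumptions, not taken from the cited paper.
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 3
A = rng.uniform(size=(n, m))      # player 1 payoff matrix
B = rng.uniform(size=(n, m))      # player 2 payoff matrix

eta = 0.1
x, y = np.ones(n) / n, np.ones(m) / m
ux_prev, uy_prev = np.zeros(n), np.zeros(m)   # previous utility vectors (optimistic predictions)

for t in range(1000):
    ux, uy = A @ y, B.T @ x       # expected payoff of each pure action
    # OMWU: weight each action by the current payoff plus the optimistic
    # correction (u_t - u_{t-1}), then renormalize to stay on the simplex.
    x = x * np.exp(eta * (2 * ux - ux_prev)); x /= x.sum()
    y = y * np.exp(eta * (2 * uy - uy_prev)); y /= y.sum()
    ux_prev, uy_prev = ux, uy
```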