Evolutionary Dynamics and $\Phi$-Regret Minimization in Games
- URL: http://arxiv.org/abs/2106.14668v1
- Date: Mon, 28 Jun 2021 12:48:15 GMT
- Title: Evolutionary Dynamics and $\Phi$-Regret Minimization in Games
- Authors: Georgios Piliouras, Mark Rowland, Shayegan Omidshafiei, Romuald Elie,
Daniel Hennes, Jerome Connor, Karl Tuyls
- Abstract summary: Regret has been established as a foundational concept in online learning, and likewise has important applications in the analysis of learning dynamics in games.
In this paper, we revisit our understanding of regret from the perspective of deviations over partitions of the full \emph{mixed} strategy space.
We prove here that the well-studied evolutionary learning algorithm of replicator dynamics (RD) seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2 \times 2$ games.
- Score: 38.00008966802513
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regret has been established as a foundational concept in online learning, and
likewise has important applications in the analysis of learning dynamics in
games. Regret quantifies the difference between a learner's performance and that
of a baseline in hindsight. It is well-known that regret-minimizing algorithms
converge to certain classes of equilibria in games; however, traditional forms
of regret used in game theory predominantly consider baselines that permit
deviations to deterministic actions or strategies. In this paper, we revisit
our understanding of regret from the perspective of deviations over partitions
of the full \emph{mixed} strategy space (i.e., probability distributions over
pure strategies), under the lens of the previously-established $\Phi$-regret
framework, which provides a continuum of stronger regret measures. Importantly,
$\Phi$-regret enables learning agents to consider deviations from and to mixed
strategies, generalizing several existing notions of regret such as external,
internal, and swap regret, and thus broadening the insights gained from
regret-based analysis of learning algorithms. We prove here that the
well-studied evolutionary learning algorithm of replicator dynamics (RD)
seamlessly minimizes the strongest possible form of $\Phi$-regret in generic $2
\times 2$ games, without any modification of the underlying algorithm itself.
We subsequently conduct experiments validating our theoretical results in a
suite of 144 $2 \times 2$ games wherein RD exhibits a diverse set of behaviors.
We conclude by providing empirical evidence of $\Phi$-regret minimization by RD
in some larger games, hinting at further opportunity for $\Phi$-regret based
study of such algorithms from both a theoretical and empirical perspective.
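For reference, the $\Phi$-regret of a strategy sequence $x_1, \dots, x_T$ with utilities $u_1, \dots, u_T$ is standardly defined over a set $\Phi$ of transformations $\phi$ of the mixed strategy space as $\max_{\phi \in \Phi} \sum_{t=1}^{T} \big(u_t(\phi(x_t)) - u_t(x_t)\big)$; external regret corresponds to constant transformations, while internal and swap regret arise from transformations that remap pure strategies. The sketch below is a minimal illustration, not the authors' code: it runs Euler-discretized replicator dynamics on a hypothetical $2 \times 2$ game and reports the row player's time-averaged external regret, the weakest member of this continuum. The payoff matrices, initial strategies, step size, and horizon are all illustrative assumptions.

```python
import numpy as np

# Hypothetical 2x2 bimatrix game (not from the paper): A holds the row player's
# payoffs A[i, j]; the column player's payoffs are B = -A (zero-sum for simplicity).
A = np.array([[1.0, -0.8],
              [-1.2, 0.9]])
B = -A

def replicator_step(x, payoffs, dt):
    """One Euler step of replicator dynamics: dx_i = x_i * (payoffs_i - x . payoffs)."""
    return x + dt * x * (payoffs - x @ payoffs)

def time_averaged_external_regret(T=500.0, dt=0.01):
    x = np.array([0.6, 0.4])   # row player's mixed strategy
    y = np.array([0.3, 0.7])   # column player's mixed strategy
    cum_pure = np.zeros(2)     # cumulative payoff of each fixed pure row strategy
    cum_play = 0.0             # cumulative payoff the row player actually obtained
    for _ in range(int(T / dt)):
        row_payoffs = A @ y    # row player's payoff for each pure strategy
        col_payoffs = B.T @ x  # column player's payoff for each pure strategy
        cum_pure += dt * row_payoffs
        cum_play += dt * (x @ row_payoffs)
        x = replicator_step(x, row_payoffs, dt)
        y = replicator_step(y, col_payoffs, dt)
    return (cum_pure.max() - cum_play) / T

print(time_averaged_external_regret())  # expected to shrink toward 0 as T grows
```

In line with the abstract's claim for generic $2 \times 2$ games, the time-averaged quantity should decay toward zero as the horizon grows; checking the stronger $\Phi$-regret notions would replace the pure-strategy baseline with a maximization over the relevant transformation set.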
Related papers
- Instance-Dependent Regret Bounds for Learning Two-Player Zero-Sum Games with Bandit Feedback [60.610120215789976]
We show that when a pure strategy Nash equilibrium exists, a game-dependent constant $c$ vanishes, leading to an optimal instance-dependent regret bound.
Our algorithm also enjoys last-iterate convergence and can identify the pure strategy Nash equilibrium with near-optimal sample complexity.
arXiv Detail & Related papers (2025-02-24T20:20:06Z)
- Achieving Better Regret against Strategic Adversaries [15.51709428653595]
We study online learning problems in which the learner has extra knowledge about the adversary's behaviour.
We propose two new online learning algorithms, Accurate Follow the Regularized Leader (AFTRL) and Prod-Best Response (Prod-BR).
AFTRL achieves $O(1)$ external regret or $O(1)$ \emph{forward regret} against a no-external-regret adversary.
arXiv Detail & Related papers (2023-02-13T19:34:36Z)
- Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately \emph{lifted} space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret.
arXiv Detail & Related papers (2022-06-17T12:58:58Z)
- The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches [84.54669549718075]
We study the problem of regret minimization for episodic Reinforcement Learning (RL).
We focus on learning with general function classes and general model classes.
We show that a logarithmic regret bound is realizable by algorithms with $O(\log T)$ switching cost.
arXiv Detail & Related papers (2022-03-03T02:55:55Z)
- Fast Rates for Nonparametric Online Learning: From Realizability to Learning in Games [36.969021834291745]
We propose a proper learning algorithm which gets a near-optimal mistake bound in terms of the sequential fat-shattering dimension of the hypothesis class.
This result answers a question as to whether proper learners could achieve near-optimal mistake bounds.
For the real-valued (regression) setting, the optimal mistake bound was not even known for improper learners.
arXiv Detail & Related papers (2021-11-17T05:24:21Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this guarantee from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium (a minimal sketch of the OMWU update appears after this list).
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant ILO algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new \emph{upper confidence primal-dual} algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
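For concreteness, the OMWU entry above refers to the following style of update. The sketch below is a hedged illustration, not code from any of the cited papers: it shows one standard formulation of optimistic multiplicative weights in which the previous round's payoff vector serves as the prediction for the current round; the payoff matrix, opponent behaviour, learning rate, and horizon used to drive it are assumptions for illustration.

```python
import numpy as np

def omwu_updates(A, y_seq, eta=0.1):
    """Optimistic Multiplicative Weights Update (OMWU) for the row player.

    A      : (n, m) payoff matrix for the row player.
    y_seq  : iterable of length-m mixed strategies played by the opponent.
    eta    : learning rate.
    Returns the sequence of row-player mixed strategies.
    """
    n = A.shape[0]
    x = np.full(n, 1.0 / n)    # uniform initial strategy
    prev_u = np.zeros(n)       # payoff vector observed in the previous round
    xs = [x]
    for y in y_seq:
        u = A @ y              # payoff of each pure row strategy this round
        # optimistic step: use u + (u - prev_u) = 2u - prev_u as the predicted payoff
        x = x * np.exp(eta * (2.0 * u - prev_u))
        x = x / x.sum()        # renormalize onto the simplex
        prev_u = u
        xs.append(x)
    return xs
```

The polylog regret guarantee cited above concerns self-play, where every player runs OMWU simultaneously; the exogenous opponent sequence here is only to keep the sketch self-contained.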