Corrupted Learning Dynamics in Games
- URL: http://arxiv.org/abs/2412.07120v2
- Date: Sun, 16 Feb 2025 19:18:39 GMT
- Title: Corrupted Learning Dynamics in Games
- Authors: Taira Tsuchiya, Shinji Ito, Haipeng Luo
- Abstract summary: An equilibrium can be computed at a fast rate of $O(1/T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL) algorithm.
We present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm.
- Score: 62.73758165845971
- Abstract: Learning in games refers to scenarios where multiple players interact in a shared environment, each aiming to minimize their regret. An equilibrium can be computed at a fast rate of $O(1/T)$ when all players follow the optimistic follow-the-regularized-leader (OFTRL). However, this acceleration is limited to the honest regime, in which all players adhere to a prescribed algorithm -- a situation that may not be realistic in practice. To address this issue, we present corrupted learning dynamics that adaptively find an equilibrium at a rate that depends on the extent to which each player deviates from the strategy suggested by the prescribed algorithm. First, in two-player zero-sum corrupted games, we provide learning dynamics for which the external regret of $x$-player (and similarly for $y$-player) is roughly bounded by $O(\log (m_x m_y) + \sqrt{\hat{C}_y} + \hat{C}_x)$, where $m_x$ and $m_y$ denote the number of actions of $x$- and $y$-players, respectively, and $\hat{C}_x$ and $\hat{C}_y$ represent their cumulative deviations. We then extend our approach to multi-player general-sum corrupted games, providing learning dynamics for which the swap regret of player $i$ is bounded by $O(\log T + \sqrt{\sum_{k} \hat{C}_k \log T} + \hat{C}_i)$ ignoring dependence on the number of players and actions, where $\hat{C}_i$ is the cumulative deviation of player $i$ from the prescribed algorithm. Our learning dynamics are agnostic to the levels of corruption. A key technical contribution is a new analysis that ensures the stability of a Markov chain under a new adaptive learning rate, thereby allowing us to achieve the desired bound in the corrupted regime while matching the best existing bound in the honest regime. Notably, our framework can be extended to address not only corruption in strategies but also corruption in the observed expected utilities, and we provide several matching lower bounds.
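To make the setup above concrete, the following is a minimal, illustrative sketch and not the paper's algorithm: it runs entropy-regularized OFTRL (optimistic multiplicative weights) in a two-player zero-sum matrix game and records a cumulative deviation $\hat{C}_x = \sum_t \|x_t - x_t^{\mathrm{alg}}\|_1$ for an $x$-player that occasionally deviates from the prescribed iterate. The payoff matrix, learning rate, and deviation schedule are assumptions chosen for illustration; the paper's dynamics additionally use an adaptive learning rate and extend to swap regret in multi-player general-sum games.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax onto the probability simplex."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def oftrl_zero_sum(A, T=1000, eta=0.1, deviate_every=0, rng=None):
    """Toy optimistic multiplicative-weights (entropy-regularized OFTRL)
    dynamics for the zero-sum game min_x max_y x^T A y.

    If deviate_every > 0, the x-player replaces the prescribed strategy with a
    random mixed strategy every `deviate_every` rounds, and we record the
    cumulative deviation C_x = sum_t ||x_t - x_t^alg||_1 (illustrative only)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)            # cumulative observed loss vectors
    gx_prev, gy_prev = np.zeros(m), np.zeros(n)  # last-round losses used as optimistic predictions
    x_avg, y_avg = np.zeros(m), np.zeros(n)
    C_x = 0.0

    for t in range(T):
        # Prescribed OFTRL iterates: cumulative loss plus a one-step prediction.
        x_alg = softmax(-eta * (Lx + gx_prev))
        y_alg = softmax(-eta * (Ly + gy_prev))

        # Possibly corrupted play by the x-player.
        x = x_alg
        if deviate_every and (t + 1) % deviate_every == 0:
            x = rng.dirichlet(np.ones(m))
            C_x += np.abs(x - x_alg).sum()
        y = y_alg

        # Loss vectors generated by the strategies actually played.
        gx, gy = A @ y, -A.T @ x
        Lx += gx
        Ly += gy
        gx_prev, gy_prev = gx, gy
        x_avg += x
        y_avg += y

    return x_avg / T, y_avg / T, C_x

# Example: rock-paper-scissors payoff matrix (an assumption for illustration).
A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
x_bar, y_bar, C_x = oftrl_zero_sum(A, T=2000, deviate_every=50, rng=0)
print("average strategies:", np.round(x_bar, 3), np.round(y_bar, 3), "C_x =", round(C_x, 2))
```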
Related papers
- $\widetilde{O}(T^{-1})$ Convergence to (Coarse) Correlated Equilibria in Full-Information General-Sum Markov Games [8.215874655947335]
We show that an optimistic follow-the-regularized-leader algorithm can find $\widetilde{O}(T^{-1})$-approximate (coarse) correlated equilibria in full-information general-sum Markov games within $T$ iterations.
arXiv Detail & Related papers (2024-02-02T20:40:27Z) - Near-Optimal No-Regret Learning for General Convex Games [121.50979258049135]
We show that near-optimal regret bounds can be obtained for general convex and compact strategy sets.
Our dynamics are based on an instantiation of optimistic follow-the-regularized-leader over an appropriately lifted space.
Even in those special cases where prior results apply, our algorithm improves over the state-of-the-art regret bounds.
arXiv Detail & Related papers (2022-06-17T12:58:58Z) - Uncoupled Learning Dynamics with $O(\log T)$ Swap Regret in Multiplayer Games [121.50979258049135]
We show that when all players follow our dynamics with a time-invariant learning rate, the second-order path lengths of the dynamics up to time $T$ are bounded by $O(\log T)$.
Our proposed learning dynamics combine, in a novel way, optimistic regularized learning with the use of self-concordant barriers.
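For intuition on the quantity being bounded, the sketch below tracks the second-order path length $\sum_{t<T} \|x_{t+1} - x_t\|_2^2$ of optimistic multiplicative weights with a time-invariant learning rate. It is a toy, not the paper's method: it uses an entropy regularizer rather than self-concordant barriers and a two-player zero-sum game rather than the multiplayer general-sum setting; the payoff matrix and step size are assumptions for illustration.

```python
import numpy as np

def _simplex_softmax(z):
    """Numerically stable softmax onto the probability simplex."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def second_order_path_length(A, T=4000, eta=0.1):
    """Second-order path length sum_t ||x_{t+1} - x_t||_2^2 of the x-player
    under optimistic multiplicative weights in the zero-sum game x^T A y.
    Illustrative only: the paper uses self-concordant barrier regularizers
    and bounds the analogous quantity in multiplayer general-sum games."""
    m, n = A.shape
    Lx, Ly = np.zeros(m), np.zeros(n)
    gx_prev, gy_prev = np.zeros(m), np.zeros(n)
    x_prev, path = None, 0.0
    for _ in range(T):
        x = _simplex_softmax(-eta * (Lx + gx_prev))
        y = _simplex_softmax(-eta * (Ly + gy_prev))
        if x_prev is not None:
            path += float(np.sum((x - x_prev) ** 2))
        x_prev = x
        gx, gy = A @ y, -A.T @ x
        Lx += gx
        Ly += gy
        gx_prev, gy_prev = gx, gy
    return path

# Example with an assumed rock-paper-scissors payoff matrix.
A = np.array([[0.0, 1.0, -1.0], [-1.0, 0.0, 1.0], [1.0, -1.0, 0.0]])
print("second-order path length up to T=4000:", round(second_order_path_length(A), 4))
```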
arXiv Detail & Related papers (2022-04-25T03:20:16Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - When Can We Learn General-Sum Markov Games with a Large Number of Players Sample-Efficiently? [10.397170312149067]
This paper investigates what learning goals admit better sample complexities in the setting of $m$-player general-sum Markov games.
First, we design algorithms for learning an $\epsilon$-Coarse Correlated Equilibrium (CCE) in $\widetilde{\mathcal{O}}(H^5 S \max_{i \le m} A_i / \epsilon^2)$ episodes.
Second, we consider the important special case of Markov Potential Games, and design an algorithm that learns an $\epsilon$-approximate Nash equilibrium within $\widetilde{\mathcal{O}}(\cdots)$ episodes.
arXiv Detail & Related papers (2021-10-08T15:06:22Z) - Almost Optimal Algorithms for Two-player Markov Games with Linear Function Approximation [92.99933928528797]
We study reinforcement learning for two-player zero-sum Markov games with simultaneous moves.
We propose an algorithm, Nash-UCRL-VTR, based on the principle of "Optimism-in-Face-of-Uncertainty."
We show that Nash-UCRL-VTR can provably achieve an $\tilde{O}(dH\sqrt{T})$ regret, where $d$ is the linear function dimension.
arXiv Detail & Related papers (2021-02-15T09:09:16Z) - Near-Optimal Reinforcement Learning with Self-Play [50.29853537456737]
We focus on self-play algorithms, which learn the optimal policy by playing against themselves without any direct supervision.
We propose an optimistic variant of the Nash Q-learning algorithm with sample complexity $\tilde{\mathcal{O}}(SAB)$, and a new Nash V-learning algorithm with sample complexity $\tilde{\mathcal{O}}(S(A+B))$.
arXiv Detail & Related papers (2020-06-22T05:00:13Z)
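Nash Q-learning style updates repeatedly solve a zero-sum stage game at each visited state. The snippet below is a minimal sketch of that standard subroutine, the max-min linear program for the value and maximin strategy of a zero-sum matrix game; the payoff matrix is an assumed toy example, and nothing here reflects the paper's optimistic bonuses or the way Nash V-learning sidesteps this exact LP by running no-regret learners.

```python
import numpy as np
from scipy.optimize import linprog

def zero_sum_nash(A):
    """Max-min value and maximin strategy of the row player for the zero-sum
    matrix game with row-player payoff matrix A (m x n), solved as a linear
    program. This is the stage-game subroutine that Nash Q-learning style
    updates call at every state (illustrative sketch only)."""
    m, n = A.shape
    # Decision variables: x (row mixed strategy, length m) and the game value v.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                        # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])           # v <= (A^T x)_j for every column j
    b_ub = np.zeros(n)
    A_eq = np.zeros((1, m + 1))
    A_eq[0, :m] = 1.0                                   # x sums to one
    b_eq = np.array([1.0])
    bounds = [(0.0, None)] * m + [(None, None)]         # x >= 0, v free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1], res.x[:m]

# Example: matching pennies (an assumed payoff matrix for illustration).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
v, x = zero_sum_nash(A)
print("value =", round(v, 3), "row strategy =", np.round(x, 3))  # ~0.0 and [0.5, 0.5]
```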