From External to Swap Regret 2.0: An Efficient Reduction and Oblivious
Adversary for Large Action Spaces
- URL: http://arxiv.org/abs/2310.19786v3
- Date: Wed, 6 Dec 2023 07:34:24 GMT
- Title: From External to Swap Regret 2.0: An Efficient Reduction and Oblivious
Adversary for Large Action Spaces
- Authors: Yuval Dagan and Constantinos Daskalakis and Maxwell Fishelson and Noah
Golowich
- Abstract summary: We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class.
Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria.
- Score: 29.29647282754262
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We provide a novel reduction from swap-regret minimization to external-regret
minimization, which improves upon the classical reductions of Blum-Mansour
[BM07] and Stolz-Lugosi [SL05] in that it does not require finiteness of the
space of actions. We show that, whenever there exists a no-external-regret
algorithm for some hypothesis class, there must also exist a no-swap-regret
algorithm for that same class. For the problem of learning with expert advice,
our result implies that it is possible to guarantee that the swap regret is
bounded by $\epsilon$ after $\log(N)^{O(1/\epsilon)}$ rounds and with $O(N)$
per iteration complexity, where $N$ is the number of experts, while the
classical reductions of Blum-Mansour and Stolz-Lugosi require $O(N/\epsilon^2)$
rounds and at least $\Omega(N^2)$ per iteration complexity. Our result comes
with an associated lower bound, which -- in contrast to that in [BM07] -- holds
for oblivious and $\ell_1$-constrained adversaries and learners that can employ
distributions over experts, showing that the number of rounds must be
$\tilde\Omega(N/\epsilon^2)$ or exponential in $1/\epsilon$.
Our reduction implies that, if no-regret learning is possible in some game,
then this game must have approximate correlated equilibria, of arbitrarily good
approximation. This strengthens the folklore implication of no-regret learning
that approximate coarse correlated equilibria exist. Importantly, it provides a
sufficient condition for the existence of correlated equilibrium which vastly
extends the requirement that the action set is finite, thus answering a
question left open by [DG22; Ass+23]. Moreover, it answers several outstanding
questions about equilibrium computation and learning in games.
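For context on the comparison above, below is a minimal sketch (not taken from the paper) of the classical Blum-Mansour [BM07] swap-to-external-regret reduction that this work improves upon, instantiated with multiplicative weights as the base no-external-regret learner; the class names, the learning rate eta, and the power-iteration solver are illustrative choices. It shows where the $\Omega(N^2)$ per-iteration cost arises: every round requires a stationary distribution of an $N \times N$ matrix built from $N$ coupled copies of the base learner.

```python
# Sketch of the classical Blum-Mansour reduction (the baseline this paper
# improves on), with multiplicative weights as the base external-regret
# learner. Illustrative only; not the construction from the paper.
import numpy as np

class MultiplicativeWeights:
    """Base no-external-regret learner over n actions (losses in [0, 1])."""
    def __init__(self, n, eta=0.1):
        self.weights = np.ones(n)
        self.eta = eta

    def recommend(self):
        return self.weights / self.weights.sum()

    def update(self, loss):
        self.weights *= np.exp(-self.eta * np.asarray(loss))

class BlumMansour:
    """Swap-regret learner built from n coupled external-regret learners."""
    def __init__(self, n, eta=0.1):
        self.n = n
        self.copies = [MultiplicativeWeights(n, eta) for _ in range(n)]
        self.p = np.ones(n) / n

    def play(self):
        # Row i is the recommendation of copy i; the learner plays a
        # stationary distribution p of this row-stochastic matrix (p = pQ).
        Q = np.array([c.recommend() for c in self.copies])
        p = self.p
        for _ in range(200):  # power iteration, enough for a sketch
            p = p @ Q
        self.p = p
        return p

    def update(self, loss):
        # Copy i is charged the loss scaled by the probability mass it
        # "owned", so the sum of the copies' external regrets bounds the
        # swap regret.
        for i, c in enumerate(self.copies):
            c.update(self.p[i] * np.asarray(loss))
```

Run for $O(N/\epsilon^2)$ rounds, this baseline drives the average swap regret below $\epsilon$, but each round pays for the $N \times N$ stationary-distribution computation over $N$ copies of the base learner; the reduction in the paper avoids maintaining $N$ coupled copies, which is how it reaches $O(N)$ per-iteration complexity at the cost of $\log(N)^{O(1/\epsilon)}$ rounds.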
Related papers
- Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal.
Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
arXiv Detail & Related papers (2024-11-04T00:39:52Z) - Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE.
In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z) - Rate-Preserving Reductions for Blackwell Approachability [72.03309261614991]
Abernethy et al. (2011) showed that Blackwell approachability and no-regret learning are equivalent.
We show that it is possible to tightly reduce any approachability instance to an instance of a generalized form of regret minimization.
arXiv Detail & Related papers (2024-06-10T23:23:52Z) - Near-Optimal Reinforcement Learning with Self-Play under Adaptivity
Constraints [21.412085244757336]
We study the problem of multi-agent reinforcement learning (MARL) with adaptivity constraints.
For two-player zero-sum Markov games, we design a (policy) elimination-based algorithm that achieves a regret of $\widetilde{O}(\sqrt{H^3 S^2 ABK})$.
We prove a batch complexity lower bound $\Omega(\frac{H}{\log_A K}+\log\log K)$ for all algorithms with $\widetilde{O}(\sqrt{K})$ regret.
arXiv Detail & Related papers (2024-02-02T03:00:40Z) - Fast swap regret minimization and applications to approximate correlated
equilibria [20.047624953965965]
For any constant $\varepsilon>0$, our algorithm obtains $\varepsilon T$-swap regret within only $T = \mathsf{polylog}(n)$ rounds.
Our algorithm has an exponential dependence on $\varepsilon$, but we prove a new, matching lower bound.
arXiv Detail & Related papers (2023-10-30T15:35:24Z) - Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $\widetilde{\mathcal{O}}((XA+YB)/\varepsilon^2)$ episodes of play to find an $\varepsilon$-approximate Nash equilibrium in two-player zero-sum games.
This improves upon the best known sample complexity of $\widetilde{\mathcal{O}}((X^2A+Y^2B)/\varepsilon^2)$ by a factor of $\widetilde{\mathcal{O}}(\max\{X, Y\})$.
arXiv Detail & Related papers (2022-02-03T18:18:28Z) - Near-Optimal No-Regret Learning for Correlated Equilibria in
Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\mathrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this guarantee from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z) - No Discounted-Regret Learning in Adversarial Bandits with Delays [40.670563413892154]
We show that the expected discounted ergodic distribution of play converges to the set of coarse correlated equilibria (CCE) if the algorithms have "no discounted-regret".
For a zero-sum game, we show that no discounted-regret is sufficient for the discounted ergodic average of play to converge to the set of Nash equilibria.
arXiv Detail & Related papers (2021-03-08T05:15:31Z) - Near-Optimal Regret Bounds for Contextual Combinatorial Semi-Bandits
with Linear Payoff Functions [53.77572276969548]
We show that the C$^2$UCB algorithm has the optimal regret bound $\tilde{O}(d\sqrt{kT} + dk)$ for the partition matroid constraints.
For general constraints, we propose an algorithm that modifies the reward estimates of arms in the C$^2$UCB algorithm.
arXiv Detail & Related papers (2021-01-20T04:29:18Z)