Barriers to Welfare Maximization with No-Regret Learning
- URL: http://arxiv.org/abs/2411.01720v1
- Date: Mon, 04 Nov 2024 00:34:56 GMT
- Title: Barriers to Welfare Maximization with No-Regret Learning
- Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm
- Abstract summary: We prove lower bounds for computing a near-optimal $T$-sparse CCE.
In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time.
- Score: 68.66209476382213
- Abstract: A celebrated result at the interface of online learning and game theory guarantees that the repeated interaction of no-regret players leads to a coarse correlated equilibrium (CCE) -- a natural game-theoretic solution concept. Despite the rich history of this foundational problem and the tremendous interest it has received in recent years, a basic question still remains open: how many iterations are needed for no-regret players to approximate an equilibrium? In this paper, we establish the first computational lower bounds for that problem in two-player (general-sum) games under the constraint that the CCE reached approximates the optimal social welfare (or some other natural objective). From a technical standpoint, our approach revolves around proving lower bounds for computing a near-optimal $T$-sparse CCE -- a mixture of $T$ product distributions, thereby circumscribing the iteration complexity of no-regret learning even in the centralized model of computation. Our proof proceeds by extending a classical reduction of Gilboa and Zemel [1989] for optimal Nash to sparse (approximate) CCE. In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in polynomial time. Moreover, we strengthen our hardness results to apply in the low-precision regime as well via the planted clique conjecture.
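To make the objects in the abstract concrete, here is a minimal, self-contained sketch (not from the paper; the random game, horizon, and step size are arbitrary illustrative choices): both players of a two-player general-sum game run Multiplicative Weights Update, a standard no-regret algorithm, and the uniform mixture of the $T$ per-round product distributions of play is exactly a $T$-sparse CCE candidate. The script reports its social welfare and its CCE gap, i.e., the largest gain either player could obtain from a unilateral deviation.

```python
# Hypothetical illustration (not from the paper): two players run Multiplicative
# Weights Update (MWU) on a random two-player general-sum game. The uniform mixture
# of the T per-round product distributions is a "T-sparse CCE" in the sense of the
# abstract; we report its social welfare and how far it is from an exact CCE.
import numpy as np

rng = np.random.default_rng(0)
n_actions, T, eta = 4, 200, 0.1  # illustrative sizes and step size

# Payoff matrices A (row player) and B (column player) of a random general-sum game.
A = rng.uniform(size=(n_actions, n_actions))
B = rng.uniform(size=(n_actions, n_actions))

x = np.full(n_actions, 1.0 / n_actions)  # row player's mixed strategy
y = np.full(n_actions, 1.0 / n_actions)  # column player's mixed strategy
products = []                            # the T product distributions x_t (x) y_t

for t in range(T):
    products.append(np.outer(x, y))
    # Expected payoff of each pure action against the opponent's current mix.
    u_row, u_col = A @ y, B.T @ x
    # MWU: exponentiate utilities and renormalize.
    x = x * np.exp(eta * u_row); x /= x.sum()
    y = y * np.exp(eta * u_col); y /= y.sum()

# T-sparse CCE candidate: uniform mixture of the T product distributions.
mu = np.mean(products, axis=0)

# CCE gap: best unilateral-deviation benefit for either player under mu.
row_gap = (A @ mu.sum(axis=0)).max() - np.sum(mu * A)
col_gap = (B.T @ mu.sum(axis=1)).max() - np.sum(mu * B)
welfare = np.sum(mu * (A + B))
print(f"CCE gap ~ {max(row_gap, col_gap):.4f}, social welfare ~ {welfare:.4f}")
```

Because both players have vanishing regret, the CCE gap shrinks as $T$ grows; the paper's lower bounds concern how large $T$ must be before such a sparse mixture can also come close to the optimal social welfare.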
Related papers
- Computational Lower Bounds for Regret Minimization in Normal-Form Games [68.66209476382213]
We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal.
Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
arXiv Detail & Related papers (2024-11-04T00:39:52Z)
- Optimistic Policy Gradient in Multi-Player Markov Games with a Single Controller: Convergence Beyond the Minty Property [89.96815099996132]
We develop a new framework to characterize optimistic policy gradient methods in multi-player games with a single controller.
Our approach relies on a natural generalization of the classical Minty property that we introduce, which we anticipate to have further applications beyond Markov games.
arXiv Detail & Related papers (2023-12-19T11:34:10Z)
- From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces [29.29647282754262]
We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class.
Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria (a sketch contrasting external and swap regret appears after this list).
arXiv Detail & Related papers (2023-10-30T17:50:29Z)
- Near-Optimal No-Regret Learning for Correlated Equilibria in Multi-Player General-Sum Games [104.74734408204749]
We show that if all agents in a multi-player general-sum normal-form game employ Optimistic Multiplicative Weights Update (OMWU), the external regret of every player is $O(\textrm{polylog}(T))$ after $T$ repetitions of the game.
We extend this result from external regret to internal regret and swap regret, thereby establishing uncoupled learning dynamics that converge to an approximate correlated equilibrium.
arXiv Detail & Related papers (2021-11-11T01:19:53Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium [116.56359444619441]
We develop provably efficient reinforcement learning algorithms for two-player zero-sum finite-horizon Markov games.
In the offline setting, we control both players and aim to find the Nash Equilibrium by minimizing the duality gap.
In the online setting, we control a single player playing against an arbitrary opponent and aim to minimize the regret.
arXiv Detail & Related papers (2020-02-17T17:04:16Z)
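As referenced in the swap-regret entry above, the gap between external and swap regret is what separates coarse correlated equilibria from correlated equilibria. Below is a hedged, illustrative sketch (not code from any of the listed papers; the play history is purely synthetic) that computes both regret notions from one player's sequence of mixed strategies and per-action payoffs.

```python
# Hypothetical sketch: compute external regret (best fixed action in hindsight) and
# swap regret (best action-to-action swap function in hindsight) from a play history.
# Vanishing swap regret is what drives convergence to correlated equilibria.
import numpy as np

def external_regret(strategies, payoffs):
    # strategies: (T, n) mixed strategies played; payoffs: (T, n) payoff of each action per round.
    realized = np.sum(strategies * payoffs)      # expected payoff actually collected
    best_fixed = payoffs.sum(axis=0).max()       # best single action in hindsight
    return best_fixed - realized

def swap_regret(strategies, payoffs):
    # For each action a, weight the rounds by the probability placed on a and ask how
    # much better the best replacement action b would have done on those rounds.
    regret = 0.0
    for a in range(strategies.shape[1]):
        weighted = (strategies[:, a:a + 1] * payoffs).sum(axis=0)  # payoff of each b where a was played
        regret += weighted.max() - weighted[a]
    return regret

rng = np.random.default_rng(1)
T, n = 500, 3
strategies = rng.dirichlet(np.ones(n), size=T)   # synthetic sequence of mixed strategies
payoffs = rng.uniform(size=(T, n))               # synthetic per-action payoffs
print(f"external regret = {external_regret(strategies, payoffs):.3f}, "
      f"swap regret = {swap_regret(strategies, payoffs):.3f}")
```

Swap regret is always at least external regret, since the swap benchmark may deviate differently for each recommended action; that is why controlling it yields the stronger notion of (approximate) correlated equilibrium.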
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.