Related papers: Computational Lower Bounds for Regret Minimization in Normal-Form Games

Computational Lower Bounds for Regret Minimization in Normal-Form Games

URL: http://arxiv.org/abs/2411.01721v1
Date: Mon, 04 Nov 2024 00:39:52 GMT
Title: Computational Lower Bounds for Regret Minimization in Normal-Form Games
Authors: Ioannis Anagnostides, Alkis Kalavasis, Tuomas Sandholm,
Abstract summary: We provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal. Our results are obtained in the algorithmic framework put forward by Kothari and Mehta.
Score: 68.66209476382213
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: A celebrated connection in the interface of online learning and game theory establishes that players minimizing swap regret converge to correlated equilibria (CE) -- a seminal game-theoretic solution concept. Despite the long history of this problem and the renewed interest it has received in recent years, a basic question remains open: how many iterations are needed to approximate an equilibrium under the usual normal-form representation? In this paper, we provide evidence that existing learning algorithms, such as multiplicative weights update, are close to optimal. In particular, we prove lower bounds for the problem of computing a CE that can be expressed as a uniform mixture of $T$ product distributions -- namely, a uniform $T$-sparse CE; such lower bounds immediately circumscribe (computationally bounded) regret minimization algorithms in games. Our results are obtained in the algorithmic framework put forward by Kothari and Mehta (STOC 2018) in the context of computing Nash equilibria, which consists of the sum-of-squares (SoS) relaxation in conjunction with oracle access to a verification oracle; the goal in that framework is to lower bound either the degree of the SoS relaxation or the number of queries to the verification oracle. Here, we obtain two such hardness results, precluding computing i) uniform $\text{log }n$-sparse CE when $\epsilon =\text{poly}(1/\text{log }n)$ and ii) uniform $n^{1 - o(1)}$-sparse CE when $\epsilon = \text{poly}(1/n)$.

Related papers

Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning [69.45910671974296]
We show that a new quantity named the single policy deviation concentrability coefficient is unavoidable in the non-interactive imitation learning setting.<n>We introduce two novel solution algorithms: MAIL-BRO and MURMAIL.<n>The latter bypasses completely the best response oracle at the cost of a worse expert query complexity of order $mathcalO(varepsilon-8)$.
arXiv Detail & Related papers (2025-05-23T08:18:35Z)
Barriers to Welfare Maximization with No-Regret Learning [68.66209476382213]
We prove lower bounds for computing a near-optimal $T$-sparse CCE. In particular, we show that the inapproximability of maximum clique precludes attaining any non-trivial sparsity in time.
arXiv Detail & Related papers (2024-11-04T00:34:56Z)
Fast Rates for Bandit PAC Multiclass Classification [73.17969992976501]
We study multiclass PAC learning with bandit feedback, where inputs are classified into one of $K$ possible labels and feedback is limited to whether or not the predicted labels are correct. Our main contribution is in designing a novel learning algorithm for the agnostic $(varepsilon,delta)$PAC version of the problem.
arXiv Detail & Related papers (2024-06-18T08:54:04Z)
From External to Swap Regret 2.0: An Efficient Reduction and Oblivious Adversary for Large Action Spaces [29.29647282754262]
We show that, whenever there exists a no-external-regret algorithm for some hypothesis class, there must also exist a no-swap-regret algorithm for that same class. Our reduction implies that, if no-regret learning is possible in some game, then this game must have approximate correlated equilibria.
arXiv Detail & Related papers (2023-10-30T17:50:29Z)
Fast swap regret minimization and applications to approximate correlated equilibria [20.047624953965965]
For any constant $varepsilon>0$, our algorithm obtains $varepsilon T$-swap regret within only $T = mathsfpolylog(n)$ rounds. Our algorithm has an exponential dependence on $varepsilon$, but we prove a new, matching lower bound.
arXiv Detail & Related papers (2023-10-30T15:35:24Z)
Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria. We show that they can be efficiently learnt in emphall games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z)
Policy Optimization for Markov Games: Unified Framework and Faster Convergence [81.3266426402464]
We show that the state-wise average policy of this algorithm converges to an approximate Nash equilibrium (NE) of the game. We extend this algorithm to multi-player general Markov Games and show a $mathcalwidetildeO(T-1/2)$ convergence rate to Correlated Equilibria (CCE)
arXiv Detail & Related papers (2022-06-06T14:23:13Z)
Solving optimization problems with Blackwell approachability [31.29341288414507]
Conic Blackwell Algorithm$+$ (CBA$+$) regret minimizer. CBA$+$ is based on Blackwell approachability and attains $O(sqrtT)$ regret. Based on CBA$+$, we introduce SP-CBA$+$, a new parameter-free algorithm for solving convex-concave saddle-point problems.
arXiv Detail & Related papers (2022-02-24T18:19:21Z)
Near-Optimal Learning of Extensive-Form Games with Imperfect Information [54.55092907312749]
We present the first line of algorithms that require only $widetildemathcalO((XA+YB)/varepsilon2)$ episodes of play to find an $varepsilon$-approximate Nash equilibrium in two-player zero-sum games. This improves upon the best known sample complexity of $widetildemathcalO((X2A+Y2B)/varepsilon2)$ by a factor of $widetildemathcalO(maxX,
arXiv Detail & Related papers (2022-02-03T18:18:28Z)
Escaping Saddle-Points Faster under Interpolation-like Conditions [19.9471360853892]
We show that under over-parametrization several standard optimization algorithms escape saddle-points and converge to local-minimizers much faster. We discuss the first-order oracle complexity of Perturbed Gradient Descent (PSGD) algorithm to reach an $epsilon$ localminimizer. We next analyze Cubic-Regularized Newton (SCRN) algorithm under-like conditions, and show that the oracle complexity to reach an $epsilon$ local-minimizer under-like conditions, is $tildemathcalO (1/epsilon2.5
arXiv Detail & Related papers (2020-09-28T02:15:18Z)

This list is automatically generated from the titles and abstracts of the papers in this site.