Uncoupled Bandit Learning towards Rationalizability: Benchmarks,
Barriers, and Algorithms
- URL: http://arxiv.org/abs/2111.05486v3
- Date: Sat, 23 Dec 2023 07:00:40 GMT
- Title: Uncoupled Bandit Learning towards Rationalizability: Benchmarks,
Barriers, and Algorithms
- Authors: Jibang Wu, Haifeng Xu, Fan Yao
- Abstract summary: We study the last-iterate convergence guarantee in general games toward rationalizability.
This learning task naturally generalizes best arm identification problems.
We develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards.
- Score: 41.307340085194625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under the uncoupled learning setup, the last-iterate convergence guarantee
towards Nash equilibrium is shown to be impossible in many games. This work
studies the last-iterate convergence guarantee in general games toward
rationalizability, a key solution concept in epistemic game theory that relaxes
the stringent belief assumptions in both Nash and correlated equilibrium. This
learning task naturally generalizes best arm identification problems, due to
the intrinsic connections between rationalizable action profiles and the
elimination of iteratively dominated actions. Despite the seeming simplicity of
this task, our first main result is surprisingly negative: a large and natural
class of no-regret algorithms, including the entire family of Dual Averaging
algorithms, provably takes exponentially many rounds to reach
rationalizability. Moreover, algorithms with the stronger no-swap-regret
guarantee suffer a similar exponential inefficiency. To overcome these barriers, we develop
a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed
Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We
prove that when all agents run Exp3-DH (i.e., self-play in multi-agent
learning), all iteratively dominated actions can be eliminated within
polynomially many rounds. Our experimental results further demonstrate the
efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those
developed specifically for learning in games, fail to reach rationalizability
efficiently.
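The abstract describes Exp3-DH as standard Exp3 whose accumulated reward estimates are shrunk each round, so that history is gradually forgotten. A minimal sketch of that idea follows; the constant discount factor and all parameter choices are illustrative assumptions, whereas the paper's actual forgetting rates are carefully tailored and round-dependent.

```python
import math
import random

def exp3_dh(n_arms, rewards_fn, T, eta=0.1, discount=0.99):
    """Sketch of Exp3 with Diminishing Historical rewards (Exp3-DH).

    Standard Exp3 keeps cumulative importance-weighted reward
    estimates; Exp3-DH additionally discounts the past estimates
    each round so old history is gradually forgotten. `discount`
    stands in for the paper's tailored rates (an assumption here).
    """
    S = [0.0] * n_arms  # discounted cumulative reward estimates
    for _ in range(T):
        # Exponential-weights distribution over arms (max-shifted
        # for numerical stability).
        m = max(S)
        w = [math.exp(eta * (s - m)) for s in S]
        z = sum(w)
        p = [wi / z for wi in w]
        # Sample an arm and observe only its own (bandit) reward.
        a = random.choices(range(n_arms), weights=p)[0]
        r = rewards_fn(a)
        # Importance-weighted update, then shrink all history.
        S[a] += r / p[a]
        S = [discount * s for s in S]
    # Return the final (last-iterate) distribution over arms.
    m = max(S)
    w = [math.exp(eta * (s - m)) for s in S]
    z = sum(w)
    return [wi / z for wi in w]
```

In self-play, each agent would run this loop on its own action set, with `rewards_fn` reflecting the other agents' current play; the discounting keeps the algorithm responsive as dominated actions vanish from opponents' strategies.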
Related papers
- Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret guarantees in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and
Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, in the sense that no player can benefit from deviating unilaterally.
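The no-unilateral-deviation property mentioned above can be checked directly in a small bimatrix game; the payoff matrices below are illustrative, not taken from the paper.

```python
def is_pure_nash(A, B, i, j):
    """Check whether the pure profile (i, j) is a Nash equilibrium of
    a two-player game: A[i][j] is the row player's payoff, B[i][j]
    the column player's, and neither may gain by deviating alone."""
    row_ok = all(A[i][j] >= A[k][j] for k in range(len(A)))
    col_ok = all(B[i][j] >= B[i][l] for l in range(len(B[0])))
    return row_ok and col_ok

# Prisoner's dilemma: mutual defection (1, 1) is the unique pure NE.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs
assert is_pure_nash(A, B, 1, 1)      # (Defect, Defect) is a NE
assert not is_pure_nash(A, B, 0, 0)  # (Cooperate, Cooperate) is not
```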
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Learning Rationalizable Equilibria in Multiplayer Games [38.922957434291554]
Existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback.
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE).
Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates.
arXiv Detail & Related papers (2022-10-20T16:49:00Z) - Differentiable Bilevel Programming for Stackelberg Congestion Games [47.60156422249365]
In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game.
Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning.
We propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming.
The second algorithm adds a twist by cutting short the followers' evolution trajectory.
arXiv Detail & Related papers (2022-09-15T21:32:23Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z) - Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.