Uncoupled Bandit Learning towards Rationalizability: Benchmarks,
Barriers, and Algorithms
- URL: http://arxiv.org/abs/2111.05486v3
- Date: Sat, 23 Dec 2023 07:00:40 GMT
- Title: Uncoupled Bandit Learning towards Rationalizability: Benchmarks,
Barriers, and Algorithms
- Authors: Jibang Wu, Haifeng Xu, Fan Yao
- Abstract summary: We study the last-iterate convergence guarantee in general games toward rationalizability.
This learning task naturally generalizes best arm identification problems.
We develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards.
- Score: 41.307340085194625
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Under the uncoupled learning setup, the last-iterate convergence guarantee
towards Nash equilibrium is shown to be impossible in many games. This work
studies the last-iterate convergence guarantee in general games toward
rationalizability, a key solution concept in epistemic game theory that relaxes
the stringent belief assumptions in both Nash and correlated equilibrium. This
learning task naturally generalizes best arm identification problems, due to
the intrinsic connections between rationalizable action profiles and the
elimination of iteratively dominated actions. Despite the seeming simplicity of
this task, our first main result is surprisingly negative: a large and natural
class of no-regret algorithms, including the entire family of Dual Averaging
algorithms, provably takes exponentially many rounds to reach
rationalizability. Moreover, algorithms with the stronger no-swap-regret
guarantee suffer a similar exponential inefficiency. To overcome these barriers, we develop
a new algorithm that adjusts Exp3 with Diminishing Historical rewards (termed
Exp3-DH); Exp3-DH gradually forgets history at carefully tailored rates. We
prove that when all agents run Exp3-DH (i.e., self-play in multi-agent
learning), all iteratively dominated actions can be eliminated within
polynomially many rounds. Our experimental results further demonstrate the
efficiency of Exp3-DH, and that state-of-the-art bandit algorithms, even those
developed specifically for learning in games, fail to reach rationalizability
efficiently.
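The abstract describes Exp3-DH as standard Exp3 whose accumulated reward estimates are shrunk each round, so that history is gradually forgotten. A minimal sketch of that idea follows; the constant discount factor and all parameter choices are illustrative assumptions, whereas the paper's actual forgetting rates are carefully tailored and round-dependent.

```python
import math
import random

def exp3_dh(n_arms, rewards_fn, T, eta=0.1, discount=0.99):
    """Sketch of Exp3 with Diminishing Historical rewards (Exp3-DH).

    Standard Exp3 keeps cumulative importance-weighted reward
    estimates; Exp3-DH additionally discounts the past estimates
    each round so old history is gradually forgotten. `discount`
    stands in for the paper's tailored rates (an assumption here).
    """
    S = [0.0] * n_arms  # discounted cumulative reward estimates
    for _ in range(T):
        # Exponential-weights distribution over arms (max-shifted
        # for numerical stability).
        m = max(S)
        w = [math.exp(eta * (s - m)) for s in S]
        z = sum(w)
        p = [wi / z for wi in w]
        # Sample an arm and observe only its own (bandit) reward.
        a = random.choices(range(n_arms), weights=p)[0]
        r = rewards_fn(a)
        # Importance-weighted update, then shrink all history.
        S[a] += r / p[a]
        S = [discount * s for s in S]
    # Return the final (last-iterate) distribution over arms.
    m = max(S)
    w = [math.exp(eta * (s - m)) for s in S]
    z = sum(w)
    return [wi / z for wi in w]
```

In self-play, each agent would run this loop on its own action set, with `rewards_fn` reflecting the other agents' current play; the discounting keeps the algorithm responsive as dominated actions vanish from opponents' strategies.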
Related papers
- Hardness of Independent Learning and Sparse Equilibrium Computation in
Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret guarantees in general-sum games when executed independently by all players.
We show that our lower bounds hold even for the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z) - Global Nash Equilibrium in Non-convex Multi-player Game: Theory and
Algorithms [66.8634598612777]
We show that a Nash equilibrium (NE) is acceptable to all players in a multi-player game, in the sense that no player can benefit from deviating unilaterally.
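The no-unilateral-deviation property mentioned above can be checked directly in a small bimatrix game; the payoff matrices below are illustrative, not taken from the paper.

```python
def is_pure_nash(A, B, i, j):
    """Check whether the pure profile (i, j) is a Nash equilibrium of
    a two-player game: A[i][j] is the row player's payoff, B[i][j]
    the column player's, and neither may gain by deviating alone."""
    row_ok = all(A[i][j] >= A[k][j] for k in range(len(A)))
    col_ok = all(B[i][j] >= B[i][l] for l in range(len(B[0])))
    return row_ok and col_ok

# Prisoner's dilemma: mutual defection (1, 1) is the unique pure NE.
A = [[3, 0], [5, 1]]  # row player's payoffs
B = [[3, 5], [0, 1]]  # column player's payoffs
assert is_pure_nash(A, B, 1, 1)      # (Defect, Defect) is a NE
assert not is_pure_nash(A, B, 0, 0)  # (Cooperate, Cooperate) is not
```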
arXiv Detail & Related papers (2023-01-19T11:36:50Z) - Learning Rationalizable Equilibria in Multiplayer Games [38.922957434291554]
Existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback.
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE).
Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates.
arXiv Detail & Related papers (2022-10-20T16:49:00Z) - Differentiable Bilevel Programming for Stackelberg Congestion Games [47.60156422249365]
In a Stackelberg congestion game (SCG), a leader aims to maximize their own gain by anticipating and manipulating the equilibrium state at which the followers settle by playing a congestion game.
Here, we attempt to tackle this computational challenge by marrying traditional methodologies with the latest differentiable programming techniques in machine learning.
We propose two new local search algorithms for SCGs. The first is a gradient descent algorithm that obtains the derivatives by unrolling ILD via differentiable programming.
The second algorithm adds a twist by cutting short the followers' evolution trajectory.
arXiv Detail & Related papers (2022-09-15T21:32:23Z) - Learning Two-Player Mixture Markov Games: Kernel Function Approximation
and Correlated Equilibrium [157.0902680672422]
We consider learning Nash equilibria in two-player zero-sum Markov Games with nonlinear function approximation.
We propose a novel online learning algorithm to find a Nash equilibrium by minimizing the duality gap.
arXiv Detail & Related papers (2022-08-10T14:21:54Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Last-iterate Convergence in Extensive-Form Games [49.31256241275577]
We study last-iterate convergence of optimistic algorithms in sequential games.
We show that all of these algorithms enjoy last-iterate convergence, with some of them even converging exponentially fast.
arXiv Detail & Related papers (2021-06-27T22:02:26Z) - Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.