Hindsight and Sequential Rationality of Correlated Play
- URL: http://arxiv.org/abs/2012.05874v2
- Date: Thu, 17 Dec 2020 01:13:53 GMT
- Title: Hindsight and Sequential Rationality of Correlated Play
- Authors: Dustin Morrill, Ryan D'Orazio, Reca Sarfati, Marc Lanctot, James R.
Wright, Amy Greenwald, Michael Bowling
- Abstract summary: We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
- Score: 18.176128899338433
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Driven by recent successes in two-player, zero-sum game solving and playing,
artificial intelligence work on games has increasingly focused on algorithms
that produce equilibrium-based strategies. However, this approach has been less
effective at producing competent players in general-sum games or those with
more than two players than in two-player, zero-sum games. An appealing
alternative is to consider adaptive algorithms that ensure strong performance
in hindsight relative to what could have been achieved with modified behavior.
This approach also leads to a game-theoretic analysis, but in the correlated
play that arises from joint learning dynamics rather than factored agent
behavior at equilibrium. We develop and advocate for this hindsight rationality
framing of learning in general sequential decision-making settings. To this
end, we re-examine mediated equilibrium and deviation types in extensive-form
games, thereby gaining a more complete understanding and resolving past
misconceptions. We present a set of examples illustrating the distinct
strengths and weaknesses of each type of equilibrium in the literature, and
prove that no tractable concept subsumes all others. This line of inquiry
culminates in the definition of the deviation and equilibrium classes that
correspond to algorithms in the counterfactual regret minimization (CFR)
family, relating them to all others in the literature. Examining CFR in greater
detail further leads to a new recursive definition of rationality in correlated
play that extends sequential rationality in a way that naturally applies to
hindsight evaluation.
Related papers
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
arXiv Detail & Related papers (2023-01-26T17:25:45Z) - Finding mixed-strategy equilibria of continuous-action games without
gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - How Bad is Selfish Driving? Bounding the Inefficiency of Equilibria in
Urban Driving Games [64.71476526716668]
We study the (in)efficiency of any equilibrium players might agree to play.
We obtain guarantees that refine existing bounds on the Price of Anarchy.
Although the obtained guarantees concern open-loop trajectories, we observe efficient equilibria even when agents employ closed-loop policies.
arXiv Detail & Related papers (2022-10-24T09:32:40Z) - Learning Rationalizable Equilibria in Multiplayer Games [38.922957434291554]
Existing algorithms require a number of samples exponential in the number of players to learn rationalizable equilibria under bandit feedback.
This paper develops the first line of efficient algorithms for learning rationalizable Coarse Correlated Equilibria (CCE) and Correlated Equilibria (CE)
Our algorithms incorporate several novel techniques to guarantee rationalizability and no (swap-)regret simultaneously, including a correlated exploration scheme and adaptive learning rates.
arXiv Detail & Related papers (2022-10-20T16:49:00Z) - Learning Correlated Equilibria in Mean-Field Games [62.14589406821103]
We develop the concepts of Mean-Field correlated and coarse-correlated equilibria.
We show that they can be efficiently learnt in emphall games, without requiring any additional assumption on the structure of the game.
arXiv Detail & Related papers (2022-08-22T08:31:46Z) - Multiplayer Performative Prediction: Learning in Decision-Dependent
Games [18.386569111954213]
This paper formulates a new game theoretic framework for multi-player performative prediction.
We focus on two distinct solution concepts, namely (i) performatively stable equilibria and (ii) Nash equilibria of the game.
We show that under mild assumptions, the performatively stable equilibria can be found efficiently by a variety of algorithms.
arXiv Detail & Related papers (2022-01-10T15:31:10Z) - Uncoupled Bandit Learning towards Rationalizability: Benchmarks,
Barriers, and Algorithms [41.307340085194625]
We study the last-iterate convergence guarantee in general games toward rationalizability.
This learning task naturally generalizes best arm identification problems.
We develop a new algorithm that adjusts Exp3 with Diminishing Historical rewards.
arXiv Detail & Related papers (2021-11-10T02:10:07Z) - Bounded rationality for relaxing best response and mutual consistency:
An information-theoretic model of partial self-reference [0.0]
This work focuses on some of the assumptions underlying rationality such as mutual consistency and best-response.
We consider ways to relax these assumptions using concepts from level-$k$ reasoning and quantal response equilibrium (QRE) respectively.
arXiv Detail & Related papers (2021-06-30T06:56:56Z) - Evolutionary Strategies with Analogy Partitions in p-guessing Games [0.0]
We introduce an evolutionary process of learning to investigate the dynamics of learning in unstable p-guessing games environments.
We show that our genetic algorithm behaves consistently with previous results in persistent environments, converging to the Nash equilibrium.
arXiv Detail & Related papers (2021-03-26T10:28:23Z) - Model-Free Online Learning in Unknown Sequential Decision Making
Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T3/4)$ regret with high probability for that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.