Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in
Symmetric Zero-sum Games
- URL: http://arxiv.org/abs/2205.15879v1
- Date: Tue, 31 May 2022 15:27:38 GMT
- Title: Simplex Neural Population Learning: Any-Mixture Bayes-Optimality in
Symmetric Zero-sum Games
- Authors: Siqi Liu, Marc Lanctot, Luke Marris, Nicolas Heess
- Abstract summary: Learning to play optimally against any mixture over a diverse set of strategies is of significant practical interest in competitive games.
We propose simplex-NeuPL that satisfies two desiderata simultaneously.
We show that the resulting conditional policies incorporate prior information about their opponents effectively.
- Score: 36.19779736396775
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning to play optimally against any mixture over a diverse set of
strategies is of significant practical interest in competitive games. In this
paper, we propose simplex-NeuPL that satisfies two desiderata simultaneously:
i) learning a population of strategically diverse basis policies, represented
by a single conditional network; and ii) learning, with the same network,
best-responses to any mixture over the simplex of basis policies. We show that
the resulting conditional policies incorporate prior information about their
opponents effectively, enabling near optimal returns against arbitrary mixture
policies in a game with tractable best-responses. We verify that such policies
behave Bayes-optimally under uncertainty and offer insights into using this
flexibility at test time. Finally, we offer evidence that learning
best-responses to arbitrary mixture policies is an effective auxiliary task for
strategic exploration, which, by itself, can lead to more performant
populations.
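The core mechanism can be illustrated concretely: a best response to a mixture α over basis policies {π_1, ..., π_n} maximizes the expected return Σ_i α_i R(π, π_i), and simplex-NeuPL represents every such best response with a single network that receives α as an extra input. Below is a minimal sketch of that interface in Python/NumPy; the MixtureConditionedPolicy class, layer sizes, and Dirichlet sampling of mixtures are illustrative assumptions, not the authors' architecture or training procedure.

```python
# Illustrative sketch (not the authors' code): one conditional policy network
# that is told the opponent mixture over basis policies as part of its input.
import numpy as np

rng = np.random.default_rng(0)

class MixtureConditionedPolicy:
    """Toy MLP policy conditioned on a point of the simplex over basis policies."""

    def __init__(self, obs_dim, n_basis, n_actions, hidden=64):
        in_dim = obs_dim + n_basis          # observation + mixture weights
        self.w1 = rng.normal(0, 0.1, (in_dim, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(0, 0.1, (hidden, n_actions))
        self.b2 = np.zeros(n_actions)

    def action_logits(self, obs, mixture):
        # Condition on the simplex point by concatenating it to the observation.
        x = np.concatenate([obs, mixture])
        h = np.tanh(x @ self.w1 + self.b1)
        return h @ self.w2 + self.b2

    def act(self, obs, mixture):
        logits = self.action_logits(obs, mixture)
        p = np.exp(logits - logits.max())
        p /= p.sum()
        return rng.choice(len(p), p=p)

n_basis, obs_dim, n_actions = 4, 8, 3
policy = MixtureConditionedPolicy(obs_dim, n_basis, n_actions)

# A one-hot mixture recovers basis policy i; a Dirichlet sample is an arbitrary
# point on the simplex that the same network must best-respond to.
basis_2 = np.eye(n_basis)[2]
any_mixture = rng.dirichlet(np.ones(n_basis))
obs = rng.normal(size=obs_dim)
print(policy.act(obs, basis_2), policy.act(obs, any_mixture))
```

In this framing, one-hot mixtures recover individual basis policies, while interior points of the simplex correspond to playing under uncertainty about which basis opponent is drawn, which is where the Bayes-optimal behaviour discussed in the abstract would be evaluated.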
Related papers
- Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning [55.65738319966385]
We propose a novel online algorithm, iterative Nash policy optimization (INPO)
Unlike previous methods, INPO bypasses the need for estimating the expected win rate for individual responses.
With an LLaMA-3-8B-based SFT model, INPO achieves a 42.6% length-controlled win rate on AlpacaEval 2.0 and a 37.8% win rate on Arena-Hard.
arXiv Detail & Related papers (2024-06-30T08:00:34Z)
- Fictitious Cross-Play: Learning Global Nash Equilibrium in Mixed Cooperative-Competitive Games [14.979239870856535]
Self-play (SP) is a popular reinforcement learning framework for solving competitive games.
In this work, we develop a novel algorithm, Fictitious Cross-Play (FXP), which inherits the benefits from both frameworks.
arXiv Detail & Related papers (2023-10-05T07:19:33Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- CAMEO: Curiosity Augmented Metropolis for Exploratory Optimal Policies [62.39667564455059]
We consider and study a distribution of optimal policies.
In experimental simulations we show that CAMEO indeed obtains policies that all solve classic control problems.
We further show that the different policies we sample present different risk profiles, corresponding to interesting practical applications in interpretability.
arXiv Detail & Related papers (2022-05-19T09:48:56Z)
- Mixed Strategies for Security Games with General Defending Requirements [37.02840909260615]
The Stackelberg security game is played between a defender and an attacker, where the defender needs to allocate a limited amount of resources to multiple targets.
We propose an efficient, close-to-optimal Patching algorithm that computes mixed strategies using only a few pure strategies.
arXiv Detail & Related papers (2022-04-26T08:56:39Z)
- Constructing a Good Behavior Basis for Transfer using Generalized Policy Updates [63.58053355357644]
We study the problem of learning a good set of policies, so that when combined together, they can solve a wide variety of unseen reinforcement learning tasks.
We show theoretically that having access to a specific set of diverse policies, which we call a set of independent policies, can allow for instantaneously achieving high-level performance.
arXiv Detail & Related papers (2021-12-30T12:20:46Z)
- On the Impossibility of Convergence of Mixed Strategies with No Regret Learning [10.515544361834241]
We study convergence properties of the mixed strategies that result from a general class of optimal no regret learning strategies.
We consider the class of strategies whose information set at each step is the empirical average of the opponent's realized play.
arXiv Detail & Related papers (2020-12-03T18:02:40Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting, non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.