Efficient Policy Space Response Oracles
- URL: http://arxiv.org/abs/2202.00633v1
- Date: Fri, 28 Jan 2022 17:54:45 GMT
- Title: Efficient Policy Space Response Oracles
- Authors: Ming Zhou, Jingxiao Chen, Ying Wen, Weinan Zhang, Yaodong Yang, Yong
Yu
- Abstract summary: The Policy Space Response Oracle (PSRO) method provides a general solution to Nash equilibrium in two-player zero-sum games.
Central to our development is the newly-introduced subroutine of minimax optimization on unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, a 10x improvement in data efficiency, and exploitability similar to that of existing PSRO methods on Kuhn and Leduc Poker games.
- Score: 61.71849698253696
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The Policy Space Response Oracle (PSRO) method provides a general
solution to Nash equilibrium in two-player zero-sum games but suffers from two
problems: (1) computational inefficiency, because the current population must
be repeatedly evaluated through game simulations; and (2) exploration
inefficiency, because best responses are learned against a fixed meta-strategy
at each iteration. In this work, we propose Efficient PSRO (EPSRO), which
substantially improves the efficiency of both steps. Central to our
development is the
newly-introduced subroutine of minimax optimization on unrestricted-restricted
(URR) games. By solving URR at each step, one can evaluate the current game and
compute the best response in one forward pass with no need for game
simulations. Theoretically, we prove that the solution procedures of EPSRO
offer a monotonic improvement in exploitability. Moreover, a desirable property
of EPSRO is that it is parallelizable, which allows for efficient exploration in
the policy space that induces behavioral diversity. We test EPSRO on three
classes of games and report a 50x speedup in wall-time, a 10x improvement in
data efficiency, and exploitability similar to that of existing PSRO methods
on Kuhn and Leduc Poker games.
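To make the URR step concrete, here is a hedged toy sketch on a zero-sum matrix game: the row player is unrestricted (all actions available) while the column player is restricted to a small population, and a single minimax solve returns both a meta-strategy over the population and an approximate best response, with no separate simulation phase. This is an illustration under matrix-game assumptions, not the paper's RL implementation; `solve_urr` and the other names are hypothetical.

```python
import numpy as np

# Toy unrestricted-restricted (URR) minimax step on a zero-sum matrix game.
# Illustrative only: EPSRO operates on RL policies, not payoff matrices.

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))   # row player's payoffs; column player gets -A
population = [0, 1]           # column player restricted to these strategies

def solve_urr(A, population, iters=2000, lr=0.5):
    """Approximate max_x min_{y in simplex(population)} x^T A[:, pop] y
    via multiplicative-weights updates for both players."""
    sub = A[:, population]
    x = np.full(sub.shape[0], 1.0 / sub.shape[0])   # unrestricted mixture
    y = np.full(sub.shape[1], 1.0 / sub.shape[1])   # restricted mixture
    x_avg, y_avg = np.zeros_like(x), np.zeros_like(y)
    for _ in range(iters):
        x *= np.exp(lr * (sub @ y));   x /= x.sum()   # row player ascends
        y *= np.exp(-lr * (sub.T @ x)); y /= y.sum()  # column player descends
        x_avg += x; y_avg += y
    return x_avg / iters, y_avg / iters   # average iterates approach a NE

x, y = solve_urr(A, population)
# One solve yields both the meta-strategy y over the population and an
# approximate best response against it, with no extra game simulations.
best_response = int(np.argmax(A[:, population] @ y))
print("meta-strategy:", y.round(3), "best response action:", best_response)
```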
Related papers
- Self-adaptive PSRO: Towards an Automatic Population-based Game Solver [34.326819257554874]
Policy-Space Response Oracles (PSRO), as a general algorithmic framework, has achieved state-of-the-art performance in learning equilibrium policies of two-player zero-sum games.
We make the first attempt to investigate the possibility of self-adaptively determining the optimal hyperparameter values in the PSRO framework, yielding Self-adaptive PSRO (SPSRO).
Experiments on various two-player zero-sum games demonstrate the superiority of SPSRO over different baselines.
arXiv Detail & Related papers (2024-04-17T07:40:57Z) - ApproxED: Approximate exploitability descent via learned best responses [61.17702187957206]
We study the problem of finding an approximate Nash equilibrium of games with continuous action sets.
We propose two new methods that minimize an approximation of exploitability with respect to the strategy profile.
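As a minimal illustration of descending an exploitability objective, the sketch below runs exploitability descent on a random zero-sum matrix game, using exact best responses where ApproxED would use learned ones and a softmax parameterization where the paper targets continuous action sets; every name here is an assumption for illustration.

```python
import numpy as np

# Exploitability descent on a zero-sum matrix game (a stand-in for the
# learned-best-response approach on continuous-action games).

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))   # row player's payoff matrix

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def softmax_jvp(p, g):
    # (diag(p) - p p^T) @ g: pulls a strategy gradient back to the logits.
    return p * (g - p @ g)

def exploitability(x, y):
    # Sum of both players' best-response gains over the current profile.
    return (A @ y).max() - (x @ A).min()

theta_x, theta_y = np.zeros(4), np.zeros(4)
lr = 0.1
for _ in range(3000):
    x, y = softmax(theta_x), softmax(theta_y)
    i_star = int(np.argmax(A @ y))   # row best response to y
    j_star = int(np.argmin(x @ A))   # column best response to x
    # Subgradients of exploitability w.r.t. each player's own strategy.
    theta_x -= lr * softmax_jvp(x, -A[:, j_star])
    theta_y -= lr * softmax_jvp(y, A[i_star, :])
print("exploitability:", exploitability(softmax(theta_x), softmax(theta_y)))
```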
arXiv Detail & Related papers (2023-01-20T23:55:30Z) - Provably Efficient Fictitious Play Policy Optimization for Zero-Sum
Markov Games with Structured Transitions [145.54544979467872]
We propose and analyze new fictitious play policy optimization algorithms for zero-sum Markov games with structured but unknown transitions.
We prove tight $\widetilde{\mathcal{O}}(\sqrt{K})$ regret bounds after $K$ episodes in a two-agent competitive game scenario.
Our algorithms feature a combination of Upper Confidence Bound (UCB)-type optimism and fictitious play under the scope of simultaneous policy optimization.
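For readers unfamiliar with the base method, the snippet below runs classical fictitious play on rock-paper-scissors: each player repeatedly best-responds to the opponent's empirical average strategy, and the empirical frequencies converge to the Nash equilibrium. The paper's actual algorithms add UCB-type optimism and simultaneous policy optimization for Markov games with unknown transitions, none of which appears in this toy.

```python
import numpy as np

# Classical fictitious play on rock-paper-scissors (zero-sum): empirical
# play frequencies converge to the uniform Nash equilibrium.

A = np.array([[ 0., -1.,  1.],   # row player's payoffs
              [ 1.,  0., -1.],
              [-1.,  1.,  0.]])
row_counts = np.ones(3)          # play counts, initialized uniformly
col_counts = np.ones(3)
for _ in range(20000):
    x_bar = row_counts / row_counts.sum()   # row's empirical mixture
    y_bar = col_counts / col_counts.sum()   # column's empirical mixture
    row_counts[np.argmax(A @ y_bar)] += 1   # best response to average play
    col_counts[np.argmin(x_bar @ A)] += 1
print("row:", (row_counts / row_counts.sum()).round(3),
      "col:", (col_counts / col_counts.sum()).round(3))  # both near 1/3 each
```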
arXiv Detail & Related papers (2022-07-25T18:29:16Z) - Self-Play PSRO: Toward Optimal Populations in Two-Player Zero-Sum Games [69.5064797859053]
We introduce Self-Play PSRO (SP-PSRO), a method that adds an approximately optimal policy to the population in each iteration.
SP-PSRO empirically tends to converge much faster than APSRO and, in many games, converges in just a few iterations.
arXiv Detail & Related papers (2022-07-13T22:55:51Z) - Anytime Optimal PSRO for Two-Player Zero-Sum Games [17.821479538423155]
Policy Space Response Oracles (PSRO) is a reinforcement learning algorithm for games that can handle continuous actions.
The paper's AODO algorithm is a double oracle method for 2-player zero-sum games that converges to a Nash equilibrium.
We show that our methods achieve far lower exploitability than DO and PSRO and never increase exploitability.
arXiv Detail & Related papers (2022-01-19T16:34:11Z) - DORO: Distributional and Outlier Robust Optimization [98.44757325531631]
We propose the framework of DORO, for Distributional and Outlier Robust Optimization.
At the core of this approach is a refined risk function which prevents DRO from overfitting to potential outliers.
We theoretically prove the effectiveness of the proposed method, and empirically show that DORO improves the performance and stability of DRO with experiments on large modern datasets.
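One concrete reading of the "refined risk function" is the CVaR instance sketched below: treat the largest eps-fraction of per-sample losses as potential outliers, discard them, and apply the DRO risk (here CVaR, the mean of the worst remaining alpha-fraction) to the rest. The parameter names and toy data are assumptions for illustration, not the paper's notation or experiments.

```python
import numpy as np

# Ordered-statistics sketch of an outlier-robust CVaR risk: drop the
# eps-fraction largest losses as suspected outliers, then take the CVaR
# (mean of the worst alpha-fraction) of what remains.

def cvar_doro_risk(losses, alpha=0.2, eps=0.05):
    losses = np.sort(np.asarray(losses, dtype=float))[::-1]  # descending
    n = len(losses)
    kept = losses[int(np.ceil(eps * n)):]    # discard suspected outliers
    k = max(1, int(np.ceil(alpha * len(kept))))
    return kept[:k].mean()                   # CVaR over the kept losses

losses = np.concatenate([np.random.default_rng(2).exponential(1.0, 1000),
                         [50.0, 80.0]])      # two gross outliers
print("plain CVaR:", np.sort(losses)[::-1][:int(0.2 * len(losses))].mean())
print("CVaR-DORO :", cvar_doro_risk(losses))  # outliers no longer dominate
```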
arXiv Detail & Related papers (2021-06-11T02:59:54Z) - POMO: Policy Optimization with Multiple Optima for Reinforcement
Learning [8.819672165548477]
We introduce Policy Optimization with Multiple Optima (POMO), an end-to-end approach for building a combinatorial optimization (CO) solver.
POMO is applicable to a wide range of CO problems. It is designed to exploit the symmetries in the representation of a CO solution.
We demonstrate the effectiveness of POMO by solving three popular NP-hard problems, namely traveling salesman (TSP), capacitated vehicle routing (CVRP), and 0-1 knapsack (KP).
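The symmetry trick can be summarized in a few lines: roll out one trajectory from each of N symmetric starting points (e.g., every city as the first stop of a TSP tour) and use the mean return across those rollouts as a shared REINFORCE baseline, so no learned critic is needed. The sketch below shows only this advantage computation, under assumed names; network and rollout code are elided.

```python
import numpy as np

# POMO-style shared baseline: one rollout per symmetric starting point,
# baselined by the mean return over all of them (no critic required).

def pomo_advantages(returns):
    """returns: shape (N,), one return per symmetric rollout."""
    returns = np.asarray(returns, dtype=float)
    return returns - returns.mean()   # REINFORCE weight for each rollout

R = np.array([-10.2, -9.8, -11.5, -9.1])  # e.g., negative TSP tour lengths
adv = pomo_advantages(R)
# Training step would use: loss = -(adv * trajectory_log_probs).mean()
print(adv)   # rollouts better than the group mean get positive weight
```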
arXiv Detail & Related papers (2020-10-30T00:57:50Z) - An Online Method for A Class of Distributionally Robust Optimization
with Non-Convex Objectives [54.29001037565384]
We propose a practical online method for solving a class of online distributionally robust optimization (DRO) problems.
Our studies demonstrate important applications in machine learning for improving the robustness of networks.
arXiv Detail & Related papers (2020-06-17T20:19:25Z) - Pipeline PSRO: A Scalable Approach for Finding Approximate Nash
Equilibria in Large Games [11.866835246140647]
Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm guaranteed to converge to an approximate Nash equilibrium.
We introduce Pipeline PSRO, the first scalable general method for finding approximate Nash equilibria in large games.
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with an approximate game tree complexity of $10^{50}$.
arXiv Detail & Related papers (2020-06-15T17:17:17Z)
This list is automatically generated from the titles and abstracts of the papers on this site.