Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles
- URL: http://arxiv.org/abs/2601.05279v1
- Date: Tue, 30 Dec 2025 14:02:32 GMT
- Title: Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles
- Authors: Yingzhuo Liu, Shuodi Liu, Weijun Luo, Liuyu Xiang, Zhaofeng He
- Abstract summary: Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. We propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO.
- Score: 12.95757021157425
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. However, the computational cost of PSRO has become a significant limitation to its practical application. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. To address this issue, we formalize the concept of Simulation-Free PSRO and summarize existing methods that instantiate this concept. Additionally, we propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO. The number of strategies in the strategy window is limited, thereby simplifying opponent strategy selection and improving the robustness of the best response. Moreover, we use Nash Clustering to select the strategy to be eliminated, ensuring that the number of strategies within the strategy window is effectively limited. Our experiments across various environments demonstrate that the Dynamic Window mechanism significantly reduces exploitability compared to existing methods, while also exhibiting excellent compatibility. Our code is available at https://github.com/enochliu98/SF-PSRO.
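The abstract's core idea, a bounded strategy window with an eviction step, can be illustrated with a minimal sketch. All names here are hypothetical: strategies are reduced to integers, `best_response` stands in for the RL oracle, and the eviction rule is a simple stand-in for the paper's Nash Clustering; this is not the authors' implementation.

```python
def best_response(window):
    """Hypothetical oracle: in real PSRO this trains an RL policy against
    the meta-strategy over the current window. Here strategies are just
    integers and the 'best response' is the next index."""
    return max(window) + 1

def psro_with_window(n_iters=5, max_window=3):
    window = [0]  # initial strategy window (replaces PSRO's growing strategy set)
    for _ in range(n_iters):
        window.append(best_response(window))
        if len(window) > max_window:
            # Stand-in for Nash Clustering: evict the oldest strategy so
            # the window size stays bounded.
            window.pop(0)
    return window

print(psro_with_window())
```

The point of the bound is that meta-game solving and opponent selection cost depends on the window size, not on the total number of iterations.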
Related papers
- Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning [6.299504742623642]
We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty. We propose a novel meta-strategy solver, tailored for the offline setting, to guide strategy exploration in PSRO.
arXiv Detail & Related papers (2026-02-27T23:24:02Z) - Strategy Executability in Mathematical Reasoning: Leveraging Human-Model Differences for Effective Guidance [86.46794021499511]
We show a previously underexplored gap between strategy usage and strategy executability. We propose Selective Strategy Retrieval (SSR), a test-time framework that explicitly models executability. SSR yields reliable and consistent improvements over direct solving, in-context learning, and single-source guidance.
arXiv Detail & Related papers (2026-02-26T03:34:23Z) - Policy-Conditioned Policies for Multi-Agent Task Solving [53.67744322553693]
In this work, we propose a paradigm shift that bridges the gap by representing policies as human-interpretable source code. We reformulate the learning problem by utilizing Large Language Models (LLMs) as approximate interpreters. We formalize this process as Programmatic Iterated Best Response (PIBR), an algorithm where the policy code is optimized by textual gradients.
arXiv Detail & Related papers (2025-12-24T07:42:10Z) - Stochastic activations [53.40901433014535]
This strategy randomly selects between several non-linear functions in the feed-forward layer of a large language model. We leverage this strategy in two ways: (1) We use activations during pre-training and fine-tune the model with RELU, which is used at inference time to provide sparse latent vectors. This strategy performs reasonably well: it is only slightly inferior to the best deterministic non-linearity, namely SILU combined with temperature scaling.
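The mechanism described above, randomly choosing among non-linearities per call, is simple to sketch. The following is an illustrative toy, not the paper's method: the function names, the SiLU/ReLU pair, and the mixing probability `p_silu` are all assumptions for demonstration.

```python
import math
import random

def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def relu(x):
    return max(0.0, x)

def stochastic_activation(x, p_silu=0.5, rng=random):
    """Randomly pick one of two non-linearities on each call,
    as a stochastic-activation layer might during pre-training."""
    return silu(x) if rng.random() < p_silu else relu(x)

# Demo: the output is always that of one of the candidate activations.
random.seed(0)
y = stochastic_activation(1.0)
```

At inference time one would presumably fix a single deterministic activation (e.g. ReLU, per the summary) rather than keep sampling.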
arXiv Detail & Related papers (2025-09-26T13:53:56Z) - Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO [10.137357924571262]
Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information.
arXiv Detail & Related papers (2025-02-05T05:48:16Z) - Policy Space Response Oracles: A Survey [16.421805293725818]
This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO).
PSRO holds promise to improve scalability by focusing attention on sufficient subsets of strategies.
We focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost.
arXiv Detail & Related papers (2024-03-04T17:15:09Z) - Scalable Learning of Intrusion Responses through Recursive Decomposition [0.0]
We study automated intrusion response for an IT infrastructure and the interaction between an attacker and a defender as a partially observed game.
To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium.
We introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns equilibria through approximation.
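The self-play-toward-equilibrium idea behind DFSP can be illustrated by classical fictitious play in its simplest matrix-game form. This sketch is an assumption-laden simplification, not the DFSP algorithm itself: it uses a known payoff matrix and pure best responses instead of reinforcement learning and decomposition.

```python
def fictitious_play(payoff, iters=1000):
    """Fictitious play for a 2-player zero-sum matrix game, given the
    row player's payoff matrix. Each player best-responds to the
    opponent's empirical mixture of past actions."""
    n, m = len(payoff), len(payoff[0])
    row_counts = [0] * n
    col_counts = [0] * m
    row, col = 0, 0
    for _ in range(iters):
        row_counts[row] += 1
        col_counts[col] += 1
        # Row maximizes, column minimizes, the expected row payoff.
        row = max(range(n), key=lambda i: sum(payoff[i][j] * col_counts[j] for j in range(m)))
        col = min(range(m), key=lambda j: sum(payoff[i][j] * row_counts[i] for i in range(n)))
    total = sum(row_counts)
    return ([c / total for c in row_counts],
            [c / total for c in col_counts])

# Matching pennies: the unique equilibrium mixes 50/50 for both players.
mp = [[1, -1], [-1, 1]]
row_mix, col_mix = fictitious_play(mp)
```

The empirical action frequencies converge to the equilibrium mixture; DFSP applies the same co-evolution idea with learned (rather than exact) best responses.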
arXiv Detail & Related papers (2023-09-06T18:12:07Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Efficient Policy Space Response Oracles [61.71849698253696]
The Policy Space Response Oracle method (PSRO) provides a general solution for computing Nash equilibria in two-player zero-sum games.
Central to our development is the newly introduced minimax optimization on unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, 10x data efficiency, and similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.
arXiv Detail & Related papers (2022-01-28T17:54:45Z) - Non-stationary Online Learning with Memory and Non-stochastic Control [71.14503310914799]
We study the problem of Online Convex Optimization (OCO) with memory, which allows loss functions to depend on past decisions.
In this paper, we introduce dynamic policy regret as the performance measure to design algorithms robust to non-stationary environments.
We propose a novel algorithm for OCO with memory that provably enjoys an optimal dynamic policy regret in terms of time horizon, non-stationarity measure, and memory length.
arXiv Detail & Related papers (2021-02-07T09:45:15Z) - Mixed Strategies for Robust Optimization of Unknown Objectives [93.8672371143881]
We consider robust optimization problems, where the goal is to optimize an unknown objective function against the worst-case realization of an uncertain parameter.
We design a novel sample-efficient algorithm GP-MRO, which sequentially learns about the unknown objective from noisy point evaluations.
GP-MRO seeks to discover a robust and randomized mixed strategy, that maximizes the worst-case expected objective value.
arXiv Detail & Related papers (2020-02-28T09:28:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.