Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO
- URL: http://arxiv.org/abs/2502.02901v2
- Date: Sat, 15 Feb 2025 06:14:03 GMT
- Title: Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO
- Authors: Christine Konicki, Mithun Chakraborty, Michael P. Wellman
- Abstract summary: Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods.
Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form.
We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information.
- Score: 10.137357924571262
- Abstract: Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods. Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form using data obtained from querying a simulator that represents a detailed description of the game. We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information. First, we introduce a scalable representation for the empirical game tree where edges correspond to implicit policies learned through DRL. These policies cover conditions in the underlying game abstracted in the game model, supporting sustainable growth of the tree over epochs. Second, we leverage extensive form in the empirical model by employing refined Nash equilibria to direct strategy exploration. To enable this, we give a modular and scalable algorithm based on generalized backward induction for computing a subgame perfect equilibrium (SPE) in an imperfect-information game. We experimentally evaluate our approach on a suite of games including an alternating-offer bargaining game with outside offers; our results demonstrate that TE-PSRO converges toward equilibrium faster when new strategies are generated based on SPE rather than Nash equilibrium, and with reasonable time/memory requirements for the growing empirical model.
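As a rough illustration of the SPE computation the abstract refers to, the sketch below runs classical backward induction on a toy perfect-information, bargaining-flavored tree. The tree layout, payoffs, and function names are hypothetical, and the paper's algorithm generalizes this idea to imperfect-information empirical games; this is only the textbook base case.

```python
# Minimal sketch (not the paper's algorithm): backward induction computing a
# subgame perfect equilibrium on a toy perfect-information game tree.

def backward_induction(node):
    """Return (payoff_vector, strategy_profile) for the subtree rooted at node."""
    if node["type"] == "terminal":
        return node["payoffs"], {}
    player = node["player"]
    best_action, best_payoffs, strategy = None, None, {}
    for action, child in node["children"].items():
        payoffs, sub_strategy = backward_induction(child)
        strategy.update(sub_strategy)
        if best_payoffs is None or payoffs[player] > best_payoffs[player]:
            best_action, best_payoffs = action, payoffs
    strategy[node["name"]] = best_action
    return best_payoffs, strategy

# Toy bargaining-flavored tree: player 0 makes an offer, player 1 responds.
tree = {
    "type": "decision", "name": "root", "player": 0,
    "children": {
        "low_offer": {
            "type": "decision", "name": "respond_low", "player": 1,
            "children": {
                "accept": {"type": "terminal", "payoffs": (8, 2)},
                "reject": {"type": "terminal", "payoffs": (0, 0)},
            },
        },
        "fair_offer": {
            "type": "decision", "name": "respond_fair", "player": 1,
            "children": {
                "accept": {"type": "terminal", "payoffs": (5, 5)},
                "reject": {"type": "terminal", "payoffs": (0, 0)},
            },
        },
    },
}

payoffs, spe = backward_induction(tree)
print(payoffs, spe)  # SPE: responder accepts either offer, so the proposer makes the low offer.
```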
Related papers
- Approximating Human Strategic Reasoning with LLM-Enhanced Recursive Reasoners Leveraging Multi-agent Hypergames [3.5083201638203154]
We implement a role-based multi-agent strategic interaction framework tailored to sophisticated reasoners.
We use one-shot, 2-player beauty contests to evaluate the reasoning capabilities of the latest LLMs.
Our experiments show that artificial reasoners can outperform the baseline model in terms of both approximating human behaviour and reaching the optimal solution.
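For context on the evaluation task, here is a minimal sketch of level-k (iterated best-response) reasoning in a one-shot, 2-player "2/3 of the average" beauty contest; the multiplier and the level-0 guess are illustrative assumptions, not values from the paper.

```python
# Minimal sketch: level-k reasoning in a 2-player p-beauty contest.
# Guesses shrink geometrically toward the Nash equilibrium guess of 0.

P = 2.0 / 3.0          # target multiplier (assumed)
LEVEL0_GUESS = 50.0    # assumed naive (level-0) guess

def best_response(opponent_guess: float) -> float:
    # Solve x = P * (x + opponent_guess) / 2 for x.
    return P * opponent_guess / (2.0 - P)

guess = LEVEL0_GUESS
for level in range(1, 7):
    guess = best_response(guess)
    print(f"level-{level} guess: {guess:.3f}")
# With P = 2/3, each level halves the guess: 50 -> 25 -> 12.5 -> ...
```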
arXiv Detail & Related papers (2025-02-11T10:37:20Z) - Model-Based RL for Mean-Field Games is not Statistically Harder than Single-Agent RL [57.745700271150454]
We study the sample complexity of reinforcement learning in Mean-Field Games (MFGs) with model-based function approximation.
We introduce the Partial Model-Based Eluder Dimension (P-MBED), a more effective notion to characterize the model class complexity.
arXiv Detail & Related papers (2024-02-08T14:54:47Z) - Scalable Learning of Intrusion Responses through Recursive Decomposition [0.0]
We study automated intrusion response for an IT infrastructure, modeling the interaction between an attacker and a defender as a partially observed game.
To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium.
We introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns equilibria through approximation.
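To illustrate the self-play dynamic that DFSP builds on, the sketch below runs plain fictitious play on a hypothetical zero-sum attacker-vs-defender matrix game; DFSP itself decomposes the game and uses reinforcement-learning best responses, which this sketch omits.

```python
# Minimal sketch of fictitious play on a toy zero-sum matrix game.
import numpy as np

# Hypothetical payoff matrix (rows: attacker actions, cols: defender actions);
# entries are attacker payoffs, the defender receives the negation.
A = np.array([[ 0.0,  1.0, -1.0],
              [-1.0,  0.0,  1.0],
              [ 1.0, -1.0,  0.0]])  # rock-paper-scissors-like structure

atk_counts = np.ones(A.shape[0])
def_counts = np.ones(A.shape[1])

for _ in range(10_000):
    atk_avg = atk_counts / atk_counts.sum()
    def_avg = def_counts / def_counts.sum()
    # Each side best-responds to the opponent's empirical average strategy.
    atk_counts[np.argmax(A @ def_avg)] += 1
    def_counts[np.argmax(-(atk_avg @ A))] += 1

print("attacker average strategy:", atk_counts / atk_counts.sum())
print("defender average strategy:", def_counts / def_counts.sum())
# Both average strategies approach the uniform mixed equilibrium of this game.
```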
arXiv Detail & Related papers (2023-09-06T18:12:07Z) - SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, that reads the game's original academic paper and uses the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts in the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z) - Co-Learning Empirical Games and World Models [23.800790782022222]
Empirical games drive world models toward a broader consideration of possible game dynamics.
World models guide empirical games to efficiently discover new strategies through planning.
A new algorithm, Dyna-PSRO, co-learns an empirical game and a world model.
arXiv Detail & Related papers (2023-05-23T16:37:21Z) - Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
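A rough sketch of the general idea, under stated assumptions: a player's mixed strategy is a small randomized policy network mapping noise to an action, improved by gradient-free random search against a fixed opponent using only sampled payoffs. The payoff function, network sizes, and search schedule below are illustrative, not taken from the paper.

```python
# Minimal sketch: gradient-free improvement of a randomized policy network.
import numpy as np

rng = np.random.default_rng(0)

def policy(params, noise):
    """Tiny 1-hidden-layer network: noise -> action (a pushforward distribution)."""
    w1, b1, w2, b2 = params
    h = np.tanh(noise @ w1 + b1)
    return h @ w2 + b2

def init_params():
    return [rng.normal(scale=0.5, size=(1, 8)), np.zeros(8),
            rng.normal(scale=0.5, size=(8, 1)), np.zeros(1)]

def expected_payoff(p1_params, p2_params, n=512):
    """Monte Carlo estimate of player 1's payoff; no gradients are used."""
    a1 = policy(p1_params, rng.normal(size=(n, 1)))
    a2 = policy(p2_params, rng.normal(size=(n, 1)))
    # Hypothetical smooth payoff: player 1 wants to match half the opponent's action.
    return float(np.mean(-(a1 - 0.5 * a2) ** 2))

p1, p2 = init_params(), init_params()
best = expected_payoff(p1, p2)
for step in range(200):
    # Zeroth-order best-response step: perturb the weights, keep any improvement.
    candidate = [w + rng.normal(scale=0.05, size=w.shape) for w in p1]
    score = expected_payoff(candidate, p2)
    if score > best:
        p1, best = candidate, score
print("approximate best-response payoff for player 1:", best)
```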
arXiv Detail & Related papers (2022-11-29T05:16:41Z) - Efficient Policy Space Response Oracles [61.71849698253696]
The Policy Space Response Oracles (PSRO) method provides a general solution for computing Nash equilibria in two-player zero-sum games.
Central to our development is the newly introduced minimax optimization over unrestricted-restricted (URR) games.
We report a 50x speedup in wall time, 10x better data efficiency, and exploitability similar to existing PSRO methods on Kuhn and Leduc Poker.
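For reference, here is a minimal double-oracle loop in the spirit of PSRO on a toy zero-sum matrix game: best responses are exact argmaxes over the full action set rather than DRL-trained policies, the meta-game is solved approximately by fictitious play, and the URR minimax construction from the paper is not shown.

```python
# Minimal sketch of a PSRO-style double oracle loop on a random zero-sum matrix game.
import numpy as np

rng = np.random.default_rng(1)
n = 20
M = rng.normal(size=(n, n))
M = M - M.T                               # antisymmetric => zero-sum payoff matrix for the row player

def meta_nash(sub, iters=2000):
    """Approximate Nash of the restricted zero-sum game via fictitious play."""
    rc, cc = np.ones(sub.shape[0]), np.ones(sub.shape[1])
    for _ in range(iters):
        rc[np.argmax(sub @ (cc / cc.sum()))] += 1
        cc[np.argmax(-((rc / rc.sum()) @ sub))] += 1
    return rc / rc.sum(), cc / cc.sum()

row_pool, col_pool = [0], [0]             # each player starts with a single strategy
for epoch in range(15):
    sub = M[np.ix_(row_pool, col_pool)]
    row_meta, col_meta = meta_nash(sub)
    # Best-response oracle: best pure action against the opponent's meta-strategy.
    br_row = int(np.argmax(M[:, col_pool] @ col_meta))
    br_col = int(np.argmax(-(row_meta @ M[row_pool, :])))
    grew = False
    if br_row not in row_pool: row_pool.append(br_row); grew = True
    if br_col not in col_pool: col_pool.append(br_col); grew = True
    if not grew:
        break   # no beneficial deviation: the restricted equilibrium extends to the full game
print("restricted strategy pools:", sorted(row_pool), sorted(col_pool))
```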
arXiv Detail & Related papers (2022-01-28T17:54:45Z) - Dyna-T: Dyna-Q and Upper Confidence Bounds Applied to Trees [0.9137554315375919]
We present a preliminary investigation of a novel algorithm called Dyna-T.
In reinforcement learning (RL), a planning agent maintains its own model of the environment.
Experience can be used to learn a better model or to directly improve the value function and policy.
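As a point of comparison, the sketch below is plain Dyna-Q on a hypothetical chain MDP, showing both uses of experience just mentioned: direct value updates and planning updates replayed from a learned model. Dyna-T's UCT-style tree search is not shown.

```python
# Minimal Dyna-Q sketch on a tiny deterministic chain MDP (illustrative only).
import random

N_STATES, ACTIONS = 6, [0, 1]                      # actions: 0 = left, 1 = right
GAMMA, ALPHA, EPS, PLAN_STEPS = 0.95, 0.5, 0.1, 10

def step(s, a):
    """Hypothetical environment: reward 1 only for reaching the rightmost state."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, 1.0 if s2 == N_STATES - 1 else 0.0

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
model = {}                                         # learned model: (s, a) -> (s', r)

s = 0
for t in range(2000):
    a = random.choice(ACTIONS) if random.random() < EPS else max(ACTIONS, key=lambda x: Q[(s, x)])
    s2, r = step(s, a)
    # (1) Direct RL: improve the value function from real experience.
    Q[(s, a)] += ALPHA * (r + GAMMA * max(Q[(s2, b)] for b in ACTIONS) - Q[(s, a)])
    # (2) Model learning: remember the observed transition.
    model[(s, a)] = (s2, r)
    # (3) Planning: replay simulated transitions drawn from the learned model.
    for _ in range(PLAN_STEPS):
        (ps, pa), (ps2, pr) = random.choice(list(model.items()))
        Q[(ps, pa)] += ALPHA * (pr + GAMMA * max(Q[(ps2, b)] for b in ACTIONS) - Q[(ps, pa)])
    s = 0 if s2 == N_STATES - 1 else s2            # episodic reset at the goal

print({s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES)})
# Non-terminal states learn to prefer action 1 (move right) toward the reward.
```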
arXiv Detail & Related papers (2022-01-12T15:06:30Z) - Collective eXplainable AI: Explaining Cooperative Strategies and Agent Contribution in Multiagent Reinforcement Learning with Shapley Values [68.8204255655161]
This study proposes a novel approach to explain cooperative strategies in multiagent RL using Shapley values.
Results could have implications for non-discriminatory decision making, ethical and responsible AI-derived decisions or policy making under fairness constraints.
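A minimal sketch of the underlying computation: exact Shapley values for three agents' contributions to a hypothetical team reward, enumerating all coalitions. In the multiagent RL setting the coalition values would be estimated by re-simulating episodes with only the agents in each coalition active; the characteristic function below is made up for illustration.

```python
# Minimal sketch: exact Shapley values over an enumerated characteristic function.
from itertools import combinations
from math import factorial

agents = ["a1", "a2", "a3"]

# Hypothetical characteristic function: team reward achieved by each coalition.
v = {frozenset(): 0.0,
     frozenset({"a1"}): 1.0, frozenset({"a2"}): 2.0, frozenset({"a3"}): 0.0,
     frozenset({"a1", "a2"}): 4.0, frozenset({"a1", "a3"}): 1.5,
     frozenset({"a2", "a3"}): 2.5, frozenset({"a1", "a2", "a3"}): 5.0}

def shapley(agent):
    n, total = len(agents), 0.0
    others = [a for a in agents if a != agent]
    for k in range(len(others) + 1):
        for coalition in combinations(others, k):
            S = frozenset(coalition)
            weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
            total += weight * (v[S | {agent}] - v[S])
    return total

values = {a: shapley(a) for a in agents}
print(values)                    # per-agent credit for the team reward
print(sum(values.values()))      # efficiency: shares sum to v(grand coalition) = 5.0
```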
arXiv Detail & Related papers (2021-10-04T10:28:57Z) - Deep Policy Networks for NPC Behaviors that Adapt to Changing Design Parameters in Roguelike Games [137.86426963572214]
Turn-based strategy games such as Roguelikes present unique challenges to deep reinforcement learning (DRL).
We propose two network architectures to better handle complex categorical state spaces and to mitigate the need for retraining forced by design decisions.
arXiv Detail & Related papers (2020-12-07T08:47:25Z) - Pipeline PSRO: A Scalable Approach for Finding Approximate Nash Equilibria in Large Games [11.866835246140647]
Policy Space Response Oracles (PSRO) is a deep reinforcement learning algorithm guaranteed to converge to an approximate Nash equilibrium.
We introduce Pipeline PSRO, the first scalable general method for finding approximate Nash equilibria in large games.
We also introduce an open-source environment for Barrage Stratego, a variant of Stratego with a game tree complexity of approximately $10^{50}$.
arXiv Detail & Related papers (2020-06-15T17:17:17Z)