Policy Space Response Oracles: A Survey
- URL: http://arxiv.org/abs/2403.02227v2
- Date: Mon, 27 May 2024 16:49:18 GMT
- Title: Policy Space Response Oracles: A Survey
- Authors: Ariyan Bighashdel, Yongzhao Wang, Stephen McAleer, Rahul Savani, Frans A. Oliehoek
- Abstract summary: This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO).
PSRO holds promise to improve scalability by focusing attention on sufficient subsets of strategies.
We focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost.
- Score: 16.421805293725818
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey provides a comprehensive overview of a framework for large games, known as Policy Space Response Oracles (PSRO), which holds promise to improve scalability by focusing attention on sufficient subsets of strategies. We first motivate PSRO and provide historical context. We then focus on the strategy exploration problem for PSRO: the challenge of assembling effective subsets of strategies that still represent the original game well with minimum computational cost. We survey current research directions for enhancing the efficiency of PSRO, and explore the applications of PSRO across various domains. We conclude by discussing open questions and future research.
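The PSRO loop sketched in the abstract can be illustrated as a double-oracle procedure on a small zero-sum matrix game. The following is a minimal sketch, not code from the survey: the fixed payoff matrix, the fictitious-play meta-solver, and the exact best-response oracle are illustrative stand-ins for the learned policies, meta-strategy solvers, and RL-based oracles that PSRO uses in large games.

```python
# Minimal PSRO / double-oracle sketch on rock-paper-scissors.
# All names here are illustrative, not from the surveyed paper.
import numpy as np

# Row player's payoff; the column player receives the negation.
PAYOFF = np.array([[ 0, -1,  1],
                   [ 1,  0, -1],
                   [-1,  1,  0]], dtype=float)

def fictitious_play(M, iters=2000):
    """Approximate a Nash equilibrium of the restricted zero-sum game M
    via fictitious play (a simple stand-in for a meta-strategy solver)."""
    n, m = M.shape
    row_counts, col_counts = np.zeros(n), np.zeros(m)
    row_counts[0] = col_counts[0] = 1.0
    for _ in range(iters):
        # Each player best-responds to the opponent's empirical mixture.
        row_counts[np.argmax(M @ (col_counts / col_counts.sum()))] += 1
        col_counts[np.argmin((row_counts / row_counts.sum()) @ M)] += 1
    return row_counts / row_counts.sum(), col_counts / col_counts.sum()

def psro(payoff, iterations=5):
    row_set, col_set = [0], [0]                # restricted strategy sets
    for _ in range(iterations):
        M = payoff[np.ix_(row_set, col_set)]   # empirical meta-game
        sigma_r, sigma_c = fictitious_play(M)  # meta-strategy solve
        # Oracle step: exact best response to the opponent's meta-strategy.
        br_row = int(np.argmax(payoff[:, col_set] @ sigma_c))
        br_col = int(np.argmin(sigma_r @ payoff[row_set, :]))
        if br_row not in row_set:
            row_set.append(br_row)
        if br_col not in col_set:
            col_set.append(br_col)
    return row_set, col_set

rows, cols = psro(PAYOFF)
print(rows, cols)  # for RPS, all three pure strategies are discovered
```

The strategy exploration problem discussed in the survey concerns exactly the oracle step above: which new strategies to add so that the restricted sets stay small while the restricted game's equilibrium remains close to the full game's.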
Related papers
- Expanding LLM Agent Boundaries with Strategy-Guided Exploration [51.98616048282804]
Reinforcement learning (RL) has demonstrated notable success in post-training large language models (LLMs) as agents for tasks such as computer use, tool calling, and coding. We propose Strategy-Guided Exploration (SGE) to shift exploration from low-level actions to higher-level language strategies.
arXiv Detail & Related papers (2026-03-02T16:28:39Z)
- Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning [6.299504742623642]
We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty. We propose a novel meta-strategy solver, tailored for the offline setting, to guide strategy exploration in PSRO.
arXiv Detail & Related papers (2026-02-27T23:24:02Z)
- Simulation-Free PSRO: Removing Game Simulation from Policy Space Response Oracles [12.95757021157425]
Policy Space Response Oracles (PSRO) combines game-theoretic equilibrium computation with learning and is effective in approximating Nash Equilibrium in zero-sum games. Our analysis shows that game simulation is the primary bottleneck in PSRO's runtime. We propose a novel Dynamic Window-based Simulation-Free PSRO, which introduces the concept of a strategy window to replace the original strategy set maintained in PSRO.
arXiv Detail & Related papers (2025-12-30T14:02:32Z)
- Towards Agentic RAG with Deep Reasoning: A Survey of RAG-Reasoning Systems in LLMs [69.10441885629787]
Retrieval-Augmented Generation (RAG) lifts the factuality of Large Language Models (LLMs) by injecting external knowledge. It falls short on problems that demand multi-step inference; conversely, purely reasoning-oriented approaches often hallucinate or mis-ground facts. This survey synthesizes both strands under a unified reasoning-retrieval perspective.
arXiv Detail & Related papers (2025-07-13T03:29:41Z)
- WGSR-Bench: Wargame-based Game-theoretic Strategic Reasoning Benchmark for Large Language Models [28.28739884703072]
This paper introduces WGSR-Bench, the first strategic reasoning benchmark for Large Language Models (LLMs) that uses wargames as its evaluation environment. We design test samples around three core tasks, i.e., environmental situation awareness, opponent risk modeling, and policy generation, to systematically assess the main abilities of strategic reasoning.
arXiv Detail & Related papers (2025-06-12T01:16:34Z)
- Strategy-Augmented Planning for Large Language Models via Opponent Exploitation [11.840105106884543]
We introduce a two-stage Strategy-Augmented Planning (SAP) framework that significantly enhances the opponent exploitation capabilities of LLM-based agents. In the offline stage, we construct an explicit strategy space and subsequently collect strategy-outcome pair data for training the Strategy Evaluation Network (SEN). During the online phase, SAP dynamically recognizes the opponent's strategies and greedily exploits them by searching for the best response strategy on the well-trained SEN.
arXiv Detail & Related papers (2025-05-13T11:41:10Z)
- FAIRGAME: a Framework for AI Agents Bias Recognition using Game Theory [51.96049148869987]
We present FAIRGAME, a Framework for AI Agents Bias Recognition using Game Theory.
We describe its implementation and usage, and we employ it to uncover biased outcomes in popular games among AI agents.
Overall, FAIRGAME allows users to reliably and easily simulate their desired games and scenarios.
arXiv Detail & Related papers (2025-04-19T15:29:04Z)
- EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning [69.55982246413046]
We propose explicit policy optimization (EPO) for strategic reasoning.
EPO provides strategies in open-ended action space and can be plugged into arbitrary LLM agents to motivate goal-directed behavior.
Experiments across social and physical domains demonstrate EPO's ability to achieve long-term goal alignment.
arXiv Detail & Related papers (2025-02-18T03:15:55Z)
- Policy Abstraction and Nash Refinement in Tree-Exploiting PSRO [10.137357924571262]
Policy Space Response Oracles (PSRO) interleaves empirical game-theoretic analysis with deep reinforcement learning (DRL) to solve games too complex for traditional analytic methods.
Tree-exploiting PSRO (TE-PSRO) is a variant of this approach that iteratively builds a coarsened empirical game model in extensive form.
We make two main methodological advances to TE-PSRO that enhance its applicability to complex games of imperfect information.
arXiv Detail & Related papers (2025-02-05T05:48:16Z)
- AirRAG: Activating Intrinsic Reasoning for Retrieval Augmented Generation using Tree-based Search [4.4907551923591695]
We propose a novel thinking pattern in RAG that integrates system analysis with efficient reasoning actions.
Specifically, our approach designs five fundamental reasoning actions, which are expanded to a broad tree-based reasoning space.
Experimental results demonstrate the effectiveness of AirRAG, showing significant performance gains on complex question-answering datasets.
arXiv Detail & Related papers (2025-01-17T09:16:13Z)
- LLM as a Mastermind: A Survey of Strategic Reasoning with Large Language Models [75.89014602596673]
Strategic reasoning requires understanding and predicting adversary actions in multi-agent settings while adjusting strategies accordingly.
We explore the scopes, applications, methodologies, and evaluation metrics related to strategic reasoning with Large Language Models.
The survey underscores the importance of strategic reasoning as a critical cognitive capability and offers insights into future research directions and potential improvements.
arXiv Detail & Related papers (2024-04-01T16:50:54Z)
- K-Level Reasoning: Establishing Higher Order Beliefs in Large Language Models for Strategic Reasoning [76.3114831562989]
Strategic reasoning requires Large Language Model (LLM) agents to adapt their strategies dynamically in multi-agent environments.
We propose a novel framework: "K-Level Reasoning with Large Language Models (K-R)".
arXiv Detail & Related papers (2024-02-02T16:07:05Z)
- ALYMPICS: LLM Agents Meet Game Theory -- Exploring Strategic Decision-Making with AI Agents [77.34720446306419]
Alympics is a systematic simulation framework utilizing Large Language Model (LLM) agents for game theory research.
Alympics creates a versatile platform for studying complex game theory problems.
arXiv Detail & Related papers (2023-11-06T16:03:46Z)
- Co-Learning Empirical Games and World Models [23.800790782022222]
Empirical games drive world models toward a broader consideration of possible game dynamics.
World models guide empirical games to efficiently discover new strategies through planning.
A new algorithm, Dyna-PSRO, co-learns an empirical game and a world model.
arXiv Detail & Related papers (2023-05-23T16:37:21Z)
- Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments [55.41685740015095]
We study offline reinforcement learning under a novel model called strategic MDP.
We propose a novel algorithm, Pessimistic policy Learning with Algorithmic iNstruments (PLAN).
arXiv Detail & Related papers (2022-08-23T15:32:44Z)
- Efficient Policy Space Response Oracles [61.71849698253696]
The Policy Space Response Oracles (PSRO) method provides a general approach to computing Nash equilibria in two-player zero-sum games.
Central to our development is the newly introduced minimax optimization over unrestricted-restricted (URR) games.
We report a 50x speedup in wall-time, 10x data efficiency, and similar exploitability as existing PSRO methods on Kuhn and Leduc Poker games.
arXiv Detail & Related papers (2022-01-28T17:54:45Z)
- RESPER: Computationally Modelling Resisting Strategies in Persuasive Conversations [0.7505101297221454]
We propose a generalised framework for identifying resisting strategies in persuasive conversations.
Our experiments reveal the asymmetry of power roles in non-collaborative goal-directed conversations.
We also investigate the role of different resisting strategies on the conversation outcome.
arXiv Detail & Related papers (2021-01-26T03:44:17Z)
- Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies [78.68534915690404]
StrategyQA is a benchmark where the required reasoning steps are implicit in the question, and should be inferred using a strategy.
We propose a data collection procedure that combines term-based priming to inspire annotators, careful control over the annotator population, and adversarial filtering for eliminating reasoning shortcuts.
Overall, StrategyQA includes 2,780 examples, each consisting of a strategy question, its decomposition, and evidence paragraphs.
arXiv Detail & Related papers (2021-01-06T19:14:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.