Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning
- URL: http://arxiv.org/abs/2603.00374v1
- Date: Fri, 27 Feb 2026 23:24:02 GMT
- Title: Conservative Equilibrium Discovery in Offline Game-Theoretic Multiagent Reinforcement Learning
- Authors: Austin A. Nguyen, Michael P. Wellman
- Abstract summary: We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty. We propose a novel meta-strategy solver, tailored for the offline setting, to guide strategy exploration in PSRO.
- Score: 6.299504742623642
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Offline learning of strategies takes data efficiency to its extreme by restricting algorithms to a fixed dataset of state-action trajectories. We consider the problem in a mixed-motive multiagent setting, where the goal is to solve a game under the offline learning constraint. We first frame this problem in terms of selecting among candidate equilibria. Since datasets may inform only a small fraction of game dynamics, it is generally infeasible in offline game-solving to even verify that a proposed solution is a true equilibrium. Therefore, we consider the relative probability of low regret (i.e., closeness to equilibrium) across candidates based on the information available. Specifically, we extend Policy Space Response Oracles (PSRO), an online game-solving approach, by quantifying game dynamics uncertainty and modifying the RL objective to skew towards solutions more likely to have low regret in the true game. We further propose a novel meta-strategy solver, tailored for the offline setting, to guide strategy exploration in PSRO. Our incorporation of Conservatism principles from Offline reinforcement learning approaches for strategy Exploration gives our approach its name: COffeE-PSRO. Experiments demonstrate COffeE-PSRO's ability to extract lower-regret solutions than state-of-the-art offline approaches and reveal relationships between algorithmic components, empirical game fidelity, and overall performance.
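As a rough illustration of the loop the abstract describes, here is a minimal, hypothetical Python sketch: a PSRO-style population over a matrix game, with empirical payoffs penalized by a count-based uncertainty term before the meta-strategy solve. The noise model, the `beta` penalty, and the fictitious-play solver are illustrative assumptions, and the best-response "oracle" consults the true game rather than running the conservatism-modified offline RL step that COffeE-PSRO actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def conservative_payoffs(estimates, counts, beta=1.0):
    # Penalize each empirical payoff entry by an uncertainty term that
    # shrinks with sample count, so the meta-strategy solver distrusts
    # poorly covered regions of the empirical game.
    return estimates - beta / np.sqrt(np.maximum(counts, 1))

def fictitious_play(meta_game, iters=2000):
    # Stand-in meta-strategy solver for a symmetric two-player meta-game;
    # returns an approximate equilibrium mixture over the population.
    visit = np.ones(meta_game.shape[0])
    for _ in range(iters):
        sigma = visit / visit.sum()
        visit[np.argmax(meta_game @ sigma)] += 1
    return visit / visit.sum()

# Toy symmetric game over 6 pure strategies; offline data yields only
# noisy payoff estimates with uneven per-entry sample counts.
true_game = rng.normal(size=(6, 6))
true_game = (true_game + true_game.T) / 2
population = [0]
for epoch in range(4):
    m = len(population)
    counts = rng.integers(5, 50, size=(m, m))
    estimates = (true_game[np.ix_(population, population)]
                 + rng.normal(size=(m, m)) / np.sqrt(counts))
    sigma = fictitious_play(conservative_payoffs(estimates, counts))
    # Best-response "oracle": COffeE-PSRO would run offline RL with a
    # conservatism-modified objective here; we use the true game to keep
    # the sketch self-contained.
    br = int(np.argmax(true_game[:, population] @ sigma))
    if br not in population:
        population.append(br)
print("population:", population)
```

The point of the penalty is that strategy profiles backed by little data look worse, so the mixture, and hence strategy exploration, concentrates on well-supported regions of the game.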
Related papers
- Meta-Learning in Self-Play Regret Minimization [10.843705580746397]
We present a general approach to online optimization, which plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games. We build upon this, extending the framework to the more challenging self-play setting, which is the basis for most state-of-the-art equilibrium approximation algorithms. Our meta-learned algorithms considerably outperform other state-of-the-art regret minimization algorithms.
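As background for the regret-minimization setting this entry refers to, here is a textbook self-play sketch (not the paper's meta-learned method): regret matching on a zero-sum matrix game, whose average strategies approach a Nash equilibrium.

```python
import numpy as np

def normalize(regrets):
    # Regret matching plays each action with probability proportional
    # to its positive cumulative regret (uniform if all are non-positive).
    pos = np.maximum(regrets, 0.0)
    s = pos.sum()
    return pos / s if s > 0 else np.full(len(pos), 1.0 / len(pos))

def regret_matching_selfplay(A, iters=20000):
    # Self-play on a zero-sum matrix game (A holds the row player's
    # payoffs). The *average* strategies converge to an approximate
    # Nash equilibrium.
    n, m = A.shape
    r_row, r_col = np.zeros(n), np.zeros(m)
    avg_row, avg_col = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        x, y = normalize(r_row), normalize(r_col)
        v = x @ A @ y                 # current expected value
        r_row += A @ y - v            # row regrets vs. each pure action
        r_col += v - x @ A            # column player minimizes x @ A
        avg_row += x
        avg_col += y
    return avg_row / iters, avg_col / iters

# Rock-paper-scissors: the unique equilibrium is the uniform mixture.
A = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
x, y = regret_matching_selfplay(A)
print(np.round(x, 3), np.round(y, 3))  # both near [0.333, 0.333, 0.333]
```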
arXiv Detail & Related papers (2025-04-26T13:27:24Z)
- Hokoff: Real Game Dataset from Honor of Kings and its Offline Reinforcement Learning Benchmarks [59.50879251101105]
We propose Hokoff, a comprehensive set of pre-collected datasets that covers offline RL and offline MARL.
This data is derived from Honor of Kings, a recognized Multiplayer Online Battle Arena (MOBA) game.
We also introduce a novel baseline algorithm tailored for the inherent hierarchical action space of the game.
arXiv Detail & Related papers (2024-08-20T05:38:50Z)
- Bayesian Design Principles for Offline-to-Online Reinforcement Learning [50.97583504192167]
Offline-to-online fine-tuning is crucial for real-world applications where exploration can be costly or unsafe.
In this paper, we tackle the dilemma of offline-to-online fine-tuning: if the agent remains pessimistic, it may fail to learn a better policy, while if it becomes optimistic directly, performance may suffer from a sudden drop.
We show that Bayesian design principles are crucial in solving such a dilemma.
arXiv Detail & Related papers (2024-05-31T16:31:07Z)
- Paths to Equilibrium in Games [6.812247730094933]
We study sequences of strategies satisfying a pairwise constraint inspired by policy updating in reinforcement learning.
Our analysis reveals the counterintuitive insight that reward-deteriorating strategic updates are key to driving play to equilibrium along a satisficing path.
arXiv Detail & Related papers (2024-03-26T19:58:39Z)
- Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks [94.07688076435818]
We study reinforcement learning for learning a Quantal Stackelberg Equilibrium (QSE) in an episodic Markov game with a leader-follower structure.
Our algorithms are based on (i) learning the quantal response model via maximum likelihood estimation and (ii) model-free or model-based RL for solving the leader's decision-making problem.
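A hedged sketch of step (i) under invented specifics: the follower's quantal (logit) response has a single temperature parameter, recovered by maximum-likelihood gradient ascent from sampled actions; the utilities, true temperature, and learning rate here are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def logit_probs(eta, u):
    # Quantal (logit) response: action probabilities proportional to
    # exp(eta * utility), computed with a stabilized softmax.
    z = eta * u - np.max(eta * u)
    p = np.exp(z)
    return p / p.sum()

# Hypothetical follower: 4 actions with known utilities u, sampled at an
# unknown true temperature eta_true that we want to recover.
u = np.array([1.0, 0.2, -0.5, 0.8])
eta_true = 2.5
actions = rng.choice(4, size=5000, p=logit_probs(eta_true, u))

# MLE by gradient ascent: d(log-likelihood)/d(eta) is the observed mean
# utility minus the model's expected utility, and the problem is concave.
eta = 1.0
for _ in range(500):
    p = logit_probs(eta, u)
    eta += 0.5 * (np.mean(u[actions]) - p @ u)
print("estimated eta:", round(float(eta), 2))  # should land near 2.5
```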
arXiv Detail & Related papers (2023-07-26T10:24:17Z)
- Data-Scarce Identification of Game Dynamics via Sum-of-Squares Optimization [29.568222003322344]
We introduce the Side-Information Assisted Regression (SIAR) framework, designed to identify game dynamics in multiplayer normal-form games.
SIAR is solved using sum-of-squares (SOS) optimization, resulting in a hierarchy of approximations that provably converge to the true dynamics of the system.
We showcase that the SIAR framework accurately predicts player behavior across a spectrum of normal-form games, widely-known families of game dynamics, and strong benchmarks, even if the unknown system is chaotic.
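To show the flavor of the SOS machinery SIAR is solved with, here is a toy sum-of-squares feasibility program, using the cvxpy library (an assumption of this sketch) and its bundled SDP solver: it certifies that p(x) = x^4 + 2x^2 + 1 is a sum of squares by finding a PSD Gram matrix. It is a minimal instance of SOS programming, not the SIAR framework itself.

```python
import cvxpy as cp

# Certify p(x) = x^4 + 2x^2 + 1 is a sum of squares: find a PSD Gram
# matrix Q with p(x) = m(x)^T Q m(x) for the monomial basis m = [1, x, x^2].
Q = cp.Variable((3, 3), PSD=True)
constraints = [
    Q[0, 0] == 1,                # coefficient of x^0
    2 * Q[0, 1] == 0,            # coefficient of x^1
    2 * Q[0, 2] + Q[1, 1] == 2,  # coefficient of x^2
    2 * Q[1, 2] == 0,            # coefficient of x^3
    Q[2, 2] == 1,                # coefficient of x^4
]
prob = cp.Problem(cp.Minimize(0), constraints)  # pure feasibility problem
prob.solve()
print(prob.status)               # "optimal" -> an SOS certificate exists
```

Here one feasible certificate is Q = diag(1, 2, 1), corresponding to the decomposition p(x) = 1 + 2x^2 + x^4 = (x^2 + 1)^2 up to regrouping.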
arXiv Detail & Related papers (2023-07-13T09:14:48Z)
- Offline Learning in Markov Games with General Function Approximation [22.2472618685325]
We study offline multi-agent reinforcement learning (RL) in Markov games.
We provide the first framework for sample-efficient offline learning in Markov games.
arXiv Detail & Related papers (2023-02-06T05:22:27Z)
- Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of a continuous-action game without access to gradients.
We model players' strategies using artificial neural networks.
This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
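A minimal sketch of the representational idea under invented details: a tiny network pushes Gaussian noise through nonlinearities to produce continuous actions, so its pushforward distribution acts as an unrestricted mixed strategy, and it is trained with zeroth-order evolution strategies so that only payoff evaluations, never gradients of the game, are required. The architecture, the payoff function, and every hyperparameter are assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def policy(theta, z):
    # Randomized policy network: pushing Gaussian noise z through a small
    # net yields an unrestricted mixed strategy over continuous actions.
    W1, b1, W2 = theta[:8].reshape(4, 2), theta[8:12], theta[12:16]
    return np.tanh(z @ W1.T + b1) @ W2

def expected_payoff(theta, payoff, n=256):
    # Monte Carlo estimate of the mixed strategy's expected payoff.
    return payoff(policy(theta, rng.normal(size=(n, 2)))).mean()

# Zeroth-order (evolution-strategies) ascent on the network parameters:
# only payoff *evaluations* are used, never gradients of the game.
payoff = lambda a: -(a - 1.3) ** 2           # hypothetical smooth payoff
theta, sigma, pop = rng.normal(size=16) * 0.1, 0.1, 32
for _ in range(600):
    eps = rng.normal(size=(pop, 16))
    scores = np.array([expected_payoff(theta + sigma * e, payoff) for e in eps])
    theta += 0.02 / (pop * sigma) * (scores - scores.mean()) @ eps
print("mean action:", policy(theta, rng.normal(size=(2000, 2))).mean().round(2))
```

With these arbitrary settings the mean action should drift toward the payoff's optimum at 1.3, though the exact value after 600 steps is not guaranteed.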
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
- Offline Stochastic Shortest Path: Learning, Evaluation and Towards Optimality [57.91411772725183]
In this paper, we consider the offline stochastic shortest path (SSP) problem when the state space and the action space are finite.
We design simple value-based algorithms for tackling both offline policy evaluation (OPE) and offline policy learning tasks.
Our analysis of these simple algorithms yields strong instance-dependent bounds which can imply worst-case bounds that are near-minimax optimal.
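A minimal sketch of the value-based, pessimistic recipe in the tabular finite-horizon case; the count-based penalty c / sqrt(n(s, a)) is a common stand-in and not the paper's exact instance-dependent construction, and the toy MDP is invented.

```python
import numpy as np

def pessimistic_ope(dataset, policy, n_states, n_actions, horizon, c=1.0):
    # Count-based pessimism: subtract c / sqrt(n(s, a)) from the estimated
    # reward so rarely observed pairs are penalized, making the returned
    # value a (high-probability) lower bound on the true one.
    cnt = np.zeros((n_states, n_actions))
    rsum = np.zeros((n_states, n_actions))
    psum = np.zeros((n_states, n_actions, n_states))
    for s, a, r, s2 in dataset:
        cnt[s, a] += 1
        rsum[s, a] += r
        psum[s, a, s2] += 1
    n = np.maximum(cnt, 1)
    r_hat = rsum / n - c / np.sqrt(n)        # pessimistic reward estimate
    p_hat = psum / n[:, :, None]             # empirical transition model
    v = np.zeros(n_states)
    for _ in range(horizon):                 # finite-horizon backups
        q = r_hat + p_hat @ v
        v = q[np.arange(n_states), policy]
    return v

# Toy demo: 3 states, 2 actions, reward 1 only in state 2, random data.
rng = np.random.default_rng(3)
data = [(int(s), int(a), float(s == 2), int(rng.integers(3)))
        for s, a in zip(rng.integers(3, size=500), rng.integers(2, size=500))]
print(pessimistic_ope(data, policy=np.zeros(3, dtype=int),
                      n_states=3, n_actions=2, horizon=5))
```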
arXiv Detail & Related papers (2022-06-10T07:44:56Z)
- Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games [95.10091348976779]
We study decentralized policy learning in Markov games where we control a single agent to play with nonstationary and possibly adversarial opponents.
We propose a new algorithm, Decentralized Optimistic hypeRpolicy mIrror deScent (DORIS).
DORIS achieves $\sqrt{K}$-regret in the context of general function approximation, where $K$ is the number of episodes.
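The hyperpolicy idea can be glimpsed with plain exponential weights (Hedge) over a finite set of base policies, which also attains O(sqrt(K)) regret against the best fixed policy; the reward sequence below is invented, and this is not DORIS itself.

```python
import numpy as np

rng = np.random.default_rng(4)

def hedge_regret(rewards, lr):
    # Exponential weights over base policies: reweight by observed episode
    # returns; with lr ~ sqrt(log n / K) the regret vs. the best fixed
    # policy grows as O(sqrt(K)).
    K, n = rewards.shape
    logw = np.zeros(n)
    earned = 0.0
    for k in range(K):
        p = np.exp(logw - logw.max())        # current hyperpolicy mixture
        p /= p.sum()
        earned += p @ rewards[k]             # expected return of the mixture
        logw += lr * rewards[k]              # multiplicative-weights update
    return rewards.sum(axis=0).max() - earned

n, K = 10, 20000
rewards = rng.uniform(size=(K, n))           # stand-in (possibly adversarial) returns
reg = hedge_regret(rewards, lr=np.sqrt(np.log(n) / K))
print("regret:", round(float(reg), 1), "vs sqrt(K) =", round(float(np.sqrt(K)), 1))
```

On this random instance the printed regret should be of the same order as sqrt(K), consistent with the bound.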
arXiv Detail & Related papers (2022-06-03T14:18:05Z)