The Hidden Game Problem
- URL: http://arxiv.org/abs/2510.03845v1
- Date: Sat, 04 Oct 2025 15:46:04 GMT
- Title: The Hidden Game Problem
- Authors: Gon Buzaglo, Noah Golowich, Elad Hazan
- Abstract summary: We introduce the hidden game problem, where for each player, an unknown subset of strategies consistently yields higher rewards compared to the rest. We develop a composition of regret minimization techniques that achieve optimal external and swap regret bounds. Our approach ensures rapid convergence to correlated equilibria in hidden subgames, leveraging the hidden game structure for improved computational efficiency.
- Score: 24.447454826751155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper investigates a class of games with large strategy spaces, motivated by challenges in AI alignment and language games. We introduce the hidden game problem, where for each player, an unknown subset of strategies consistently yields higher rewards compared to the rest. The central question is whether efficient regret minimization algorithms can be designed to discover and exploit such hidden structures, leading to equilibrium in these subgames while maintaining rationality in general. We answer this question affirmatively by developing a composition of regret minimization techniques that achieve optimal external and swap regret bounds. Our approach ensures rapid convergence to correlated equilibria in hidden subgames, leveraging the hidden game structure for improved computational efficiency.
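The abstract does not spell out the composition of regret minimization techniques, but the classical route from external-regret learners to swap-regret (and hence correlated-equilibrium) guarantees is the Blum-Mansour reduction. The sketch below is background on that standard construction, with multiplicative weights (Hedge) as the base learner; the learning rate, loss range in [0, 1], and power-iteration step are illustrative assumptions, not details from the paper.

```python
# Background sketch (not the paper's algorithm): the Blum-Mansour reduction
# composes n external-regret minimizers -- here Hedge -- into one swap-regret
# minimizer. Losses are assumed to lie in [0, 1]; eta is an illustrative choice.
import numpy as np

class Hedge:
    """Multiplicative-weights external-regret minimizer over n actions."""
    def __init__(self, n, eta):
        self.w, self.eta = np.ones(n), eta

    def distribution(self):
        return self.w / self.w.sum()

    def update(self, loss):
        self.w *= np.exp(-self.eta * loss)

class BlumMansourSwap:
    """Swap-regret minimizer built from one Hedge instance per action."""
    def __init__(self, n, eta=0.1):
        self.n = n
        self.base = [Hedge(n, eta) for _ in range(n)]

    def play(self):
        # Row i of Q is the recommendation of the i-th base learner.
        Q = np.stack([b.distribution() for b in self.base])
        # Play the stationary distribution p = pQ; rows are strictly
        # positive under Hedge, so power iteration converges.
        p = np.full(self.n, 1.0 / self.n)
        for _ in range(1000):
            p = p @ Q
        self.p = p / p.sum()
        return self.p

    def update(self, loss):
        # Base learner i is charged the loss scaled by the probability p[i].
        for i, b in enumerate(self.base):
            b.update(self.p[i] * loss)

# Usage: one learner per player; feed each round's realized loss vector.
learner = BlumMansourSwap(n=5)
p = learner.play()
learner.update(np.random.rand(5))   # placeholder loss vector
```

If every player runs a swap-regret minimizer of this kind, the empirical distribution of joint play converges to the set of correlated equilibria; the hidden game question is whether the regret and computation can be made to scale with the small hidden subgame rather than the full strategy space.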
Related papers
- Meta-Learning in Self-Play Regret Minimization [10.843705580746397]
We present a general approach to online optimization which plays a crucial role in many algorithms for approximating Nash equilibria in two-player zero-sum games. We build upon this, extending the framework to the more challenging self-play setting, which is the basis for most state-of-the-art equilibrium approximation algorithms. Our meta-learned algorithms considerably outperform other state-of-the-art regret minimization algorithms.
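For orientation only, the block below is the plain self-play baseline that such work builds on, not the paper's meta-learned method: regret-matching self-play on a fixed zero-sum matrix game, whose time-averaged strategies converge to an approximate Nash equilibrium. The payoff matrix, horizon, and matching-pennies example are illustrative assumptions.

```python
# Illustrative baseline: regret-matching self-play on min_x max_y x^T A y.
import numpy as np

def regret_matching_selfplay(A, T=10_000):
    """Returns the time-averaged strategies of both players."""
    n, m = A.shape
    reg_x, reg_y = np.zeros(n), np.zeros(m)
    avg_x, avg_y = np.zeros(n), np.zeros(m)
    for _ in range(T):
        # Regret matching: play positive regrets, normalized; uniform if none.
        pos_x = np.maximum(reg_x, 0.0)
        x = pos_x / pos_x.sum() if pos_x.sum() > 0 else np.full(n, 1.0 / n)
        pos_y = np.maximum(reg_y, 0.0)
        y = pos_y / pos_y.sum() if pos_y.sum() > 0 else np.full(m, 1.0 / m)
        loss_x = A @ y            # per-action loss for the minimizing row player
        gain_y = A.T @ x          # per-action gain for the maximizing column player
        reg_x += x @ loss_x - loss_x
        reg_y += gain_y - y @ gain_y
        avg_x += x
        avg_y += y
    return avg_x / T, avg_y / T

# Matching pennies: both average strategies approach (0.5, 0.5).
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
x_bar, y_bar = regret_matching_selfplay(A)
```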
arXiv Detail & Related papers (2025-04-26T13:27:24Z)
- Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z)
- Playing Large Games with Oracles and AI Debate [27.355621483737913]
Existing algorithms for online game playing require per-iteration computation that scales with the number of actions, which can be prohibitive for large games.
We give a novel efficient algorithm for simultaneous external and internal regret minimization whose regret depends logarithmically on the number of actions.
arXiv Detail & Related papers (2023-12-08T02:06:55Z)
- On the Convergence of No-Regret Learning Dynamics in Time-Varying Games [89.96815099996132]
We characterize the convergence of optimistic gradient descent (OGD) in time-varying games.
Our framework yields sharp convergence bounds for the equilibrium gap of OGD in zero-sum games.
We also provide new insights on dynamic regret guarantees in static games.
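As a concrete reference point for the OGD dynamics discussed above, here is a minimal sketch of optimistic gradient descent/ascent on a fixed zero-sum matrix game; the step size, horizon, and simplex projection are illustrative choices, and the time-varying analysis of the paper is not reproduced.

```python
# Minimal sketch of optimistic gradient descent/ascent (OGDA) on min_x max_y x^T A y.
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0)

def ogda(A, eta=0.05, T=5000):
    n, m = A.shape
    x, y = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    gx_prev, gy_prev = np.zeros(n), np.zeros(m)
    for _ in range(T):
        gx, gy = A @ y, A.T @ x          # gradients of x^T A y
        # Optimistic step: use 2*g_t - g_{t-1} as the prediction-corrected gradient.
        x = project_simplex(x - eta * (2 * gx - gx_prev))
        y = project_simplex(y + eta * (2 * gy - gy_prev))
        gx_prev, gy_prev = gx, gy
    return x, y
```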
arXiv Detail & Related papers (2023-01-26T17:25:45Z)
- Regret Minimization and Convergence to Equilibria in General-sum Markov Games [61.6869963435955]
We present the first algorithm for learning in general-sum Markov games that provides sublinear regret guarantees when executed by all agents. Our algorithm is decentralized, computationally efficient, and does not require any communication between agents.
arXiv Detail & Related papers (2022-07-28T16:27:59Z)
- Impartial Games: A Challenge for Reinforcement Learning [0.0]
We showcase that AlphaZero-style reinforcement learning algorithms encounter significant and fundamental challenges when applied to impartial games. Our findings reveal that while AlphaZero-style agents can achieve champion-level play, their learning progression severely degrades as board size increases. These results align with broader concerns regarding AlphaZero-style algorithms' vulnerability to adversarial attacks.
arXiv Detail & Related papers (2022-05-25T14:02:02Z)
- No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure with a meta-algorithm learning over a group of black-box base-learners satisfying a certain property.
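To make the two-layer structure concrete, the following is an illustrative sketch only; the interfaces, step sizes, and the choice of Hedge for both layers are assumptions rather than the paper's construction. A meta-learner runs multiplicative weights over a group of black-box base-learners and plays their weighted mixture.

```python
# Illustrative two-layer design: a Hedge meta-learner over black-box base-learners.
import numpy as np

class HedgeBase:
    """A black-box base-learner: multiplicative weights with its own step size."""
    def __init__(self, n, eta):
        self.w, self.eta = np.ones(n), eta

    def play(self):
        return self.w / self.w.sum()

    def update(self, loss):
        self.w *= np.exp(-self.eta * loss)

class TwoLayerLearner:
    """Meta-layer: Hedge over the base-learners' recommendations."""
    def __init__(self, n, etas, meta_eta=0.1):
        self.base = [HedgeBase(n, eta) for eta in etas]
        self.meta_w = np.ones(len(self.base))
        self.meta_eta = meta_eta

    def play(self):
        self.advice = np.stack([b.play() for b in self.base])  # one row per base-learner
        p = self.meta_w / self.meta_w.sum()
        return p @ self.advice

    def update(self, loss):
        # Charge each base-learner the loss of its own recommendation,
        # update the meta-weights, then pass the loss down.
        self.meta_w *= np.exp(-self.meta_eta * (self.advice @ loss))
        for b in self.base:
            b.update(loss)

# Usage: play, observe a loss vector, update, repeat.
learner = TwoLayerLearner(n=4, etas=[0.01, 0.1, 1.0])
strategy = learner.play()
learner.update(np.random.rand(4))   # placeholder loss
```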
arXiv Detail & Related papers (2022-01-30T06:10:04Z)
- Generating Diverse and Competitive Play-Styles for Strategy Games [58.896302717975445]
We propose Portfolio Monte Carlo Tree Search with Progressive Unpruning for playing a turn-based strategy game (Tribes).
We show how it can be parameterized so that a quality-diversity algorithm (MAP-Elites) is used to achieve different play-styles while keeping a competitive level of play.
Our results show that this algorithm is capable of achieving these goals even for an extensive collection of game levels beyond those used for training.
arXiv Detail & Related papers (2021-04-17T20:33:24Z)
- Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)