Complexity and Algorithms for Exploiting Quantal Opponents in Large
Two-Player Games
- URL: http://arxiv.org/abs/2009.14521v2
- Date: Wed, 16 Dec 2020 12:11:43 GMT
- Title: Complexity and Algorithms for Exploiting Quantal Opponents in Large
Two-Player Games
- Authors: David Milec, Jakub Černý, Viliam Lisý, Bo An
- Abstract summary: Solution concepts of traditional game theory assume entirely rational players; therefore, their ability to exploit subrational opponents is limited.
This paper aims to analyze and propose scalable algorithms for computing effective and robust strategies against a quantal opponent in normal-form and extensive-form games.
- Score: 16.43565579998679
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Solution concepts of traditional game theory assume entirely rational
players; therefore, their ability to exploit subrational opponents is limited.
One type of subrationality that describes human behavior well is the quantal
response. While there exist algorithms for computing solutions against quantal
opponents, they either do not scale or may provide strategies that are even
worse than the entirely-rational Nash strategies. This paper aims to analyze
and propose scalable algorithms for computing effective and robust strategies
against a quantal opponent in normal-form and extensive-form games. Our
contributions are: (1) we define two different solution concepts related to
exploiting quantal opponents and analyze their properties; (2) we prove that
computing these solutions is computationally hard; (3) therefore, we evaluate
several heuristic approximations based on scalable counterfactual regret
minimization (CFR); and (4) we identify a CFR variant that exploits the bounded
opponents better than the previously used variants while being less exploitable
by the worst-case perfectly-rational opponent.
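The logit quantal response the abstract refers to assigns action probabilities proportional to the exponential of a rationality parameter times expected payoff. A minimal sketch of exploiting such an opponent in a zero-sum normal-form game is below; the payoff matrix, the rationality parameter `lam`, and the grid-search exploiter are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def quantal_response(utils, lam):
    """Logit quantal response: probabilities proportional to
    exp(lam * utility). lam -> infinity recovers a best response;
    lam = 0 gives uniform random play."""
    z = lam * np.asarray(utils, dtype=float)
    z -= z.max()                      # shift for numerical stability
    p = np.exp(z)
    return p / p.sum()

def value_vs_quantal(A, x, lam):
    """Row player's expected payoff in the zero-sum game A when
    committing to mixed strategy x against a quantal column player."""
    col_payoffs = -(x @ A)            # column player receives -A
    q = quantal_response(col_payoffs, lam)
    return float((x @ A) @ q)

# A small illustrative zero-sum game (not from the paper).
A = np.array([[2.0, -1.0],
              [-1.0, 1.0]])
lam = 1.0

# The row player's Nash strategy here is x = (0.4, 0.6), game value 0.2;
# against it the quantal opponent is indifferent, so Nash earns exactly 0.2.
nash_value = value_vs_quantal(A, np.array([0.4, 0.6]), lam)

# A simple grid search over row mixed strategies finds a commitment
# that exploits the quantal opponent's mistakes beyond the Nash value.
grid = [np.array([p, 1.0 - p]) for p in np.linspace(0.0, 1.0, 1001)]
best_value = max(value_vs_quantal(A, x, lam) for x in grid)
```

The gap between `best_value` and `nash_value` is the exploitation gain the paper studies; its hardness results concern computing such exploiting (and robust) strategies in games far too large for this kind of enumeration.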
Related papers
- Imperfect-Recall Games: Equilibrium Concepts and Their Complexity [74.01381499760288]
We investigate optimal decision making under imperfect recall, that is, when an agent forgets information it once held before.
In the framework of extensive-form games with imperfect recall, we analyze the computational complexities of finding equilibria in multiplayer settings.
arXiv Detail & Related papers (2024-06-23T00:27:28Z)
- Numerical solution of a PDE arising from prediction with expert advice [2.048226951354646]
This work investigates the online machine learning problem of prediction with expert advice in an adversarial setting.
The continuum limit of this game over a large number of steps is a degenerate elliptic equation whose solution encodes the optimal strategies for both players.
arXiv Detail & Related papers (2024-06-09T12:17:05Z)
- Variational Methods for Computing Non-Local Quantum Strategies [1.95414377613382]
In a nonlocal game, two noncommunicating players cooperate to convince a referee that they possess a strategy that does not violate the rules of the game.
We show that our algorithm is capable of generating a short-depth circuit that implements an optimal quantum strategy for a graph coloring game.
arXiv Detail & Related papers (2023-11-02T16:17:18Z)
- Robust and Space-Efficient Dual Adversary Quantum Query Algorithms [0.0]
The general adversary dual is a powerful tool in quantum computing.
It gives a query-optimal bounded-error quantum algorithm deciding any Boolean function.
We develop a robust dual adversary algorithm that can handle approximately satisfied constraints.
arXiv Detail & Related papers (2023-06-26T19:59:30Z)
- Hardness of Independent Learning and Sparse Equilibrium Computation in Markov Games [70.19141208203227]
We consider the problem of decentralized multi-agent reinforcement learning in Markov games.
We show that no algorithm attains no-regret in general-sum games when executed independently by all players.
We show that our lower bounds hold even in the seemingly easier setting in which all agents are controlled by a centralized algorithm.
arXiv Detail & Related papers (2023-03-22T03:28:12Z)
- Bayesian Opponent Modeling in Multiplayer Imperfect-Information Games [1.4502611532302037]
We present an approach for opponent modeling in multiplayer imperfect-information games.
We run experiments against a variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker.
Our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.
arXiv Detail & Related papers (2022-12-12T16:48:53Z)
- Model-Free Online Learning in Unknown Sequential Decision Making Problems and Games [114.90723492840499]
In large two-player zero-sum imperfect-information games, modern extensions of counterfactual regret minimization (CFR) are currently the practical state of the art for computing a Nash equilibrium.
We formalize an online learning setting in which the strategy space is not known to the agent.
We give an efficient algorithm that achieves $O(T^{3/4})$ regret with high probability in that setting, even when the agent faces an adversarial environment.
arXiv Detail & Related papers (2021-03-08T04:03:24Z)
- Provably Efficient Policy Gradient Methods for Two-Player Zero-Sum Markov Games [95.70078702838654]
This paper studies natural extensions of the Natural Policy Gradient algorithm for solving two-player zero-sum games.
We thoroughly characterize the algorithms' performance in terms of the number of samples, number of iterations, concentrability coefficients, and approximation error.
arXiv Detail & Related papers (2021-02-17T17:49:57Z)
- Hindsight and Sequential Rationality of Correlated Play [18.176128899338433]
We look at algorithms that ensure strong performance in hindsight relative to what could have been achieved with modified behavior.
We develop and advocate for this hindsight framing of learning in general sequential decision-making settings.
We present examples illustrating the distinct strengths and weaknesses of each type of equilibrium in the literature.
arXiv Detail & Related papers (2020-12-10T18:30:21Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.