Related papers: Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games

URL: http://arxiv.org/abs/2508.17671v3
Date: Wed, 08 Oct 2025 03:32:46 GMT
Title: Consistent Opponent Modeling of Static Opponents in Imperfect-Information Games
Authors: Sam Ganzfried
Abstract summary: We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex problem based on the sequence-form game representation.
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The goal of agents in multi-agent environments is to maximize total reward against the opposing agents that are encountered. Following a game-theoretic solution concept, such as Nash equilibrium, may obtain a strong performance in some settings; however, such approaches fail to capitalize on historical and observed data from repeated interactions against our opponents. Opponent modeling algorithms integrate machine learning techniques to exploit suboptimal opponents utilizing available data; however, the effectiveness of such approaches in imperfect-information games to date is quite limited. We show that existing opponent modeling approaches fail to satisfy a simple desirable property even against static opponents drawn from a known prior distribution; namely, they do not guarantee that the model approaches the opponent's true strategy even in the limit as the number of game iterations approaches infinity. We develop a new algorithm that is able to achieve this property and runs efficiently by solving a convex minimization problem based on the sequence-form game representation using projected gradient descent. The algorithm is guaranteed to efficiently converge to the opponent's true strategy given observations from gameplay and possibly additional historical data if it is available.
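Illustrative sketch: the abstract describes fitting an opponent model by convex minimization with projected gradient descent. The toy below is not the paper's algorithm. It assumes a one-shot (normal-form) setting rather than the sequence form, a least-squares objective chosen only for simplicity, and a hypothetical signal matrix `A` where `A[r][j]` is the probability of observing signal `r` when the opponent plays action `j`. All function names are made up for illustration.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex (sort-based method)."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for i, ui in enumerate(u, start=1):
        cumulative += ui
        t = (cumulative - 1.0) / i
        if ui - t > 0:
            theta = t  # keep the threshold from the last index that passes the test
    return [max(x - theta, 0.0) for x in v]


def fit_strategy(A, b, steps=500, lr=0.5):
    """Minimize 0.5 * ||A x - b||^2 over the simplex by projected gradient descent."""
    rows, n = len(A), len(A[0])
    x = [1.0 / n] * n  # start from the uniform strategy
    for _ in range(steps):
        residual = [sum(A[r][j] * x[j] for j in range(n)) - b[r] for r in range(rows)]
        grad = [sum(A[r][j] * residual[r] for r in range(rows)) for j in range(n)]
        x = project_to_simplex([x[j] - lr * grad[j] for j in range(n)])
    return x


# Hypothetical example: three opponent actions seen through a noisy signal.
A = [[0.8, 0.1, 0.1],
     [0.1, 0.8, 0.1],
     [0.1, 0.1, 0.8]]
true_strategy = [0.5, 0.3, 0.2]
# Signal frequencies we would observe in the limit of infinitely many games.
b = [sum(A[r][j] * true_strategy[j] for j in range(3)) for r in range(3)]
estimate = fit_strategy(A, b)
```

Because the objective is convex and the projection keeps every iterate a valid probability distribution, the estimate in this toy approaches the true strategy when the signal matrix makes the strategy identifiable.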

Related papers

Robust Optimization with Diffusion Models for Green Security [49.68562792424776]
In green security, defenders must forecast adversarial behavior, such as poaching, illegal logging, and illegal fishing, to plan effective patrols. We propose a conditional diffusion model for adversary behavior modeling, leveraging its strong distribution-fitting capabilities. We introduce a mixed strategy of mixed strategies and employ a twisted Sequential Monte Carlo (SMC) sampler for accurate sampling.
arXiv Detail & Related papers (2025-02-19T05:30:46Z)
A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback. Our approach is minimalist in that it does not require training a reward model nor unstable adversarial training. We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
Unbalanced Optimal Transport: A Unified Framework for Object Detection [97.74382560746987]
We show how Unbalanced Optimal Transport unifies different approaches to object detection. We show that training an object detection model with Unbalanced Optimal Transport is able to reach the state-of-the-art. The approach is well suited for GPU implementation, which proves to be an advantage for large-scale models.
arXiv Detail & Related papers (2023-07-05T16:21:52Z)
Towards Optimal Randomized Strategies in Adversarial Example Game [13.287949447721115]
The vulnerability of deep neural network models to adversarial example attacks is a practical challenge in many artificial intelligence applications. We propose the first algorithm of its kind, called FRAT, which models the problem with a new infinite-dimensional continuous-time flow on probability distribution spaces. We prove that the continuous-time limit of FRAT converges to a mixed Nash equilibrium in a zero-sum game formed by a defender and an attacker.
arXiv Detail & Related papers (2023-06-29T07:29:23Z)
Finding mixed-strategy equilibria of continuous-action games without gradients using randomized policy networks [83.28949556413717]
We study the problem of computing an approximate Nash equilibrium of continuous-action games without access to gradients. We model players' strategies using artificial neural networks. This paper is the first to solve general continuous-action games with unrestricted mixed strategies and without any gradient information.
arXiv Detail & Related papers (2022-11-29T05:16:41Z)
Optimal control of robust team stochastic games [5.425935258756356]
We propose a model of "robust" team games, where players utilize a robust optimization approach to make decisions. We develop a learning algorithm in the form of a Gauss-Seidel modified policy iteration and prove its convergence. Some numerical simulations are presented to demonstrate the effectiveness of the algorithm.
arXiv Detail & Related papers (2021-05-16T10:42:09Z)
Automated Decision-based Adversarial Attacks [48.01183253407982]
We consider the practical and challenging decision-based black-box adversarial setting. Under this setting, the attacker can only acquire the final classification labels by querying the target model. We propose to automatically discover decision-based adversarial attack algorithms.
arXiv Detail & Related papers (2021-05-09T13:15:10Z)
Efficient Competitive Self-Play Policy Optimization [20.023522000925094]
We propose a new algorithmic framework for competitive self-play reinforcement learning in two-player zero-sum games. Our method trains several agents simultaneously and intelligently pairs them against one another as opponents based on simple adversarial rules. We prove theoretically that our algorithm converges to an approximate equilibrium with high probability in convex-concave games.
arXiv Detail & Related papers (2020-09-13T21:01:38Z)
Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action. We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents. Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)

This list is automatically generated from the titles and abstracts of the papers in this site.