Learning to Play Sequential Games versus Unknown Opponents
- URL: http://arxiv.org/abs/2007.05271v1
- Date: Fri, 10 Jul 2020 09:33:05 GMT
- Title: Learning to Play Sequential Games versus Unknown Opponents
- Authors: Pier Giuseppe Sessa, Ilija Bogunovic, Maryam Kamgarpour, Andreas
Krause
- Abstract summary: We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's response.
- Score: 93.8672371143881
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a repeated sequential game between a learner, who plays first,
and an opponent who responds to the chosen action. We seek to design strategies
for the learner to successfully interact with the opponent. While most previous
approaches consider known opponent models, we focus on the setting in which the
opponent's model is unknown. To this end, we use kernel-based regularity
assumptions to capture and exploit the structure in the opponent's response. We
propose a novel algorithm for the learner when playing against an adversarial
sequence of opponents. The algorithm combines ideas from bilevel optimization
and online learning to effectively balance between exploration (learning about
the opponent's model) and exploitation (selecting highly rewarding actions for
the learner). Our results include regret guarantees for the algorithm that depend on
the regularity of the opponent's response and scale sublinearly with the number
of game rounds. Moreover, we specialize our approach to repeated Stackelberg
games, and empirically demonstrate its effectiveness in a traffic routing and
wildlife conservation task.
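As a concrete illustration of the kernel-based exploration/exploitation idea in the abstract, the sketch below is a minimal Python example and not the authors' actual algorithm: it assumes a kernel ridge regression surrogate (with a GP-style uncertainty term) for the unknown opponent response and a simple optimistic action rule; `learner_utility`, the RBF lengthscale, the exploration weight `beta`, and the synthetic opponent response are all placeholder assumptions.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=0.5):
    """Squared-exponential kernel between two sets of learner actions."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / lengthscale**2)

class KernelOpponentModel:
    """Kernel ridge regression surrogate of the opponent's response, with a
    GP-style uncertainty estimate used as an exploration bonus."""

    def __init__(self, noise=1e-2):
        self.X, self.y = [], []
        self.noise = noise

    def update(self, action, response):
        self.X.append(action)
        self.y.append(response)

    def predict(self, Xq):
        if not self.X:  # no data yet: maximal uncertainty everywhere
            return np.zeros(len(Xq)), np.ones(len(Xq))
        X, y = np.array(self.X), np.array(self.y)
        K = rbf_kernel(X, X) + self.noise * np.eye(len(X))
        Ks = rbf_kernel(Xq, X)
        mean = Ks @ np.linalg.solve(K, y)
        var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
        return mean, np.sqrt(np.clip(var, 0.0, None))

def learner_utility(action, opponent_response):
    """Placeholder payoff for the learner; problem-specific in practice."""
    return -(action - 0.3) ** 2 - 0.5 * (opponent_response - action) ** 2

rng = np.random.default_rng(0)
candidate_actions = np.linspace(0.0, 1.0, 50)
model = KernelOpponentModel()
beta = 2.0  # exploration weight (an assumed constant, not a tuned value)

for t in range(30):
    mu, sigma = model.predict(candidate_actions[:, None])
    # Optimistic score: payoff under the predicted response plus an exploration bonus.
    scores = np.array([learner_utility(a, m) for a, m in zip(candidate_actions, mu)])
    a_t = candidate_actions[np.argmax(scores + beta * sigma)]
    # The unknown opponent responds; here a synthetic noisy response function.
    b_t = 0.8 * a_t + 0.1 + 0.05 * rng.standard_normal()
    model.update(np.array([a_t]), b_t)
```

In each round the learner exploits the current response model where it is confident and explores actions whose response is still uncertain, which is the exploration/exploitation trade-off described in the abstract.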
Related papers
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z) - All by Myself: Learning Individualized Competitive Behaviour with a
Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z) - No-Regret Learning in Time-Varying Zero-Sum Games [99.86860277006318]
Learning from repeated play in a fixed zero-sum game is a classic problem in game theory and online learning.
We develop a single parameter-free algorithm that simultaneously enjoys favorable guarantees under three performance measures.
Our algorithm is based on a two-layer structure: a meta-algorithm learns over a group of black-box base-learners satisfying a certain property (a toy sketch of this two-layer idea is given after the related-papers list).
arXiv Detail & Related papers (2022-01-30T06:10:04Z) - Game Theory for Adversarial Attacks and Defenses [0.0]
Adversarial attacks generate adversarial inputs by applying small but intentionally worst-case perturbations to samples from the dataset.
Adversarial defense techniques have been developed to improve the security and robustness of models and to prevent them from being successfully attacked.
arXiv Detail & Related papers (2021-10-08T07:38:33Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents through only a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z) - Learning in two-player games between transparent opponents [0.0]
We consider a scenario in which two reinforcement learning agents repeatedly play a matrix game against each other.
The agents' decision-making is transparent to each other, which allows each agent to predict how their opponent will play against them.
We find that the combination of mutually transparent decision-making and opponent-aware learning robustly leads to mutual cooperation in a single-shot prisoner's dilemma.
arXiv Detail & Related papers (2020-12-04T15:41:07Z) - Enhanced Rolling Horizon Evolution Algorithm with Opponent Model
Learning: Results for the Fighting Game AI Competition [9.75720700239984]
We propose a novel algorithm that combines Rolling Horizon Evolution Algorithm (RHEA) with opponent model learning.
Our proposed bot, which uses a policy-gradient-based opponent model, is the only one among the top five bots in the 2019 competition that does not use Monte-Carlo Tree Search (MCTS).
arXiv Detail & Related papers (2020-03-31T04:44:33Z) - Efficient exploration of zero-sum stochastic games [83.28949556413717]
We investigate the increasingly important and common game-solving setting where we do not have an explicit description of the game but only oracle access to it through gameplay.
During a limited-duration learning phase, the algorithm can control the actions of both players in order to try to learn the game and how to play it well.
Our motivation is to quickly learn strategies that have low exploitability in situations where evaluating the payoffs of a queried strategy profile is costly.
arXiv Detail & Related papers (2020-02-24T20:30:38Z) - Provable Self-Play Algorithms for Competitive Reinforcement Learning [48.12602400021397]
We study self-play in competitive reinforcement learning under the setting of Markov games.
We show that a self-play algorithm achieves regret $\tilde{\mathcal{O}}(\sqrt{T})$ after playing $T$ steps of the game.
We also introduce an explore-then-exploit style algorithm, which achieves a slightly worse regret of $\tilde{\mathcal{O}}(T^{2/3})$, but is guaranteed to run in polynomial time even in the worst case.
arXiv Detail & Related papers (2020-02-10T18:44:50Z)
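As a toy illustration of the "meta-algorithm over black-box base-learners" structure mentioned in the time-varying zero-sum games entry above (not that paper's actual algorithm), the sketch below runs multiplicative weights (Hedge) over a few fixed placeholder base strategies on a synthetic two-action game; all names, strategies, and parameters are illustrative assumptions.

```python
import numpy as np

class HedgeMeta:
    """Multiplicative-weights (Hedge) meta-learner that aggregates the
    mixed strategies proposed by several black-box base-learners."""

    def __init__(self, n_base, lr=0.1):
        self.weights = np.ones(n_base)  # one weight per base-learner
        self.lr = lr

    def combine(self, base_strategies):
        # base_strategies: (n_base, n_actions), each row a probability vector.
        p = self.weights / self.weights.sum()
        return p @ base_strategies      # meta-level mixed strategy over actions

    def update(self, base_losses):
        # Exponentially down-weight base-learners that incurred high loss.
        self.weights *= np.exp(-self.lr * base_losses)

# Toy usage: a two-action game with losses drawn at random each round.
rng = np.random.default_rng(1)
# Placeholder base-learners: three fixed mixed strategies (illustrative only).
base_strategies = np.array([[0.9, 0.1],
                            [0.5, 0.5],
                            [0.1, 0.9]])
meta = HedgeMeta(n_base=len(base_strategies))
total_loss = 0.0

for t in range(200):
    loss_vector = rng.uniform(size=2)            # unknown losses for this round
    play = meta.combine(base_strategies)         # strategy actually played
    total_loss += play @ loss_vector
    base_losses = base_strategies @ loss_vector  # loss each base-learner would incur
    meta.update(base_losses)

print("average loss of the meta-learner:", total_loss / 200)
```

In the actual two-layer algorithms from the literature the base-learners are themselves adaptive no-regret learners tuned to different performance measures; here they are frozen mixed strategies purely to keep the example short.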
This list is automatically generated from the titles and abstracts of the papers on this site.