BRExIt: On Opponent Modelling in Expert Iteration
- URL: http://arxiv.org/abs/2206.00113v2
- Date: Tue, 25 Apr 2023 15:31:34 GMT
- Title: BRExIt: On Opponent Modelling in Expert Iteration
- Authors: Daniel Hernandez, Hendrik Baier, Michael Kaisers
- Abstract summary: We propose Best Response Expert Iteration (BRExIt), which accelerates learning in games by incorporating opponent models into the state-of-the-art learning algorithm Expert Iteration (ExIt).
BRExIt aims to (1) improve feature shaping in the apprentice, with a policy head predicting opponent policies as an auxiliary task, and (2) bias opponent moves in planning towards the given or learnt opponent model, to generate apprentice targets that better approximate a best response.
- Score: 7.088503833248158
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Finding a best response policy is a central objective in game theory and
multi-agent learning, with modern population-based training approaches
employing reinforcement learning algorithms as best-response oracles to improve
play against candidate opponents (typically previously learnt policies). We
propose Best Response Expert Iteration (BRExIt), which accelerates learning in
games by incorporating opponent models into the state-of-the-art learning
algorithm Expert Iteration (ExIt). BRExIt aims to (1) improve feature shaping
in the apprentice, with a policy head predicting opponent policies as an
auxiliary task, and (2) bias opponent moves in planning towards the given or
learnt opponent model, to generate apprentice targets that better approximate a
best response. In an empirical ablation on BRExIt's algorithmic variants
against a set of fixed test agents, we provide statistical evidence that BRExIt
learns better-performing policies than ExIt.
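To make the two mechanisms concrete, here is a minimal sketch of an apprentice network with an auxiliary opponent-policy head; all layer sizes, names, and the loss weighting are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BRExItApprentice(nn.Module):
    """ExIt-style apprentice with an extra head that predicts the opponent's
    policy, used here purely as an auxiliary feature-shaping signal."""

    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.policy_head = nn.Linear(hidden, n_actions)    # apprentice policy
        self.value_head = nn.Linear(hidden, 1)             # state-value estimate
        self.opponent_head = nn.Linear(hidden, n_actions)  # auxiliary: opponent policy

    def forward(self, obs):
        h = self.trunk(obs)
        return self.policy_head(h), self.value_head(h), self.opponent_head(h)

def brexit_loss(model, obs, mcts_policy, value_target, opponent_policy, aux_weight=0.5):
    """Standard ExIt targets plus an opponent-prediction term; aux_weight is an
    assumed hyperparameter, not a value taken from the paper."""
    p_logits, v, opp_logits = model(obs)
    policy_loss = F.cross_entropy(p_logits, mcts_policy)          # imitate the search
    value_loss = F.mse_loss(v.squeeze(-1), value_target)          # predict the outcome
    opponent_loss = F.cross_entropy(opp_logits, opponent_policy)  # model the opponent
    return policy_loss + value_loss + aux_weight * opponent_loss

# Toy usage with random stand-ins for search-generated targets.
model = BRExItApprentice(obs_dim=8, n_actions=4)
obs = torch.randn(32, 8)
mcts_pi = torch.softmax(torch.randn(32, 4), dim=-1)
opp_pi = torch.softmax(torch.randn(32, 4), dim=-1)
returns = torch.randn(32)
brexit_loss(model, obs, mcts_pi, returns, opp_pi).backward()
```

During planning, the (softmaxed) opponent head, or a given opponent model, can then supply move priors at opponent decision nodes, which is the biasing toward a best response described in point (2).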
Related papers
- In-Context Exploiter for Extensive-Form Games [38.24471816329584]
We introduce a novel method, In-Context Exploiter (ICE), to train a single model that can act as any player in the game and adaptively exploit opponents entirely by in-context learning.
Our ICE algorithm involves generating diverse opponent strategies, collecting interactive history data by a reinforcement learning algorithm, and training a transformer-based agent within a well-designed curriculum learning framework.
arXiv Detail & Related papers (2024-08-10T14:59:09Z)
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategy of specific opponents, and how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z)
- Learning Diverse Risk Preferences in Population-based Self-play [23.07952140353786]
Current self-play algorithms optimize the agent to maximize expected win-rates against its current or historical copies.
We introduce diversity from the perspective that agents could have diverse risk preferences in the face of uncertainty.
We show that our method achieves comparable or superior performance in competitive games.
arXiv Detail & Related papers (2023-05-19T06:56:02Z)
- Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning [14.37986882249142]
We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors.
We describe metrics to measure the quality of agents based both on average returns and exploitability.
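Exploitability has a crisp textbook definition worth spelling out: the payoff a best-responding opponent can secure against a strategy, relative to the game value (zero for RPS). A small self-contained sketch of the one-shot case follows (standard game theory, not the benchmark's code):

```python
# Minimal sketch: exploitability of a mixed strategy in one-shot RPS,
# i.e. the best-response payoff against it (the game value is 0).
import numpy as np

# Payoff matrix for the row player; rows/cols = (Rock, Paper, Scissors).
PAYOFF = np.array([
    [ 0, -1,  1],   # Rock     vs R, P, S
    [ 1,  0, -1],   # Paper    vs R, P, S
    [-1,  1,  0],   # Scissors vs R, P, S
])

def exploitability(strategy: np.ndarray) -> float:
    """Best-response payoff against `strategy`, a length-3 probability vector."""
    return float(np.max(PAYOFF @ strategy))

print(exploitability(np.array([1/3, 1/3, 1/3])))     # 0.0: uniform is unexploitable
print(exploitability(np.array([0.5, 0.25, 0.25])))   # 0.25: Paper exploits the Rock bias
```

The benchmark itself evaluates repeated play against populations of bots, where exploitability must account for adaptive, history-dependent strategies; the one-shot case above only conveys the core definition.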
arXiv Detail & Related papers (2023-03-02T15:06:52Z)
- L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents by a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Efficient Use of Heuristics for Accelerating XCS-based Policy Learning in Markov Games [9.038065438586065]
In games, playing against non-stationary opponents with learning ability is still challenging for reinforcement learning agents.
This paper proposes the efficient use of heuristics to speed up policy learning when playing against concurrent learners.
arXiv Detail & Related papers (2020-05-26T07:47:27Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents, in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing styles.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- Guided Dialog Policy Learning without Adversarial Learning in the Loop [103.20723982440788]
A number of adversarial learning methods have been proposed to learn the reward function together with the dialogue policy.
We propose to decompose the adversarial training into two steps.
First, we train the discriminator with an auxiliary dialogue generator and then incorporate a derived reward model into a common RL method to guide the dialogue policy learning.
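As a hedged sketch of that two-step decomposition (all module names, shapes, and the BCE setup here are assumptions, not the paper's code): step one fits the discriminator against an auxiliary generator's samples; step two freezes it and exposes it as a plain reward function for any off-the-shelf RL method.

```python
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM = 16, 8

# The discriminator scores (state, action) pairs as human-like or not.
discriminator = nn.Sequential(
    nn.Linear(STATE_DIM + ACTION_DIM, 64), nn.ReLU(),
    nn.Linear(64, 1), nn.Sigmoid(),
)

def train_discriminator(real_pairs, generated_pairs, steps=100):
    """Step 1: fit the discriminator offline against an auxiliary generator's
    samples, before any dialogue-policy RL takes place."""
    opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
    bce = nn.BCELoss()
    for _ in range(steps):
        loss = (bce(discriminator(real_pairs), torch.ones(len(real_pairs), 1))
                + bce(discriminator(generated_pairs), torch.zeros(len(generated_pairs), 1)))
        opt.zero_grad(); loss.backward(); opt.step()

@torch.no_grad()
def derived_reward(state, action):
    """Step 2: the frozen discriminator becomes a reward model that a standard
    RL algorithm can query, keeping the adversary out of the RL loop."""
    return discriminator(torch.cat([state, action], dim=-1))

# Toy call, just to show the interface (random stand-ins for dialogue data).
real = torch.randn(64, STATE_DIM + ACTION_DIM)
fake = torch.randn(64, STATE_DIM + ACTION_DIM)
train_discriminator(real, fake, steps=10)
print(derived_reward(torch.randn(1, STATE_DIM), torch.randn(1, ACTION_DIM)))
```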
arXiv Detail & Related papers (2020-04-07T11:03:17Z)