All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization
- URL: http://arxiv.org/abs/2310.00964v1
- Date: Mon, 2 Oct 2023 08:11:07 GMT
- Title: All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization
- Authors: Pablo Barros, Alessandra Sciutti
- Abstract summary: In a competitive game scenario, agents have to learn decisions that maximize their own goals while minimizing their adversaries' goals.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategies of specific opponents, and learn how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
- Score: 57.615269148301515
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In a competitive game scenario, agents have to learn decisions
that maximize their own goals while minimizing their adversaries' goals.
Besides dealing with the increased dynamics of the scenario caused by the
opponents' actions, they usually have to understand how to overcome the
opponents' strategies. Most common solutions, usually based on continual
learning or centralized multi-agent experiences, do not allow the development
of personalized strategies to face individual opponents. In this paper, we
propose a novel model composed of three neural layers that learn a
representation of a competitive game, learn how to map the strategies of
specific opponents, and learn how to disrupt them. The entire model is trained
online, using a composed loss based on a contrastive optimization, to learn
competitive and multiplayer games. We evaluate our model on a Pokémon duel
scenario and on the four-player competitive Chef's Hat card game. Our
experiments demonstrate that our model achieves better performance when
playing against offline, online, and competitive-specific models, in
particular when playing against the same opponent multiple times. We also
discuss the impact of our model, in particular how well it handles learning
opponent-specific strategies in each of the two scenarios.
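The abstract does not come with code, so as a rough illustration of what a composed contrastive objective over opponent-specific representations can look like, here is a minimal PyTorch sketch. Everything in it (the supervised-contrastive, InfoNCE-style form, the function names, and the weighting factor beta) is an assumption made for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_opponent_loss(embeddings: torch.Tensor,
                              opponent_ids: torch.Tensor,
                              temperature: float = 0.1) -> torch.Tensor:
    """InfoNCE-style term: embeddings of observations collected against the
    same opponent are pulled together, different opponents pushed apart."""
    z = F.normalize(embeddings, dim=1)            # (N, d) unit-norm embeddings
    logits = z @ z.t() / temperature              # (N, N) pairwise similarities
    n = logits.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=logits.device)
    pos_mask = (opponent_ids.unsqueeze(0) == opponent_ids.unsqueeze(1)) & ~self_mask
    logits = logits.masked_fill(self_mask, float('-inf'))   # drop self-pairs
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)           # avoid divide-by-zero
    # average log-probability of the positive pairs for each anchor
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_counts)
    return loss.mean()

def composed_loss(policy_loss, embeddings, opponent_ids, beta=0.5):
    # Composed objective: an ordinary RL policy loss plus the contrastive
    # term. beta is a hypothetical weighting, not a value from the paper.
    return policy_loss + beta * contrastive_opponent_loss(embeddings, opponent_ids)
```

Under this reading, episodes played against the same opponent would cluster in embedding space, which is what would let the downstream layers map and then disrupt an individual opponent's strategy.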
Related papers
- Opponent Modeling in Multiplayer Imperfect-Information Games [1.024113475677323]
We present an approach for opponent modeling in multiplayer imperfect-information games.
We run experiments against a variety of real opponents and exact Nash equilibrium strategies in three-player Kuhn poker.
Our algorithm significantly outperforms all of the agents, including the exact Nash equilibrium strategies.
arXiv Detail & Related papers (2022-12-12T16:48:53Z)
- Model-Based Opponent Modeling [20.701733377216932]
We propose model-based opponent modeling (MBOM), which employs the environment model to adapt to all kinds of opponents.
MBOM achieves more effective adaptation than existing methods in competitive and cooperative environments.
arXiv Detail & Related papers (2021-08-04T04:42:43Z)
- Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition [88.26752130107259]
In real-world multi-agent systems, agents with different capabilities may join or leave without altering the team's overarching goals.
We propose COPA, a coach-player framework to tackle this problem.
We 1) adopt the attention mechanism for both the coach and the players; 2) propose a variational objective to regularize learning; and 3) design an adaptive communication method to let the coach decide when to communicate with the players.
arXiv Detail & Related papers (2021-05-18T17:27:37Z)
- L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit (L2E) framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents through a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that automatically produces effective opponents for training.
arXiv Detail & Related papers (2021-02-18T14:27:59Z)
- Moody Learners -- Explaining Competitive Behaviour of Reinforcement Learning Agents [65.2200847818153]
In a competitive scenario, the agent not only faces a dynamic environment but is also directly affected by the opponents' actions.
Observing the Q-values of the agent is usually a way of explaining its behavior; however, they do not show the temporal relation between the selected actions.
arXiv Detail & Related papers (2020-07-30T11:30:42Z)
- Learning to Play Sequential Games versus Unknown Opponents [93.8672371143881]
We consider a repeated sequential game between a learner, who plays first, and an opponent who responds to the chosen action.
We propose a novel algorithm for the learner when playing against an adversarial sequence of opponents.
Our results include regret guarantees for the algorithm that depend on the regularity of the opponent's responses.
arXiv Detail & Related papers (2020-07-10T09:33:05Z)
- Learning from Learners: Adapting Reinforcement Learning Agents to be Competitive in a Card Game [71.24825724518847]
We present a study on how popular reinforcement learning algorithms can be adapted to learn and to play a real-world implementation of a competitive multiplayer card game.
We propose specific training and validation routines for the learning agents in order to evaluate how the agents learn to be competitive and explain how they adapt to each other's playing style.
arXiv Detail & Related papers (2020-04-08T14:11:05Z)
- Deep Reinforcement Learning for FlipIt Security Game [2.0624765454705654]
We describe a deep learning model in which agents adapt to different classes of opponents and learn the optimal counter-strategy.
We apply our model to FlipIt, a two-player security game in which both players, the attacker and the defender, compete for ownership of a shared resource.
Our model is a deep neural network combined with Q-learning and is trained to maximize the defender's time of ownership of the resource (a generic Q-learning sketch follows this list).
arXiv Detail & Related papers (2020-02-28T18:26:24Z)
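To make the value update behind the FlipIt entry above concrete, here is a generic tabular Q-learning sketch. The cited paper uses a deep network; the discrete state, the two-action set, and the reward convention (1 per step of defender ownership) are toy assumptions for illustration only.

```python
import random
from collections import defaultdict

Q = defaultdict(float)                 # Q[(state, action)] -> value estimate
ACTIONS = ("wait", "flip")             # defender either waits or reclaims the resource
alpha, gamma, eps = 0.1, 0.99, 0.1     # learning rate, discount, exploration rate

def choose_action(state):
    """Epsilon-greedy over the current Q estimates."""
    if random.random() < eps:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state):
    """One-step Q-learning update; with reward = 1 for every step the
    defender owns the resource, the return tracks time of ownership."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
```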
This list is automatically generated from the titles and abstracts of the papers on this site.