Model-Based Opponent Modeling
- URL: http://arxiv.org/abs/2108.01843v1
- Date: Wed, 4 Aug 2021 04:42:43 GMT
- Title: Model-Based Opponent Modeling
- Authors: Xiaopeng Yu, Jiechuan Jiang, Haobin Jiang, and Zongqing Lu
- Abstract summary: We propose model-based opponent modeling (MBOM), which employs the environment model to adapt to all kinds of opponents.
MBOM achieves more effective adaptation than existing methods in competitive and cooperative environments.
- Score: 20.701733377216932
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: When one agent interacts with a multi-agent environment, it is challenging to deal with various opponents unseen before. Modeling the behaviors, goals, or beliefs of opponents could help the agent adjust its policy to adapt to different opponents. In addition, it is also important to consider opponents who are learning simultaneously or capable of reasoning. However, existing work usually tackles only one of the aforementioned types of opponent. In this paper, we propose model-based opponent modeling (MBOM), which employs the environment model to adapt to all kinds of opponents. MBOM simulates the recursive reasoning process in the environment model and imagines a set of improving opponent policies. To effectively and accurately represent the opponent policy, MBOM further mixes the imagined opponent policies according to their similarity with the real behaviors of opponents. Empirically, we show that MBOM achieves more effective adaptation than existing methods in both competitive and cooperative environments, against different types of opponents, i.e., fixed policy, naïve learner, and reasoning learner.
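The mixing step described in the abstract can be illustrated with a minimal, hypothetical sketch; this is not the paper's exact formulation, and the function name mix_imagined_policies, the temperature parameter, and the log-likelihood scoring are assumptions. The idea shown is to weight each imagined opponent policy by how well it explains the opponent's recently observed actions, then turn those scores into mixture weights. In MBOM the imagined policies would come from recursive-reasoning rollouts in the learned environment model; here they are plain callables mapping an observation to an action distribution.
```python
import numpy as np

def mix_imagined_policies(imagined_policies, observed_obs, observed_actions, temperature=1.0):
    """Mix imagined opponent policies by similarity to the opponent's real behavior.

    imagined_policies: list of callables pi_k(obs) -> probability vector over
        opponent actions (e.g., produced at increasing recursive-reasoning levels).
    observed_obs, observed_actions: recent opponent observations and the actions
        actually taken (the "real behavior" used to score each imagined policy).
    Returns the mixed opponent action distribution for the latest observation
    and the mixture weights.
    """
    # Score each imagined policy by the average log-likelihood of the observed actions.
    scores = []
    for pi in imagined_policies:
        log_lik = sum(np.log(pi(o)[a] + 1e-8) for o, a in zip(observed_obs, observed_actions))
        scores.append(log_lik / max(len(observed_actions), 1))

    # Softmax the similarity scores into mixture weights.
    scores = np.array(scores) / temperature
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()

    # Mixture over opponent actions for the current observation.
    latest = observed_obs[-1]
    mixed = sum(w * pi(latest) for w, pi in zip(weights, imagined_policies))
    return mixed, weights
```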
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns the corresponding goal-conditioned policies, and a planning module.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - All by Myself: Learning Individualized Competitive Behaviour with a Contrastive Reinforcement Learning optimization [57.615269148301515]
In a competitive game scenario, a set of agents have to learn decisions that maximize their goals and minimize their adversaries' goals at the same time.
We propose a novel model composed of three neural layers that learn a representation of a competitive game, learn how to map the strategies of specific opponents, and learn how to disrupt them.
Our experiments demonstrate that our model achieves better performance when playing against offline, online, and competitive-specific models, in particular when playing against the same opponent multiple times.
arXiv Detail & Related papers (2023-10-02T08:11:07Z) - Decision-making with Speculative Opponent Models [10.594910251058087]
We introduce Distributional Opponent-aided Multi-agent Actor-Critic (DOMAC).
DOMAC is the first speculative opponent modelling algorithm that relies solely on local information (i.e., the controlled agent's observations, actions, and rewards).
arXiv Detail & Related papers (2022-11-22T01:29:47Z) - Safe adaptation in multiagent competition [48.02377041620857]
In multiagent competitive scenarios, ego-agents may have to adapt to new opponents with previously unseen behaviors.
As the ego-agent updates its own behavior to exploit the opponent, its own behavior could become more exploitable.
We develop a safe adaptation approach in which the ego-agent is trained against a regularized opponent model.
arXiv Detail & Related papers (2022-03-14T23:53:59Z) - L2E: Learning to Exploit Your Opponent [66.66334543946672]
We propose a novel Learning to Exploit framework for implicit opponent modeling.
L2E acquires the ability to exploit opponents through a few interactions with different opponents during training.
We propose a novel opponent strategy generation algorithm that produces effective opponents for training automatically.
arXiv Detail & Related papers (2021-02-18T14:27:59Z) - Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games [5.0238343960165155]
It is essential for an agent to learn about the behaviour of other agents in the system.
We present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities.
arXiv Detail & Related papers (2020-11-14T12:35:32Z) - Moody Learners -- Explaining Competitive Behaviour of Reinforcement Learning Agents [65.2200847818153]
In a competitive scenario, the agent does not only have a dynamic environment but also is directly affected by the opponents' actions.
Observing the Q-values of the agent is usually a way of explaining its behavior; however, Q-values do not show the temporal relation between the selected actions.
arXiv Detail & Related papers (2020-07-30T11:30:42Z) - Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)