Learning to Model Opponent Learning
- URL: http://arxiv.org/abs/2006.03923v1
- Date: Sat, 6 Jun 2020 17:19:04 GMT
- Title: Learning to Model Opponent Learning
- Authors: Ian Davies, Zheng Tian and Jun Wang
- Abstract summary: Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
The resulting non-stationarity poses a great challenge for value function-based algorithms, whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
- Score: 11.61673411387596
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-Agent Reinforcement Learning (MARL) considers settings in which a set
of coexisting agents interact with one another and their environment. The
adaptation and learning of other agents induces non-stationarity in the
environment dynamics. This poses a great challenge for value function-based
algorithms whose convergence usually relies on the assumption of a stationary
environment. Policy search algorithms also struggle in multi-agent settings as
the partial observability resulting from an opponent's actions not being known
introduces high variance to policy training. Modelling an agent's opponent(s)
is often pursued as a means of resolving the issues arising from the
coexistence of learning opponents. An opponent model provides an agent with
some ability to reason about other agents to aid its own decision making. Most
prior works learn an opponent model by assuming the opponent is employing a
stationary policy or switching between a set of stationary policies. Such an
approach can reduce the variance of training signals for policy search
algorithms. However, in the multi-agent setting, agents have an incentive to
continually adapt and learn. This means that the assumptions concerning
opponent stationarity are unrealistic. In this work, we develop a novel
approach to modelling an opponent's learning dynamics which we term Learning to
Model Opponent Learning (LeMOL). We show our structured opponent model is more
accurate and stable than naive behaviour cloning baselines. We further show
that opponent modelling can improve the performance of algorithmic agents in
multi-agent settings.
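To illustrate the distinction the abstract draws, the sketch below contrasts a naive behaviour-cloning opponent model, which implicitly assumes a stationary opponent, with a model that conditions on the opponent's experience so far and can therefore track a learning opponent. This is a minimal, hypothetical sketch: the PyTorch framing, the network sizes, and the use of an LSTM over the opponent's history are assumptions for illustration, not the LeMOL architecture described in the paper.

```python
# Minimal sketch; architectural details are illustrative assumptions, not LeMOL itself.
import torch
import torch.nn as nn


class BehaviourCloningOpponentModel(nn.Module):
    """Naive baseline: predicts the opponent's action from the current observation
    only, implicitly assuming the opponent's policy is stationary."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # logits over the opponent's actions


class LearningAwareOpponentModel(nn.Module):
    """Conditions on the opponent's experience so far: a recurrent summary of past
    (observation, action) pairs lets the predicted policy drift as the opponent learns."""

    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.encoder = nn.LSTM(obs_dim + n_actions, hidden, batch_first=True)
        self.head = nn.Linear(hidden + obs_dim, n_actions)

    def forward(self, history_obs, history_actions, current_obs):
        # history_obs: (batch, T, obs_dim); history_actions: (batch, T) integer actions
        onehot = nn.functional.one_hot(history_actions, self.head.out_features).float()
        _, (h, _) = self.encoder(torch.cat([history_obs, onehot], dim=-1))
        summary = h[-1]  # summary of the opponent's learning trajectory so far
        return self.head(torch.cat([summary, current_obs], dim=-1))


if __name__ == "__main__":
    obs_dim, n_actions, T = 8, 4, 16
    obs_hist = torch.randn(2, T, obs_dim)
    act_hist = torch.randint(0, n_actions, (2, T))
    obs_now = torch.randn(2, obs_dim)

    bc = BehaviourCloningOpponentModel(obs_dim, n_actions)
    learning_aware = LearningAwareOpponentModel(obs_dim, n_actions)
    print(bc(obs_now).shape)                                  # torch.Size([2, 4])
    print(learning_aware(obs_hist, act_hist, obs_now).shape)  # torch.Size([2, 4])
```

In both cases the model would be trained to predict the opponent's observed actions (for example with a cross-entropy loss); the difference is that the recurrent model's predictions can change as the opponent's policy improves, rather than averaging over its whole history.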
Related papers
- Efficient Adaptation in Mixed-Motive Environments via Hierarchical Opponent Modeling and Planning [51.52387511006586]
We propose Hierarchical Opponent modeling and Planning (HOP), a novel multi-agent decision-making algorithm.
HOP is hierarchically composed of two modules: an opponent modeling module that infers others' goals and learns corresponding goal-conditioned policies, and a planning module that uses these inferences to plan the ego agent's own actions.
HOP exhibits superior few-shot adaptation capabilities when interacting with various unseen agents, and excels in self-play scenarios.
arXiv Detail & Related papers (2024-06-12T08:48:06Z) - A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that, on a suite of continuous control tasks, we learn significantly more efficiently than reward-model-based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z) - Contrastive learning-based agent modeling for deep reinforcement learning [31.293496061727932]
Agent modeling is essential when designing adaptive policies for intelligent machine agents in multiagent systems.
We devised a Contrastive Learning-based Agent Modeling (CLAM) method that relies only on the local observations from the ego agent during training and execution.
CLAM generates consistent, high-quality policy representations in real time from the beginning of each episode.
arXiv Detail & Related papers (2023-12-30T03:44:12Z) - ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state, and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easily integrated into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z) - Multi-agent Actor-Critic with Time Dynamical Opponent Model [16.820873906787906]
In multi-agent reinforcement learning, multiple agents learn simultaneously while interacting with a common environment and each other.
We propose a novel Time Dynamical Opponent Model (TDOM) to encode the knowledge that the opponent policies tend to improve over time.
We show empirically that TDOM achieves superior opponent behavior prediction during test time.
arXiv Detail & Related papers (2022-04-12T07:16:15Z) - Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in the subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z) - Deep Interactive Bayesian Reinforcement Learning via Meta-Learning [63.96201773395921]
The optimal adaptive behaviour under uncertainty over the other agents' strategies can be computed using the Interactive Bayesian Reinforcement Learning framework.
We propose to meta-learn approximate belief inference and Bayes-optimal behaviour for a given prior.
We show empirically that our approach outperforms existing methods that use a model-free approach, sample from the approximate posterior, maintain memory-free models of others, or do not fully utilise the known structure of the environment.
arXiv Detail & Related papers (2021-01-11T13:25:13Z) - Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games [5.0238343960165155]
It is essential for an agent to learn about the behaviour of other agents in the system.
We present the first study of the effects of such opponent modelling on multi-objective multi-agent interactions with non-linear utilities.
arXiv Detail & Related papers (2020-11-14T12:35:32Z) - Learning Latent Representations to Influence Multi-Agent Interaction [65.44092264843538]
We propose a reinforcement learning-based framework for learning latent representations of an agent's policy.
We show that our approach outperforms the alternatives and learns to influence the other agent.
arXiv Detail & Related papers (2020-11-12T19:04:26Z) - Variational Autoencoders for Opponent Modeling in Multi-Agent Systems [9.405879323049659]
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment.
In this work, we are interested in controlling a single agent in a multi-agent system and successfully learning to interact with the other agents, which have fixed policies.
Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system.
arXiv Detail & Related papers (2020-01-29T13:38:59Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.