Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
- URL: http://arxiv.org/abs/2011.07290v1
- Date: Sat, 14 Nov 2020 12:35:32 GMT
- Title: Opponent Learning Awareness and Modelling in Multi-Objective Normal Form Games
- Authors: Roxana Rădulescu, Timothy Verstraeten, Yijie Zhang, Patrick Mannion, Diederik M. Roijers, Ann Nowé
- Abstract summary: It is essential for an agent to learn about the behaviour of other agents in the system.
We present the first study of the effects of opponent modelling on multi-objective multi-agent interactions with non-linear utilities.
- Score: 5.0238343960165155
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Many real-world multi-agent interactions involve multiple distinct criteria,
i.e., the payoffs are multi-objective in nature. However, the same
multi-objective payoff vector may lead to different utilities for each
participant. Therefore, it is essential for an agent to learn about the
behaviour of other agents in the system. In this work, we present the first
study of the effects of such opponent modelling on multi-objective multi-agent
interactions with non-linear utilities. Specifically, we consider two-player
multi-objective normal form games (MONFGs) with non-linear utility functions
under the scalarised expected returns (SER) optimisation criterion. We contribute novel
actor-critic and policy gradient formulations to allow reinforcement learning
of mixed strategies in this setting, along with extensions that incorporate
opponent policy reconstruction and learning with opponent learning awareness
(i.e., learning while considering the impact of one's policy when anticipating
the opponent's learning step). Empirical results in five different MONFGs
demonstrate that opponent learning awareness and modelling can drastically
alter the learning dynamics in this setting. When equilibria are present,
opponent modelling can confer significant benefits on agents that implement it.
When there are no Nash equilibria, opponent learning awareness and modelling
allow agents to converge to meaningful solutions that approximate
equilibria.
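As a concrete illustration of the setting, the following is a minimal sketch (not the authors' implementation) of policy-gradient learning of mixed strategies under the SER criterion in a two-player, two-action MONFG, together with a LOLA-style update in which agent 1 differentiates through the opponent's anticipated learning step. The payoff tensors, utility functions, step sizes, and the asymmetric setup (only agent 1 is learning-aware) are illustrative assumptions; the paper's actor-critic variants and opponent policy reconstruction are not sketched here.
```python
import jax
import jax.numpy as jnp

# Illustrative 2-objective payoff tensors: PAYOFF_1[a1, a2] is the length-2
# payoff vector agent 1 receives when agent 1 plays a1 and agent 2 plays a2.
PAYOFF_1 = jnp.array([[[4.0, 0.0], [1.0, 1.0]],
                      [[3.0, 1.0], [0.0, 4.0]]])
# Assumption: agent 2 receives the same payoffs with the objectives swapped.
PAYOFF_2 = jnp.flip(PAYOFF_1, axis=-1)

def utility_1(v):
    # Assumed non-linear utility over agent 1's expected payoff vector.
    return v[0] * v[1]

def utility_2(v):
    # Assumed non-linear utility for agent 2.
    return v[0] ** 2 + v[1]

def policy(theta):
    # Mixed strategy over two actions, parameterised by logits.
    return jax.nn.softmax(theta)

def expected_payoff(theta1, theta2, payoff):
    # Expected multi-objective payoff under both agents' mixed strategies.
    return jnp.einsum('i,j,ijk->k', policy(theta1), policy(theta2), payoff)

def v1(theta1, theta2):
    # SER: the utility is applied to the *expected* payoff vector.
    return utility_1(expected_payoff(theta1, theta2, PAYOFF_1))

def v2(theta1, theta2):
    return utility_2(expected_payoff(theta1, theta2, PAYOFF_2))

def lola_step(theta1, theta2, lr=0.05, lookahead=0.05):
    """One update in which agent 1 anticipates agent 2's naive gradient step
    (opponent learning awareness) while agent 2 keeps learning naively."""
    def v1_after_opponent_update(t1):
        anticipated_t2 = theta2 + lookahead * jax.grad(v2, argnums=1)(t1, theta2)
        return v1(t1, anticipated_t2)

    g1 = jax.grad(v1_after_opponent_update)(theta1)   # differentiates through the opponent's update
    g2 = jax.grad(v2, argnums=1)(theta1, theta2)       # naive gradient for agent 2
    return theta1 + lr * g1, theta2 + lr * g2

theta1 = theta2 = jnp.zeros(2)
for _ in range(1000):
    theta1, theta2 = lola_step(theta1, theta2)
print("agent 1 mixed strategy:", policy(theta1))
print("agent 2 mixed strategy:", policy(theta2))
```
Under SER the non-linear utility is applied to the expected payoff vector, rather than taking the expectation of utilities of realised payoffs, which is why the optimisation in the sketch targets the agents' mixed strategies and the expectations they induce.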
Related papers
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback [49.45285664482369]
We present Self-Play Preference Optimization (SPO), an algorithm for reinforcement learning from human feedback.
Our approach is minimalist in that it requires neither training a reward model nor unstable adversarial training.
We demonstrate that on a suite of continuous control tasks, we are able to learn significantly more efficiently than reward-model based approaches.
arXiv Detail & Related papers (2024-01-08T17:55:02Z)
- Mimicking To Dominate: Imitation Learning Strategies for Success in Multiagent Competitive Games [13.060023718506917]
We develop a new multi-agent imitation learning model for predicting next moves of the opponents.
We also present a new multi-agent reinforcement learning algorithm that combines our imitation learning model and policy training into one single training process.
Experimental results show that our approach achieves superior performance compared to existing state-of-the-art multi-agent RL algorithms.
arXiv Detail & Related papers (2023-08-20T07:30:13Z)
- Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z)
- A Unifying Perspective on Multi-Calibration: Game Dynamics for Multi-Objective Learning [63.20009081099896]
We provide a unifying framework for the design and analysis of multicalibrated predictors.
We exploit connections to game dynamics to achieve state-of-the-art guarantees for a diverse set of multicalibration learning problems.
arXiv Detail & Related papers (2023-02-21T18:24:17Z)
- MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning [14.06682547001011]
State-of-the-art methods typically focus on learning a single reward model.
We propose Multi-Objective Reinforced Active Learning (MORAL), a novel method for combining diverse demonstrations of social norms.
Our approach is able to interactively tune a deep RL agent towards a variety of preferences, while eliminating the need for computing multiple policies.
arXiv Detail & Related papers (2021-12-30T19:21:03Z)
- Multi-Agent Imitation Learning with Copulas [102.27052968901894]
Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions.
In this paper, we propose to use copulas, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems.
Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents.
arXiv Detail & Related papers (2021-07-10T03:49:41Z)
- Policy Fusion for Adaptive and Customizable Reinforcement Learning Agents [137.86426963572214]
We show how to combine distinct behavioral policies to obtain a meaningful "fusion" policy.
We propose four different policy fusion methods for combining pre-trained policies.
We provide several practical examples and use-cases for how these methods are indeed useful for video game production and designers.
arXiv Detail & Related papers (2021-04-21T16:08:44Z)
- On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality [78.76529463321374]
We study a system of two interacting non-cooperative Q-learning agents.
We show that this information asymmetry can lead to a stable outcome of population learning.
arXiv Detail & Related papers (2020-10-21T11:19:53Z)
- Learning to Model Opponent Learning [11.61673411387596]
Multi-Agent Reinforcement Learning (MARL) considers settings in which a set of coexisting agents interact with one another and their environment.
This poses a great challenge for value function-based algorithms whose convergence usually relies on the assumption of a stationary environment.
We develop a novel approach to modelling an opponent's learning dynamics, which we term Learning to Model Opponent Learning (LeMOL).
arXiv Detail & Related papers (2020-06-06T17:19:04Z)
- Variational Autoencoders for Opponent Modeling in Multi-Agent Systems [9.405879323049659]
Multi-agent systems exhibit complex behaviors that emanate from the interactions of multiple agents in a shared environment.
In this work, we are interested in controlling one agent in a multi-agent system and successfully learning to interact with the other agents, which have fixed policies.
Modeling the behavior of other agents (opponents) is essential in understanding the interactions of the agents in the system.
arXiv Detail & Related papers (2020-01-29T13:38:59Z)