Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2108.12988v3
- Date: Mon, 5 Jun 2023 09:30:00 GMT
- Title: Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning
- Authors: Shenao Zhang, Li Shen, Lei Han, Li Shen
- Abstract summary: In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
- Score: 12.170248966278281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning, the behaviors that agents learn in a
single Markov Game (MG) are typically confined to the given agent number. Every
single MG induced by varying the population may possess distinct optimal joint
strategies and game-specific knowledge, which are modeled independently in
modern multi-agent reinforcement learning algorithms. In this work, our focus
is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set
comprising effective strategies across a variety of games. To achieve this, we
propose Meta Representations for Agents (MRA) that explicitly models the
game-common and game-specific strategic knowledge. By representing the policy
sets with multi-modal latent policies, the game-common strategic knowledge and
diverse strategic modes are discovered through an iterative optimization
procedure. We prove that by approximately maximizing the resulting constrained
mutual information objective, the policies can reach Nash Equilibrium in every
evaluation MG when the latent space is sufficiently large. When deploying MRA
in practical settings with limited latent space sizes, fast adaptation can be
achieved by leveraging the first-order gradient information. Extensive
experiments demonstrate the effectiveness of MRA in improving training
performance and generalization ability in challenging evaluation games.
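The abstract's core construction — a policy set represented as a multi-modal latent-conditioned policy, trained so that a mutual-information term ties latent modes to distinct behaviors — can be illustrated with a small toy sketch. This is not the authors' implementation: the names (`N_MODES`, `policy`, `mi_lower_bound`) are hypothetical, and the Barber–Agakov variational bound shown here is one standard way to lower-bound the mutual information between the latent variable and observed behavior.

```python
import numpy as np

rng = np.random.default_rng(0)

N_MODES = 4      # size of the latent strategy space (hypothetical choice)
OBS_DIM = 6
N_ACTIONS = 3

# A single shared weight matrix captures game-common knowledge; the
# appended one-hot latent selects a strategic mode, so one network
# represents a whole policy set rather than a unimodal policy.
W = rng.normal(scale=0.1, size=(OBS_DIM + N_MODES, N_ACTIONS))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def policy(obs, z):
    """pi(a | s, z): action distribution conditioned on latent mode z."""
    one_hot = np.eye(N_MODES)[z]
    return softmax(np.concatenate([obs, one_hot]) @ W)

def mi_lower_bound(log_q_z_given_s, prior=None):
    """Barber-Agakov variational lower bound on I(Z; S):

        I(Z; S) >= E[log q(z | s)] + H(Z)

    where `log_q_z_given_s` holds log q(z_i | s_i) from a learned
    discriminator evaluated on sampled (z_i, s_i) pairs. Maximizing this
    bound pushes the latent modes toward distinguishable behaviors.
    """
    if prior is None:
        prior = np.full(N_MODES, 1.0 / N_MODES)
    entropy = -np.sum(prior * np.log(prior))
    return np.mean(log_q_z_given_s) + entropy
```

With a perfect discriminator (log q = 0) the bound reaches its maximum H(Z) = log 4; with an uninformative one (log q = log 1/4) it collapses to 0, i.e. the latent carries no behavioral information.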
Related papers
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities across diverse tasks.
These task solvers, however, require manually crafted prompts to specify task rules and regulate behavior.
We propose Agent-Pro: an LLM-based agent with policy-level reflection and optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings, such as auctions or taxation, where the principal may know neither the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework for training a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Population-based Evaluation in Repeated Rock-Paper-Scissors as a Benchmark for Multiagent Reinforcement Learning [14.37986882249142]
We propose a benchmark for multiagent learning based on repeated play of the simple game Rock, Paper, Scissors.
We describe metrics that measure the quality of agents based on both average returns and exploitability.
arXiv Detail & Related papers (2023-03-02T15:06:52Z)
- Emergent Behaviors in Multi-Agent Target Acquisition [0.0]
We simulate a Multi-Agent System (MAS) using Reinforcement Learning (RL) in a pursuit-evasion game.
We create different adversarial scenarios by replacing the RL-trained pursuers' policies with two distinct (non-RL) analytical strategies.
The novelty of our approach lies in creating an influential feature set that reveals underlying data regularities.
arXiv Detail & Related papers (2022-12-15T15:20:58Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent trained with offline MARL may inherit a low-quality (e.g., random) policy from the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
- A Game-Theoretic Perspective of Generalization in Reinforcement Learning [9.402272029807316]
Generalization in reinforcement learning (RL) is crucial for the real-world deployment of RL algorithms.
We propose a game-theoretic framework for generalization in reinforcement learning, named GiRL.
arXiv Detail & Related papers (2022-08-07T06:17:15Z)
- Conditional Imitation Learning for Multi-Agent Games [89.897635970366]
We study the problem of conditional multi-agent imitation learning, where we have access to joint trajectory demonstrations at training time.
We propose a novel approach to address the difficulties of scalability and data scarcity.
Our model learns a low-rank subspace over ego and partner agent strategies, then infers and adapts to a new partner strategy by interpolating in that subspace.
arXiv Detail & Related papers (2022-01-05T04:40:13Z)
- Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity [49.68758494467258]
We study how to construct diverse populations of agents by carefully structuring how individuals within a population interact.
Our approach is based on interaction graphs, which control the flow of information between agents during training.
We provide evidence for the importance of diversity in multi-agent training and analyse how different interaction graphs affect the training trajectories, diversity, and performance of populations across a range of games.
arXiv Detail & Related papers (2021-10-08T11:29:52Z)
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent in multiagent learning settings.
This is achieved by modeling our gradient updates to account for both an agent's own non-stationary policy dynamics and those of the other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.