Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning
- URL: http://arxiv.org/abs/2108.12988v3
- Date: Mon, 5 Jun 2023 09:30:00 GMT
- Title: Learning Meta Representations for Agents in Multi-Agent Reinforcement Learning
- Authors: Shenao Zhang, Li Shen, Lei Han, Li Shen
- Abstract summary: In multi-agent reinforcement learning, behaviors that agents learn in a single Markov Game (MG) are typically confined to the given agent number.
In this work, our focus is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set comprising effective strategies across a variety of games.
- Score: 12.170248966278281
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In multi-agent reinforcement learning, the behaviors that agents learn in a
single Markov Game (MG) are typically confined to the given agent number. Every
single MG induced by varying the population may possess distinct optimal joint
strategies and game-specific knowledge, which are modeled independently in
modern multi-agent reinforcement learning algorithms. In this work, our focus
is on creating agents that can generalize across population-varying MGs.
Instead of learning a unimodal policy, each agent learns a policy set
comprising effective strategies across a variety of games. To achieve this, we
propose Meta Representations for Agents (MRA) that explicitly models the
game-common and game-specific strategic knowledge. By representing the policy
sets with multi-modal latent policies, the game-common strategic knowledge and
diverse strategic modes are discovered through an iterative optimization
procedure. We prove that by approximately maximizing the resulting constrained
mutual information objective, the policies can reach Nash Equilibrium in every
evaluation MG when the latent space is sufficiently large. When deploying MRA
in practical settings with limited latent space sizes, fast adaptation can be
achieved by leveraging the first-order gradient information. Extensive
experiments demonstrate the effectiveness of MRA in improving training
performance and generalization ability in challenging evaluation games.
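The policy-set idea above lends itself to a compact sketch: a single network can represent the whole set by conditioning on a discrete latent strategy variable, while a variational discriminator supplies a lower bound on the mutual information between latents and visited states. The DIAYN-style bound, the PyTorch framing, and every size and name below are illustrative assumptions, not the paper's actual architecture or objective.

```python
# Hypothetical sketch of a latent-conditioned policy set with a
# variational mutual-information bound (all names/sizes are assumptions).
import torch
import torch.nn as nn
import torch.nn.functional as F

N_LATENTS, OBS_DIM, N_ACTIONS = 8, 16, 4  # assumed sizes

class LatentPolicy(nn.Module):
    """pi(a | s, z): one network represents the whole policy set."""
    def __init__(self):
        super().__init__()
        self.z_embed = nn.Embedding(N_LATENTS, 16)
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM + 16, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )

    def forward(self, obs, z):
        h = torch.cat([obs, self.z_embed(z)], dim=-1)
        return torch.distributions.Categorical(logits=self.net(h))

class Discriminator(nn.Module):
    """q(z | s): variational posterior for the mutual-information bound."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, N_LATENTS),
        )

    def forward(self, obs):
        return self.net(obs)

policy, disc = LatentPolicy(), Discriminator()
opt = torch.optim.Adam(
    list(policy.parameters()) + list(disc.parameters()), lr=3e-4)

# One illustrative update on a fake batch of visited states.
obs = torch.randn(32, OBS_DIM)
z = torch.randint(0, N_LATENTS, (32,))
logq = F.log_softmax(disc(obs), dim=-1).gather(1, z.unsqueeze(1)).squeeze(1)
# Per-sample MI lower bound: log q(z|s) - log p(z), with p(z) uniform.
mi_bonus = logq - torch.log(torch.tensor(1.0 / N_LATENTS))
# In a full agent, mi_bonus would be added to the environment reward of
# the latent-conditioned policy; here we only fit the discriminator.
loss = -logq.mean()
opt.zero_grad(); loss.backward(); opt.step()
```

With a limited latent space size, one would then adapt to a new game by taking a few first-order gradient steps from the learned representation, in the spirit of the fast-adaptation scheme described above.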
Related papers
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise rewards to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Learning Strategy Representation for Imitation Learning in Multi-Agent Games [15.209555810145549]
We introduce the Strategy Representation for Imitation Learning (STRIL) framework, which effectively learns strategy representations in multi-agent games.
STRIL is a plug-in method that can be integrated into existing IL algorithms.
We demonstrate the effectiveness of STRIL across competitive multi-agent scenarios, including Two-player Pong, Limit Texas Hold'em, and Connect Four.
arXiv Detail & Related papers (2024-09-28T14:30:17Z)
- Agent-Pro: Learning to Evolve via Policy-Level Reflection and Optimization [53.510942601223626]
Large Language Models (LLMs) exhibit robust problem-solving capabilities for diverse tasks.
These task solvers require manually crafted prompts to convey task rules and regulate their behaviors.
We propose Agent-Pro: an LLM-based Agent with Policy-level Reflection and Optimization.
arXiv Detail & Related papers (2024-02-27T15:09:20Z)
- MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning [62.065503126104126]
We study how a principal can efficiently and effectively intervene on the rewards of a previously unseen learning agent in order to induce desirable outcomes.
This is relevant to many real-world settings like auctions or taxation, where the principal may not know the learning behavior nor the rewards of real people.
We introduce MERMAIDE, a model-based meta-learning framework to train a principal that can quickly adapt to out-of-distribution agents.
arXiv Detail & Related papers (2023-04-10T15:44:50Z)
- Learning From Good Trajectories in Offline Multi-Agent Reinforcement Learning [98.07495732562654]
Offline multi-agent reinforcement learning (MARL) aims to learn effective multi-agent policies from pre-collected datasets.
An agent learned by offline MARL often inherits a random policy present in the dataset, jeopardizing the performance of the entire team.
We propose a novel framework called Shared Individual Trajectories (SIT) to address this problem.
arXiv Detail & Related papers (2022-11-28T18:11:26Z)
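One plausible reading of trajectory sharing, sketched below under explicit assumptions: rank individual trajectories by return and let every agent train on the shared good ones. The `Trajectory` layout, the `keep_fraction` threshold, and the selection rule are hypothetical, meant only to illustrate the flavor of the approach, not SIT's actual algorithm.

```python
# Hypothetical sketch of sharing good individual trajectories in an
# offline MARL dataset (data layout and threshold are assumptions).
from dataclasses import dataclass

@dataclass
class Trajectory:
    agent_id: int
    transitions: list   # e.g. (obs, action, reward, next_obs) tuples
    ret: float          # episode return achieved by this agent

def share_good_trajectories(dataset, keep_fraction=0.5):
    """Keep the top trajectories by return and expose them to every agent."""
    ranked = sorted(dataset, key=lambda t: t.ret, reverse=True)
    good = ranked[: max(1, int(len(ranked) * keep_fraction))]
    # Every agent now trains on the shared pool of good trajectories
    # instead of only its own (possibly random) behavior.
    return {agent: good for agent in {t.agent_id for t in dataset}}

dataset = [
    Trajectory(0, [("s0", 1, 0.0, "s1")], ret=5.0),
    Trajectory(1, [("s0", 0, 0.0, "s1")], ret=-2.0),  # near-random agent
]
shared = share_good_trajectories(dataset)
print({a: [t.ret for t in trajs] for a, trajs in shared.items()})
```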
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete the given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
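A ranked policy memory suggests a simple structure: bucket saved policy checkpoints by their training scores and sample across ranks, so agents repeatedly meet behaviorally diverse counterparts. The sketch below assumes that structure; `bucket_width` and the uniform-over-ranks sampling rule are illustrative choices, not the paper's implementation.

```python
# Hypothetical sketch of a ranked policy memory: checkpoints are
# bucketed by training score and sampled uniformly across ranks
# so agents train against diverse behaviors (structure is assumed).
import random
from collections import defaultdict

class RankedPolicyMemory:
    def __init__(self, bucket_width=10.0):
        self.bucket_width = bucket_width
        self.buckets = defaultdict(list)  # rank -> list of checkpoints

    def save(self, policy_ckpt, training_score):
        rank = int(training_score // self.bucket_width)
        self.buckets[rank].append(policy_ckpt)

    def sample(self):
        """Pick a rank uniformly, then a checkpoint within it, so low-,
        mid-, and high-skill behaviors are all represented."""
        rank = random.choice(list(self.buckets))
        return random.choice(self.buckets[rank])

memory = RankedPolicyMemory()
memory.save("ckpt_early.pt", training_score=3.0)
memory.save("ckpt_mid.pt", training_score=17.0)
memory.save("ckpt_late.pt", training_score=42.0)
print(memory.sample())  # opponent/partner policy for the next episode
```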
- A Game-Theoretic Perspective of Generalization in Reinforcement Learning [9.402272029807316]
Generalization in reinforcement learning (RL) is important for the real-world deployment of RL algorithms.
We propose a game-theoretic framework for generalization in reinforcement learning, named GiRL.
arXiv Detail & Related papers (2022-08-07T06:17:15Z)
- Pick Your Battles: Interaction Graphs as Population-Level Objectives for Strategic Diversity [49.68758494467258]
We study how to construct diverse populations of agents by carefully structuring how individuals within a population interact.
Our approach is based on interaction graphs, which control the flow of information between agents during training.
We provide evidence for the importance of diversity in multi-agent training and analyse the effect of applying different interaction graphs on the training trajectories, diversity and performance of populations in a range of games.
arXiv Detail & Related papers (2021-10-08T11:29:52Z)
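The interaction-graph idea is easy to picture as a weighted adjacency matrix over the population that decides who trains against whom. Below is a minimal numpy sketch; the example graph and the proportional sampling rule are assumptions for illustration, not the paper's training setup.

```python
# Hypothetical sketch: sample training matchups from a weighted
# interaction graph over a population (graph values are assumptions).
import numpy as np

rng = np.random.default_rng(0)
n_agents = 4
# interaction[i, j] = propensity of agent i to train against agent j.
interaction = np.array([
    [0.0, 1.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 1.0],
    [1.0, 0.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 0.0],
])

def sample_opponent(agent, graph):
    """Choose an opponent with probability proportional to edge weight."""
    weights = graph[agent]
    probs = weights / weights.sum()
    return rng.choice(len(weights), p=probs)

for agent in range(n_agents):
    print(agent, "trains against", sample_opponent(agent, interaction))
```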
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
arXiv Detail & Related papers (2020-10-31T22:50:21Z)
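To make "gradients that account for other agents' learning" concrete, here is a toy LOLA-flavored sketch on a 2x2 zero-sum game: agent 1 differentiates its value through agent 2's anticipated naive gradient step. The payoff matrix, learning rates, and initial logits are assumptions; this illustrates the general idea rather than the paper's exact meta-multiagent policy gradient theorem.

```python
# Toy sketch: one agent's gradient accounts for the other agent's
# learning step on a 2x2 matrix game (payoffs and rates are assumed).
import torch

# Payoff matrices: entry [i, j] is the reward when agent 1 plays i
# and agent 2 plays j (a simple competitive game for illustration).
R1 = torch.tensor([[1.0, -1.0], [-1.0, 1.0]])
R2 = -R1  # zero-sum

theta1 = torch.tensor([0.3, -0.2], requires_grad=True)  # agent 1 logits
theta2 = torch.tensor([-0.1, 0.4], requires_grad=True)  # agent 2 logits
alpha = 0.5  # assumed opponent learning rate

def value(t1, t2, payoff):
    p1 = torch.softmax(t1, dim=0)
    p2 = torch.softmax(t2, dim=0)
    return p1 @ payoff @ p2

# Agent 2's anticipated naive update, kept differentiable w.r.t. theta1.
v2 = value(theta1, theta2, R2)
grad2 = torch.autograd.grad(v2, theta2, create_graph=True)[0]
theta2_next = theta2 + alpha * grad2

# Agent 1 maximizes its value *after* agent 2's anticipated step, so its
# gradient flows through the opponent's learning dynamics.
v1 = value(theta1, theta2_next, R1)
grad1 = torch.autograd.grad(v1, theta1)[0]
with torch.no_grad():
    theta1 += 0.1 * grad1
print(theta1)
```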