Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach
- URL: http://arxiv.org/abs/2407.00662v1
- Date: Sun, 30 Jun 2024 11:14:29 GMT
- Title: Multi-Agent Training for Pommerman: Curriculum Learning and Population-based Self-Play Approach
- Authors: Nhat-Minh Huynh, Hoang-Giang Cao, I-Chen Wu
- Abstract summary: Pommerman is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents.
This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play.
- Score: 11.740631954398292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Pommerman is a multi-agent environment that has received considerable attention from researchers in recent years. This environment is an ideal benchmark for multi-agent training, providing a battleground for two teams with communication capabilities among allied agents. Pommerman presents significant challenges for model-free reinforcement learning due to delayed action effects, sparse rewards, and false positives, where opponent players can lose due to their own mistakes. This study introduces a system designed to train multi-agent systems to play Pommerman using a combination of curriculum learning and population-based self-play. We also tackle two challenging problems that arise when deploying a multi-agent training system for competitive games: sparse rewards and a suitable matchmaking mechanism. Specifically, we propose an adaptive annealing factor based on agents' performance to dynamically adjust the dense exploration reward during training. Additionally, we implement a matchmaking mechanism utilizing the Elo rating system to pair agents effectively. Our experimental results demonstrate that our trained agent can outperform top learning agents without requiring communication among allied agents.
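The abstract describes the two deployment mechanisms only at a high level, so the sketch below is a hypothetical Python illustration rather than the authors' implementation: the function names, the performance-based annealing rule, and the numeric defaults (step size, target win rate, rating window, K-factor) are assumptions; only the Elo expected-score and update formulas are the standard ones.

```python
import random

def anneal_factor(win_rate: float, current_factor: float,
                  step: float = 0.05, target_win_rate: float = 0.5) -> float:
    """Hypothetical performance-based annealing of the dense exploration reward weight."""
    # If the agent already wins often enough, rely less on the shaped reward;
    # otherwise keep (or restore) the exploration help.
    if win_rate > target_win_rate:
        return max(0.0, current_factor - step)
    return min(1.0, current_factor + step)

def shaped_reward(sparse_reward: float, dense_reward: float, factor: float) -> float:
    """Total training reward: sparse game outcome plus annealed dense exploration term."""
    return sparse_reward + factor * dense_reward

def elo_expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of player A against player B."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Update both ratings after one match; score_a is 1.0 (win), 0.5 (draw), 0.0 (loss)."""
    e_a = elo_expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

def match_opponent(agent_id: str, ratings: dict, window: float = 100.0) -> str:
    """Pick an opponent whose rating is within `window` points; fall back to anyone else."""
    r = ratings[agent_id]
    close = [a for a in ratings if a != agent_id and abs(ratings[a] - r) <= window]
    return random.choice(close or [a for a in ratings if a != agent_id])

# Example: one matchmaking-and-update step in a small population.
ratings = {"agent_0": 1200.0, "agent_1": 1180.0, "agent_2": 1350.0}
opponent = match_opponent("agent_0", ratings)
ratings["agent_0"], ratings[opponent] = elo_update(ratings["agent_0"], ratings[opponent], score_a=1.0)
```

In a population-based self-play loop of this kind, each policy (and any scripted baseline) would keep its own rating, matches would be drawn between similarly rated agents, and the annealing factor would be recomputed periodically from the agent's recent win rate.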
Related papers
- Toward Optimal LLM Alignments Using Two-Player Games [86.39338084862324]
In this paper, we investigate alignment through the lens of two-agent games, involving iterative interactions between an adversarial and a defensive agent.
We theoretically demonstrate that this iterative reinforcement learning optimization converges to a Nash Equilibrium for the game induced by the agents.
Experimental results in safety scenarios demonstrate that learning in such a competitive environment not only fully trains agents but also leads to policies with enhanced generalization capabilities for both adversarial and defensive agents.
arXiv Detail & Related papers (2024-06-16T15:24:50Z)
- ProAgent: Building Proactive Cooperative Agents with Large Language Models [89.53040828210945]
ProAgent is a novel framework that harnesses large language models to create proactive agents.
ProAgent can analyze the present state and infer the intentions of teammates from observations.
ProAgent exhibits a high degree of modularity and interpretability, making it easy to integrate into various coordination scenarios.
arXiv Detail & Related papers (2023-08-22T10:36:56Z)
- Generating Personas for Games with Multimodal Adversarial Imitation Learning [47.70823327747952]
Reinforcement learning has been widely successful in producing agents capable of playing games at a human level.
Going beyond reinforcement learning is necessary to model a wide range of human playstyles.
This paper presents a novel imitation learning approach to generate multiple persona policies for playtesting.
arXiv Detail & Related papers (2023-08-15T06:58:19Z) - ELIGN: Expectation Alignment as a Multi-Agent Intrinsic Reward [29.737986509769808]
We propose a self-supervised intrinsic reward ELIGN - expectation alignment.
Similar to how animals collaborate in a decentralized manner with those in their vicinity, agents trained with expectation alignment learn behaviors that match their neighbors' expectations.
We show that agent coordination improves through expectation alignment because agents learn to divide tasks amongst themselves, break coordination symmetries, and confuse adversaries.
arXiv Detail & Related papers (2022-10-09T22:24:44Z) - Exploring the Benefits of Teams in Multiagent Learning [5.334505575267924]
We propose a new model of multiagent teams for reinforcement learning (RL) agents inspired by organizational psychology (OP)
We find that agents divided into teams develop cooperative pro-social policies despite incentives to not cooperate.
Agents are better able to coordinate and learn emergent roles within their teams and achieve higher rewards compared to when the interests of all agents are aligned.
arXiv Detail & Related papers (2022-05-04T21:14:03Z) - Coach-assisted Multi-Agent Reinforcement Learning Framework for
Unexpected Crashed Agents [120.91291581594773]
We present a formal formulation of a cooperative multi-agent reinforcement learning system with unexpected crashes.
We propose a coach-assisted multi-agent reinforcement learning framework, which introduces a virtual coach agent to adjust the crash rate during training.
To the best of our knowledge, this work is the first to study unexpected crashes in multi-agent systems.
arXiv Detail & Related papers (2022-03-16T08:22:45Z)
- Two-stage training algorithm for AI robot soccer [2.0757564643017092]
Two-stage heterogeneous centralized training is proposed to improve the learning performance of heterogeneous agents.
The proposed method is applied to 5 versus 5 AI robot soccer for validation.
arXiv Detail & Related papers (2021-04-13T04:24:13Z)
- Multi-Agent Collaboration via Reward Attribution Decomposition [75.36911959491228]
We propose Collaborative Q-learning (CollaQ), which achieves state-of-the-art performance in the StarCraft multi-agent challenge.
CollaQ is evaluated on various StarCraft maps and outperforms existing state-of-the-art techniques.
arXiv Detail & Related papers (2020-10-16T17:42:11Z)
- Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensuring the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the accuracy of the information presented and is not responsible for any consequences arising from its use.