Accelerate Multi-Agent Reinforcement Learning in Zero-Sum Games with
Subgame Curriculum Learning
- URL: http://arxiv.org/abs/2310.04796v3
- Date: Sat, 16 Dec 2023 06:18:23 GMT
- Authors: Jiayu Chen, Zelai Xu, Yunfei Li, Chao Yu, Jiaming Song, Huazhong Yang,
Fei Fang, Yu Wang, Yi Wu
- Abstract summary: We present a novel subgame curriculum learning framework for zero-sum games.
It adopts an adaptive initial state distribution by resetting agents to some previously visited states.
We derive a subgame selection metric that approximates the squared distance to NE values.
- Score: 65.36326734799587
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning Nash equilibrium (NE) in complex zero-sum games with multi-agent
reinforcement learning (MARL) can be extremely computationally expensive.
Curriculum learning is an effective way to accelerate learning, but an
under-explored dimension for generating a curriculum is the difficulty-to-learn
of the subgames -- games induced by starting from a specific state. In this
work, we present a novel subgame curriculum learning framework for zero-sum
games. It adopts an adaptive initial state distribution by resetting agents to
some previously visited states where they can quickly learn to improve
performance. Building upon this framework, we derive a subgame selection metric
that approximates the squared distance to NE values and further adopt a
particle-based state sampler for subgame generation. Integrating these
techniques leads to our new algorithm, Subgame Automatic Curriculum Learning
(SACL), which is a realization of the subgame curriculum learning framework.
SACL can be combined with any MARL algorithm such as MAPPO. Experiments in the
particle-world environment and Google Research Football environment show SACL
produces much stronger policies than baselines. In the challenging
hide-and-seek quadrant environment, SACL produces all four emergent stages and
uses only half the samples of MAPPO with self-play. The project website is at
https://sites.google.com/view/sacl-rl.
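To make the loop the abstract describes concrete, here is a minimal Python sketch of a subgame sampler: a buffer of previously visited states is reweighted by how much each state's value estimate has moved, used as a rough stand-in for the paper's metric approximating squared distance to NE values. The class name, buffer policy, eviction rule, and the `env.reset_to`/`critic.value` hooks are all assumptions for illustration, not the authors' implementation.

```python
import random


class SubgameCurriculum:
    """Buffer of previously visited states, sampled as episode start
    states with weights favoring states whose value estimates are still
    moving -- an illustrative proxy for 'far from the NE value'."""

    def __init__(self, capacity=10_000, eps=1e-6):
        self.buffer = []        # list of (state, value estimate at insertion)
        self.capacity = capacity
        self.eps = eps          # floor so every stored state stays samplable

    def add_visited(self, state, value):
        # Random eviction keeps the sketch simple; the paper's
        # particle-based sampler manages coverage more deliberately.
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(random.randrange(len(self.buffer)))
        self.buffer.append((state, value))

    def sample_initial_state(self, value_fn):
        # Weight = squared change in the value estimate since the state
        # was stored: a crude proxy for squared distance to the NE value.
        weights = [(value_fn(s) - v_old) ** 2 + self.eps
                   for s, v_old in self.buffer]
        state, _ = random.choices(self.buffer, weights=weights, k=1)[0]
        return state


# Illustrative usage with a hypothetical env exposing reset_to(state) and
# a critic exposing value(state); both hooks are assumptions:
#
#   sampler = SubgameCurriculum()
#   if sampler.buffer and random.random() < 0.7:  # mix curriculum resets
#       obs = env.reset_to(sampler.sample_initial_state(critic.value))
#   else:
#       obs = env.reset()                         # with ordinary resets
```

Mixing curriculum resets with resets from the game's original initial-state distribution (the 70/30 split above is arbitrary) keeps the learned policy anchored to the states that matter at evaluation time.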
Related papers
- Autoverse: An Evolvable Game Language for Learning Robust Embodied Agents [2.624282086797512]
We introduce Autoverse, an evolvable, domain-specific language for single-player 2D grid-based games.
We demonstrate its use as a scalable training ground for Open-Ended Learning (OEL) algorithms.
arXiv Detail & Related papers (2024-07-05T02:18:02Z) - Neural Population Learning beyond Symmetric Zero-sum Games [52.20454809055356]
We introduce NeuPL-JPSRO, a neural population learning algorithm that benefits from transfer learning of skills and converges to a Coarse Correlated Equilibrium (CCE) of the game.
Our work shows that equilibrium convergent population learning can be implemented at scale and in generality.
arXiv Detail & Related papers (2024-01-10T12:56:24Z) - SPRING: Studying the Paper and Reasoning to Play Games [102.5587155284795]
We propose a novel approach, SPRING, to read the game's original academic paper and use the knowledge learned to reason and play the game through a large language model (LLM).
In experiments, we study the quality of in-context "reasoning" induced by different forms of prompts under the setting of the Crafter open-world environment.
Our experiments suggest that LLMs, when prompted with consistent chain-of-thought, have great potential in completing sophisticated high-level trajectories.
arXiv Detail & Related papers (2023-05-24T18:14:35Z) - Learning to Play Text-based Adventure Games with Maximum Entropy
Reinforcement Learning [4.698846136465861]
We adapt the soft actor-critic (SAC) algorithm to the text-based environment.
We show that the reward shaping technique helps the agent to learn the policy faster and achieve higher scores.
arXiv Detail & Related papers (2023-02-21T15:16:12Z) - Learning Multi-Objective Curricula for Deep Reinforcement Learning [55.27879754113767]
Various automatic curriculum learning (ACL) methods have been proposed to improve the sample efficiency and final performance of deep reinforcement learning (DRL).
In this paper, we propose a unified automatic curriculum learning framework to create multi-objective but coherent curricula.
In addition to existing hand-designed curriculum paradigms, we further design a flexible memory mechanism to learn an abstract curriculum.
arXiv Detail & Related papers (2021-10-06T19:30:25Z) - Meta Automatic Curriculum Learning [35.13646854355393]
We introduce the concept of Meta-ACL, and formalize it in the context of black-box RL learners.
We present AGAIN, a first instantiation of Meta-ACL, and showcase its benefits for curriculum generation over classical ACL.
arXiv Detail & Related papers (2020-11-16T14:56:42Z) - The NetHack Learning Environment [79.06395964379107]
We present the NetHack Learning Environment (NLE), a procedurally generated rogue-like environment for Reinforcement Learning research.
We argue that NetHack is sufficiently complex to drive long-term research on problems such as exploration, planning, skill acquisition, and language-conditioned RL.
We demonstrate empirical success for early stages of the game using a distributed Deep RL baseline and Random Network Distillation exploration.
arXiv Detail & Related papers (2020-06-24T14:12:56Z) - Model-Based Reinforcement Learning for Atari [89.3039240303797]
We show how video prediction models can enable agents to solve Atari games with fewer interactions than model-free methods.
Our experiments evaluate SimPLe (Simulated Policy Learning) on a range of Atari games in the low-data regime of 100k interactions between the agent and the environment.
arXiv Detail & Related papers (2019-03-01T15:40:19Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this information and is not responsible for any consequences of its use.