MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure
- URL: http://arxiv.org/abs/2405.00902v1
- Date: Wed, 1 May 2024 23:19:48 GMT
- Title: MESA: Cooperative Meta-Exploration in Multi-Agent Learning through Exploiting State-Action Space Structure
- Authors: Zhicheng Zhang, Yancheng Liang, Yi Wu, Fei Fang
- Abstract summary: This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning.
It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace.
Experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments.
- Score: 37.56309011441144
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Multi-agent reinforcement learning (MARL) algorithms often struggle to find strategies close to a Pareto-optimal Nash equilibrium, owing largely to the lack of efficient exploration. The problem is exacerbated in sparse-reward settings, where policy learning exhibits higher variance. This paper introduces MESA, a novel meta-exploration method for cooperative multi-agent learning. It learns to explore by first identifying the agents' high-rewarding joint state-action subspace from training tasks and then learning a set of diverse exploration policies to "cover" the subspace. These trained exploration policies can be integrated with any off-policy MARL algorithm for test-time tasks. We first showcase MESA's advantage in a multi-step matrix game. Furthermore, experiments show that with learned exploration policies, MESA achieves significantly better performance in sparse-reward tasks in several multi-agent particle environments and multi-agent MuJoCo environments, and exhibits the ability to generalize to more challenging tasks at test time.
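To make the two-phase recipe above concrete, here is a minimal sketch of how it could look in code, with everything passed in as plain callables. All names, and the epsilon-style mixing at test time, are illustrative assumptions, not the authors' implementation.

```python
import random

# Hypothetical sketch of the two-phase recipe in the abstract (not the authors' code).

def collect_high_reward_pairs(rollout_fn, train_tasks, threshold, episodes=100):
    """Phase 1a: keep joint (state, joint_action) pairs from high-return episodes.
    `rollout_fn(task)` is assumed to return (list_of_pairs, episode_return)."""
    subspace = []
    for task in train_tasks:
        for _ in range(episodes):
            pairs, ret = rollout_fn(task)
            if ret >= threshold:
                subspace.extend(pairs)
    return subspace

def train_exploration_policies(subspace, fit_fn, num_policies):
    """Phase 1b: partition the subspace and fit one policy per part so the set
    of policies is diverse and jointly "covers" the subspace.
    `fit_fn(pairs)` is assumed to return a policy: state -> joint_action."""
    return [fit_fn(subspace[k::num_policies]) for k in range(num_policies)]

def behavior_action(learner_policy, exploration_policies, joint_state, epsilon):
    """Phase 2: at test time, mix the frozen exploration policies into the
    behavior policy of any off-policy MARL learner (epsilon-style mixing is
    our illustrative choice; the abstract does not prescribe this scheme)."""
    if random.random() < epsilon:
        return random.choice(exploration_policies)(joint_state)
    return learner_policy(joint_state)
```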
Related papers
- Sample Efficient Myopic Exploration Through Multitask Reinforcement Learning with Diverse Tasks [53.44714413181162]
This paper shows that when an agent is trained on a sufficiently diverse set of tasks, a generic policy-sharing algorithm with myopic exploration design can be sample-efficient.
To the best of our knowledge, this is the first theoretical demonstration of the "exploration benefits" of MTRL.
arXiv Detail & Related papers (2024-03-03T22:57:44Z)
- FoX: Formation-aware exploration in multi-agent reinforcement learning [10.554220876480297]
We propose a formation-based equivalence relation on the exploration space and aim to reduce the search space by exploring only meaningful states in different formations.
Numerical results show that the proposed FoX framework significantly outperforms state-of-the-art MARL algorithms on Google Research Football (GRF) and sparse-reward StarCraft II Multi-Agent Challenge (SMAC) tasks.
arXiv Detail & Related papers (2023-08-22T08:39:44Z)
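For context on the FoX entry above, here is one way a formation-based equivalence relation could be realized; the signature (discretized pairwise distances) and the count-based bonus are our guesses from the summary, not the paper's definitions.

```python
import math
from collections import Counter

# Illustrative sketch of a formation-based equivalence relation. Two joint
# states are treated as equivalent when the agents' relative geometry matches
# after rounding, so exploration only needs to distinguish distinct formations.

def formation_signature(agent_positions, resolution=1.0):
    """Map a joint state to its formation class: the multiset of pairwise
    distances, discretized. Invariant to translation and agent ordering."""
    dists = []
    for i, (xi, yi) in enumerate(agent_positions):
        for xj, yj in agent_positions[i + 1:]:
            dists.append(round(math.hypot(xi - xj, yi - yj) / resolution))
    return tuple(sorted(dists))

visit_counts = Counter()

def exploration_bonus(agent_positions):
    """Count-based intrinsic reward over formation classes rather than raw
    joint states, shrinking the effective search space."""
    sig = formation_signature(agent_positions)
    visit_counts[sig] += 1
    return 1.0 / math.sqrt(visit_counts[sig])
```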
- Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL) that combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z)
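For context on the entry above, a reward machine is a small automaton that reads high-level events and emits rewards. The sketch below shows the data structure only; the paper's contribution is learning such machines from experience, which is not shown here.

```python
# Generic reward-machine sketch (the data structure only, not the paper's
# learning algorithm). The automaton state carries the memory that makes
# otherwise non-Markovian rewards Markovian again.
class RewardMachine:
    def __init__(self, transitions, initial_state=0):
        # transitions: {(rm_state, event): (next_rm_state, reward)}
        self.transitions = transitions
        self.state = initial_state

    def step(self, event):
        """Advance on a high-level event; emit the associated reward."""
        self.state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        return reward

# Example: "press the button, then open the door" as a two-step sub-task.
rm = RewardMachine({
    (0, "button_pressed"): (1, 0.0),   # sub-task 1 done, no reward yet
    (1, "door_opened"):    (2, 1.0),   # full task done, reward 1
})
```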
- Self-Motivated Multi-Agent Exploration [38.55811936029999]
In cooperative multi-agent reinforcement learning (CMARL), it is critical for agents to achieve a balance between self-exploration and team collaboration.
Recent works mainly concentrate on agents' coordinated exploration, which causes the explored joint state space to grow exponentially.
We propose Self-Motivated Multi-Agent Exploration (SMMAE), which aims to achieve success in team tasks by adaptively finding a trade-off between self-exploration and team cooperation.
arXiv Detail & Related papers (2023-01-05T14:42:39Z)
- RPM: Generalizable Behaviors for Multi-Agent Reinforcement Learning [90.43925357575543]
We propose ranked policy memory (RPM) to collect diverse multi-agent trajectories for training MARL policies with good generalizability.
RPM enables MARL agents to interact with unseen agents in multi-agent generalization evaluation scenarios and complete given tasks, boosting performance by up to 402% on average.
arXiv Detail & Related papers (2022-10-18T07:32:43Z)
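The ranked-memory idea above can be sketched as follows; the ranking key, bucket width, and sampling rule are illustrative assumptions rather than the paper's specification.

```python
import random
from collections import defaultdict

# Hypothetical sketch of a ranked policy memory: policy checkpoints are
# bucketed by their training return, and behavior partners are drawn across
# ranks so the learner sees trajectories from differently skilled co-players.

class RankedPolicyMemory:
    def __init__(self, rank_width=10.0):
        self.rank_width = rank_width
        self.buckets = defaultdict(list)   # rank -> list of saved policies

    def save(self, policy, episode_return):
        rank = int(episode_return // self.rank_width)
        self.buckets[rank].append(policy)

    def sample_partner(self):
        """Pick a random rank first, then a policy inside it, so both low-
        and high-performing behaviors appear in the training data."""
        rank = random.choice(list(self.buckets))
        return random.choice(self.buckets[rank])
```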
- A further exploration of deep Multi-Agent Reinforcement Learning with Hybrid Action Space [0.0]
We propose two algorithms: deep multi-agent hybrid soft actor-critic (MAHSAC) and multi-agent hybrid deep deterministic policy gradients (MAHDDPG).
Our experiments run in the multi-agent particle environment, a simple multi-agent world with basic simulated physics.
arXiv Detail & Related papers (2022-08-30T07:40:15Z)
- Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy [0.0]
We propose Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC) to handle multi-agent problems with hybrid action spaces.
This algorithm follows the centralized training with decentralized execution (CTDE) paradigm and extends the Soft Actor-Critic (SAC) algorithm to handle hybrid action spaces.
Our experiments run in a simple multi-agent particle world with continuous observations and a discrete action space, along with basic simulated physics.
arXiv Detail & Related papers (2022-06-10T13:52:59Z)
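For the two hybrid-action entries above, a hybrid action pairs a discrete choice with continuous parameters. The sketch below shows one common way to structure such a policy head in PyTorch; it is a generic pattern, not the MAHSAC architecture.

```python
import torch
import torch.nn as nn

# Illustrative hybrid-action policy head: one discrete choice plus a
# continuous parameter vector per choice (generic pattern, not MAHSAC itself).

class HybridPolicyHead(nn.Module):
    def __init__(self, obs_dim, num_discrete, param_dim, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.discrete_logits = nn.Linear(hidden, num_discrete)
        # one continuous parameter vector for every discrete action
        self.params = nn.Linear(hidden, num_discrete * param_dim)
        self.num_discrete, self.param_dim = num_discrete, param_dim

    def forward(self, obs):
        h = self.trunk(obs)
        logits = self.discrete_logits(h)                     # which action
        params = self.params(h).view(-1, self.num_discrete, self.param_dim)
        a = torch.distributions.Categorical(logits=logits).sample()
        # pick the parameter vector belonging to the sampled discrete action
        chosen = params[torch.arange(params.size(0)), a]
        return a, torch.tanh(chosen)                         # bounded params
```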
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE), in which agents share a common goal while exploring.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
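The normalized-entropy goal selection in the CMAE entry above can be sketched as follows; the concrete projections and the "rarest projected state becomes the goal" rule are our illustrative assumptions.

```python
import math
from collections import Counter

# Illustrative sketch of entropy-guided goal selection over projected state
# spaces (our reading of the CMAE summary; details differ in the paper).

def normalized_entropy(counts):
    """Entropy of a visitation distribution divided by its maximum (log K),
    so projections with different support sizes are comparable."""
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 1.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c)
    return h / math.log(len(counts))

def select_goal(visited_states, projections):
    """Project visited states onto each subspace, pick the least uniformly
    explored projection, and propose its rarest value as the shared goal."""
    best = None
    for proj in projections:
        counts = Counter(proj(s) for s in visited_states)
        score = normalized_entropy(counts)
        if best is None or score < best[0]:
            best = (score, counts)
    # rarest projected state in the chosen space becomes the shared goal
    return min(best[1], key=best[1].get)
```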
- UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
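The linear decomposition in the UneVEn entry rests on the standard successor-feature identity: if per-step rewards decompose as r(s,a) = φ(s,a)·w, then Q^π_w(s,a) = ψ^π(s,a)·w, where ψ is the discounted sum of future features. A minimal numeric sketch of that identity follows; it is the generic construction, not UneVEn's full algorithm.

```python
import numpy as np

# Generic successor-feature sketch: evaluating many related tasks (different
# weight vectors w) reduces to a dot product with psi -- the property that
# lets methods like UneVEn train on a set of related tasks simultaneously.

def successor_features(features, gamma=0.99):
    """psi_t = sum_{k>=t} gamma^(k-t) * phi_k along one trajectory."""
    psi = np.zeros_like(features, dtype=float)
    running = np.zeros(features.shape[1])
    for t in range(len(features) - 1, -1, -1):
        running = features[t] + gamma * running
        psi[t] = running
    return psi

phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # per-step features
w_task = np.array([0.5, 2.0])                          # task weight vector
q_values = successor_features(phi) @ w_task            # returns per time step
```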