Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent
Reinforcement Learning
- URL: http://arxiv.org/abs/2303.09032v2
- Date: Fri, 14 Jul 2023 02:29:19 GMT
- Title: Conditionally Optimistic Exploration for Cooperative Deep Multi-Agent
Reinforcement Learning
- Authors: Xutong Zhao, Yangchen Pan, Chenjun Xiao, Sarath Chandar, Janarthanan
Rajendran
- Abstract summary: Efficient exploration is critical in cooperative deep Multi-Agent Reinforcement Learning (MARL)
In this work, we propose an exploration method that effectively encourages cooperative exploration based on the idea of sequential action-computation.
- Score: 24.05715475457959
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Efficient exploration is critical in cooperative deep Multi-Agent
Reinforcement Learning (MARL). In this work, we propose an exploration method
that effectively encourages cooperative exploration based on a sequential
action-computation scheme. The high-level intuition is that, under
optimism-based exploration, agents will explore cooperative strategies if each
agent's optimism estimate captures a structured dependency relationship with
the other agents. Assuming agents compute actions following a sequential order
at \textit{each environment timestep}, we provide a perspective to view MARL as
tree search iterations by considering agents as nodes at different depths of
the search tree. Inspired by the theoretically justified tree search algorithm
UCT (Upper Confidence bounds applied to Trees), we develop a method called
Conditionally Optimistic Exploration (COE). COE augments each agent's
state-action value estimate with an action-conditioned optimistic bonus derived
from the visitation count of the global state and joint actions of preceding
agents. COE is performed during training and disabled at deployment, making it
compatible with any value decomposition method for centralized training with
decentralized execution. Experiments across various cooperative MARL benchmarks
show that COE outperforms current state-of-the-art exploration methods on
hard-exploration tasks.
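The following Python sketch illustrates the kind of action-conditioned, count-based optimistic bonus the abstract describes: each agent's value estimate is augmented with a UCT-style term computed from visitation counts of the global state and the joint action of the agents that act before it in the sequential order. The class and function names, the tabular counts, and the exact bonus form c*sqrt(log N(s, a_<i) / N(s, a_<i, a_i)) are illustrative assumptions, not the paper's implementation.

```python
import math
from collections import defaultdict


class ConditionalOptimismBonus:
    """UCT-style count bonus conditioned on preceding agents' actions (sketch).

    Keeps visitation counts for (state, preceding joint action) and
    (state, preceding joint action, own action); all names are illustrative.
    """

    def __init__(self, c=1.0):
        self.c = c                      # exploration coefficient (assumed)
        self.n_cond = defaultdict(int)  # N(s, a_<i)
        self.n_full = defaultdict(int)  # N(s, a_<i, a_i)

    def bonus(self, state, preceding_actions, action):
        n_parent = self.n_cond[(state, preceding_actions)]
        n_child = self.n_full[(state, preceding_actions, action)]
        # UCT-inspired optimism: larger when (s, a_<i, a_i) is rarely visited;
        # the +1 smoothing (avoiding log(0) and division by zero) is an assumption.
        return self.c * math.sqrt(math.log(n_parent + 1.0) / (n_child + 1.0))

    def update(self, state, preceding_actions, action):
        self.n_cond[(state, preceding_actions)] += 1
        self.n_full[(state, preceding_actions, action)] += 1


def select_actions_sequentially(agents, state, q_values, bonuses, training=True):
    """Agents compute actions in a fixed order at each environment timestep.

    `q_values(i, state, preceding, a)` is a hypothetical callable returning
    agent i's learned state-action value; `agents[i].actions` lists its
    discrete actions. The bonus is applied only during training, so the
    deployed policy is unchanged.
    """
    chosen = []
    for i, agent in enumerate(agents):
        preceding = tuple(chosen)
        best_action, best_value = None, float("-inf")
        for a in agent.actions:
            value = q_values(i, state, preceding, a)
            if training:
                value += bonuses[i].bonus(state, preceding, a)
            if value > best_value:
                best_action, best_value = a, value
        if training:
            bonuses[i].update(state, preceding, best_action)
        chosen.append(best_action)
    return tuple(chosen)
```

In a deep MARL setting the tabular counts would be replaced by learned or hashed pseudo-counts, and the bonus is added only during training, matching the CTDE-compatible design described above.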
Related papers
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z) - Efficient Reinforcement Learning via Decoupling Exploration and Utilization [6.305976803910899]
Reinforcement Learning (RL) has achieved remarkable success across multiple fields and applications, including gaming, robotics, and autonomous vehicles.
In this work, our aim is to train agents efficiently by decoupling exploration and utilization, so that agents can escape the conundrum of suboptimal solutions.
The above idea is implemented in the proposed OPARL (Optimistic and Pessimistic Actor Reinforcement Learning) algorithm.
arXiv Detail & Related papers (2023-12-26T09:03:23Z) - Learning Reward Machines in Cooperative Multi-Agent Tasks [75.79805204646428]
This paper presents a novel approach to Multi-Agent Reinforcement Learning (MARL)
It combines cooperative task decomposition with the learning of reward machines (RMs) encoding the structure of the sub-tasks.
The proposed method helps deal with the non-Markovian nature of the rewards in partially observable environments.
arXiv Detail & Related papers (2023-03-24T15:12:28Z) - Strangeness-driven Exploration in Multi-Agent Reinforcement Learning [0.0]
We introduce a new exploration method based on strangeness that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithm.
The exploration bonus is obtained from the strangeness, and the proposed exploration method is not much affected by the transitions commonly observed in MARL tasks.
arXiv Detail & Related papers (2022-12-27T11:08:49Z) - Multi-agent Deep Covering Skill Discovery [50.812414209206054]
We propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space.
Also, we propose a novel framework to adopt the multi-agent options in the MARL process.
We show that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperform prior works that use single-agent options or no options.
arXiv Detail & Related papers (2022-10-07T00:40:59Z) - MAVIPER: Learning Decision Tree Policies for Interpretable Multi-Agent
Reinforcement Learning [38.77840067555711]
We propose the first set of interpretable MARL algorithms that extract decision-tree policies from neural networks trained with MARL.
The first algorithm, IVIPER, extends VIPER, a recent method for single-agent interpretable RL, to the multi-agent setting.
To better capture coordination between agents, we propose a novel centralized decision-tree training algorithm, MAVIPER.
arXiv Detail & Related papers (2022-05-25T02:38:10Z) - Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The shared exploration goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z) - Modeling the Interaction between Agents in Cooperative Multi-Agent
Reinforcement Learning [2.9360071145551068]
We propose a novel cooperative MARL algorithm named interactive actor-critic (IAC)
IAC models the interaction of agents from perspectives of policy and value function.
We extend the value decomposition methods to continuous control tasks and evaluate IAC on benchmark tasks including classic control and multi-agent particle environments.
arXiv Detail & Related papers (2021-02-10T01:58:28Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement
Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn)
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.