PIMAEX: Multi-Agent Exploration through Peer Incentivization
- URL: http://arxiv.org/abs/2501.01266v1
- Date: Thu, 02 Jan 2025 14:06:52 GMT
- Title: PIMAEX: Multi-Agent Exploration through Peer Incentivization
- Authors: Michael Kölle, Johannes Tochtermann, Julian Schönberger, Gerhard Stenzel, Philipp Altmann, Claudia Linnhoff-Popien
- Abstract summary: This work proposes a peer-incentivized reward function inspired by previous research on intrinsic curiosity and influence-based rewards. The PIMAEX reward aims to improve exploration in the multi-agent setting by encouraging agents to exert influence over each other.
- Score: 4.860484063961962
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: While exploration in single-agent reinforcement learning has been studied extensively in recent years, considerably less work has focused on its counterpart in multi-agent reinforcement learning. To address this issue, this work proposes a peer-incentivized reward function inspired by previous research on intrinsic curiosity and influence-based rewards. The PIMAEX reward, short for Peer-Incentivized Multi-Agent Exploration, aims to improve exploration in the multi-agent setting by encouraging agents to exert influence over each other to increase the likelihood of encountering novel states. We evaluate the PIMAEX reward in conjunction with PIMAEX-Communication, a multi-agent training algorithm that employs a communication channel for agents to influence one another. The evaluation is conducted in the Consume/Explore environment, a partially observable environment with deceptive rewards, specifically designed to challenge the exploration vs. exploitation dilemma and the credit-assignment problem. The results empirically demonstrate that agents using the PIMAEX reward with PIMAEX-Communication outperform those that do not.
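The abstract describes the PIMAEX reward only at a high level: an extrinsic reward augmented with a curiosity-style novelty bonus and a term that rewards influence over peers. A minimal sketch of a per-agent reward shaped along those lines follows; the function name, the weighting coefficients, and the way novelty and influence are estimated are assumptions made for illustration, not the paper's actual formulation.

import numpy as np

def peer_incentivized_reward(extrinsic, novelty, influence_on_peers,
                             alpha=0.5, beta=0.5):
    """Illustrative per-agent reward in the spirit of the PIMAEX idea (hypothetical).

    extrinsic:          scalar reward from the environment for agent i
    novelty:            intrinsic curiosity bonus for agent i, e.g. the prediction
                        error of a learned forward model on its observation
    influence_on_peers: array estimating how strongly agent i's message or action
                        changed each peer's behaviour
    alpha, beta:        weighting coefficients (hypothetical hyperparameters)
    """
    # Reward the agent both for reaching novel states itself and for nudging its
    # peers, which should in turn raise their chance of encountering novel states.
    return extrinsic + alpha * novelty + beta * float(np.sum(influence_on_peers))

# Example usage with made-up values:
# r = peer_incentivized_reward(0.0, novelty=0.3, influence_on_peers=np.array([0.1, 0.05]))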
Related papers
- Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning [2.8111817372725785]
Training cooperative agents in sparse-reward scenarios poses significant challenges for multi-agent reinforcement learning (MARL). We propose an algorithm that calculates the Influence Scope of Agents (ISA) on states by identifying the specific dimensions/attributes of states that can be influenced by individual agents. The mutual dependence between agents' actions and state attributes is then used to compute the credit assignment and to delimit the exploration space for each individual agent.
arXiv Detail & Related papers (2025-05-13T14:49:26Z) - Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative Step-Level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z) - Fast Peer Adaptation with Context-aware Exploration [63.08444527039578]
We propose a peer identification reward for learning agents in multi-agent games.
This reward motivates the agent to learn a context-aware policy for effective exploration and fast adaptation.
We evaluate our method on diverse testbeds that involve competitive (Kuhn Poker), cooperative (PO-Overcooked), or mixed (Predator-Prey-W) games with peer agents.
arXiv Detail & Related papers (2024-02-04T13:02:27Z) - Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing [34.299478481229265]
We propose MACE, a simple yet effective multi-agent coordinated exploration method.
By communicating only local novelty, agents can take into account other agents' local novelty to approximate the global novelty.
We show that MACE achieves superior performance in three multi-agent environments with sparse rewards.
arXiv Detail & Related papers (2024-02-03T09:35:25Z) - MA2CL: Masked Attentive Contrastive Learning for Multi-Agent Reinforcement Learning [128.19212716007794]
We propose an effective framework called Multi-Agent Masked Attentive Contrastive Learning (MA2CL).
MA2CL encourages the learned representation to be both temporally and agent-level predictive by reconstructing masked agent observations in latent space.
Our method significantly improves the performance and sample efficiency of different MARL algorithms and outperforms other methods in various vision-based and state-based scenarios.
arXiv Detail & Related papers (2023-06-03T05:32:19Z) - Successor-Predecessor Intrinsic Exploration [18.440869985362998]
We focus on exploration with intrinsic rewards, where the agent transiently augments the external rewards with self-generated intrinsic rewards.
We propose Successor-Predecessor Intrinsic Exploration (SPIE), an exploration algorithm based on a novel intrinsic reward combining prospective and retrospective information.
We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods.
arXiv Detail & Related papers (2023-05-24T16:02:51Z) - Credit-cognisant reinforcement learning for multi-agent cooperation [0.0]
We introduce the concept of credit-cognisant rewards, which allows an agent to perceive the effect its actions had on the environment as well as on its co-agents.
We show that by manipulating these experiences and constructing the reward contained within them to include the rewards received by all the agents within the same action sequence, we are able to improve significantly on the performance of independent deep Q-learning.
arXiv Detail & Related papers (2022-11-18T09:00:25Z) - Curiosity-Driven Multi-Agent Exploration with Mixed Objectives [7.247148291603988]
Intrinsic rewards have been increasingly used to mitigate the sparse reward problem in single-agent reinforcement learning.
Curiosity-driven exploration is a simple yet efficient approach that quantifies state novelty as the prediction error of the agent's curiosity module.
We show here, however, that naively using this curiosity-driven approach to guide exploration in sparse reward cooperative multi-agent environments does not consistently lead to improved results.
arXiv Detail & Related papers (2022-10-29T02:45:38Z) - Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration [40.87053312548429]
We introduce EMC, a novel episodic multi-agent reinforcement learning method with curiosity-driven exploration.
We use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training.
arXiv Detail & Related papers (2021-11-22T07:34:47Z) - Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
arXiv Detail & Related papers (2021-07-23T20:06:32Z) - Explore and Control with Adversarial Surprise [78.41972292110967]
Reinforcement learning (RL) provides a framework for learning goal-directed policies given user-specified rewards.
We propose a new unsupervised RL technique based on an adversarial game which pits two policies against each other to compete over the amount of surprise an RL agent experiences.
We show that our method leads to the emergence of complex skills by exhibiting clear phase transitions.
arXiv Detail & Related papers (2021-07-12T17:58:40Z) - Exploration and Incentives in Reinforcement Learning [107.42240386544633]
We consider complex exploration problems, where each agent faces the same (but unknown) MDP.
Agents control the choice of policies, whereas an algorithm can only issue recommendations.
We design an algorithm which explores all reachable states in the MDP.
arXiv Detail & Related papers (2021-02-28T00:15:53Z) - UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn).
UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features.
Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z) - Learning to Incentivize Other Learning Agents [73.03133692589532]
We show how to equip RL agents with the ability to give rewards directly to other agents, using a learned incentive function.
Such agents significantly outperform standard RL and opponent-shaping agents in challenging general-sum Markov games.
Our work points toward more opportunities and challenges along the path to ensure the common good in a multi-agent future.
arXiv Detail & Related papers (2020-06-10T20:12:38Z) - Maximizing Information Gain in Partially Observable Environments via
Prediction Reward [64.24528565312463]
This paper tackles the challenge of using belief-based rewards for a deep RL agent.
We derive the exact error between negative entropy and the expected prediction reward.
This insight provides theoretical motivation for several fields using prediction rewards.
arXiv Detail & Related papers (2020-05-11T08:13:49Z)