Towards Principled Multi-Agent Task Agnostic Exploration
- URL: http://arxiv.org/abs/2502.08365v1
- Date: Wed, 12 Feb 2025 12:51:36 GMT
- Title: Towards Principled Multi-Agent Task Agnostic Exploration
- Authors: Riccardo Zamboni, Mirco Mutti, Marcello Restelli
- Abstract summary: In reinforcement learning, we typically refer to task-agnostic exploration when we aim to explore the environment without access to the task specification a priori.
In this paper, we address this question by generalizing the problem of maximizing the state distribution entropy to multiple agents.
We present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings.
- Score: 44.601019677298005
- Abstract: In reinforcement learning, we typically refer to task-agnostic exploration when we aim to explore the environment without access to the task specification a priori. In a single-agent setting the problem has been extensively studied and is mostly understood. A popular approach casts the task-agnostic objective as maximizing the entropy of the state distribution induced by the agent's policy, from which principles and methods follow. In contrast, little is known about task-agnostic exploration in multi-agent settings, which are ubiquitous in the real world. How should different agents explore in the presence of others? In this paper, we address this question by generalizing the problem of maximizing the state distribution entropy to multiple agents. First, we investigate alternative formulations, highlighting the respective strengths and weaknesses of each. Then, we present a scalable, decentralized, trust-region policy search algorithm to address the problem in practical settings. Finally, we provide proof-of-concept experiments that both corroborate the theoretical findings and pave the way for task-agnostic exploration in challenging multi-agent settings.
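The abstract does not spell out how the entropy objective is estimated in practice; a common nonparametric choice in the task-agnostic exploration literature is a k-nearest-neighbor estimate of the entropy of visited states. The sketch below is an illustration under that assumption, not the paper's algorithm, and it contrasts two candidate multi-agent objectives of the kind the paper's "alternative formulations" presumably weigh (joint-state entropy versus average per-agent entropy); all names are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_entropy(states: np.ndarray, k: int = 4) -> float:
    """Kozachenko-Leonenko-style k-NN entropy estimate of the empirical
    state distribution (up to additive constants)."""
    n, d = states.shape
    tree = cKDTree(states)
    # query returns each point itself at distance 0, so ask for k + 1
    dists, _ = tree.query(states, k=k + 1)
    radii = np.maximum(dists[:, -1], 1e-12)   # distance to k-th neighbor
    return float(d * np.mean(np.log(radii)) + np.log(n))

# Two candidate multi-agent generalizations (illustrative only):
rng = np.random.default_rng(0)
per_agent = [rng.normal(size=(500, 2)) for _ in range(3)]     # fake rollouts
h_joint = knn_entropy(np.concatenate(per_agent, axis=1))      # joint-state entropy
h_mean = float(np.mean([knn_entropy(s) for s in per_agent]))  # mean per-agent entropy
```

The joint objective rewards coverage of the joint state space but couples all agents, whereas the per-agent average decomposes and is easier to optimize in a decentralized way; trade-offs of exactly this flavor motivate studying multiple formulations.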
Related papers
- Deep Multi-Agent Reinforcement Learning for Decentralized Active Hypothesis Testing [11.639503711252663]
We tackle the multi-agent active hypothesis testing (AHT) problem by introducing a novel algorithm rooted in the framework of deep multi-agent reinforcement learning.
We present a comprehensive set of experimental results that effectively showcase the agents' ability to learn collaborative strategies and enhance performance.
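Active hypothesis testing revolves around agents choosing actions to sharpen a belief over competing hypotheses. As a point of reference, here is the standard Bayesian belief update such agents maintain; it is a generic illustration, not the paper's deep MARL algorithm, and `belief_update` is a hypothetical helper.

```python
import numpy as np

def belief_update(belief: np.ndarray, likelihoods: np.ndarray) -> np.ndarray:
    """One Bayesian update of the belief over hypotheses.
    belief: (H,) current p(h); likelihoods: (H,) p(observation | h, action)."""
    posterior = belief * likelihoods
    return posterior / posterior.sum()

# An agent might declare a hypothesis once its belief is confident enough:
belief = np.full(4, 0.25)                      # uniform prior over 4 hypotheses
belief = belief_update(belief, np.array([0.9, 0.2, 0.1, 0.1]))
done = belief.max() > 0.95
```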
arXiv Detail & Related papers (2023-09-14T01:18:04Z)
- On the Importance of Exploration for Generalization in Reinforcement Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
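EDE's one-line description, exploring states whose values are uncertain, suggests the familiar ensemble recipe sketched below. This is a hedged illustration of that idea, not EDE's exact rule; the bonus form and coefficient are assumptions.

```python
import numpy as np

def uncertainty_seeking_action(q_ensemble: np.ndarray, beta: float = 1.0) -> int:
    """Choose an action using an ensemble of value estimates, preferring
    actions the ensemble disagrees on (a proxy for epistemic uncertainty).
    q_ensemble: (n_members, n_actions) Q-values for the current state."""
    mean = q_ensemble.mean(axis=0)
    std = q_ensemble.std(axis=0)      # disagreement across ensemble members
    return int(np.argmax(mean + beta * std))
```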
arXiv Detail & Related papers (2023-06-08T18:07:02Z)
- Emergence of Novelty in Evolutionary Algorithms [0.0]
We introduce our approach to the maze problem and compare it to the previously proposed solution.
We find that our solution leads to an improved performance while being significantly simpler.
Building on that, we generalize the problem and apply our approach to a more advanced set of tasks, Atari Games.
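The summary does not describe how novelty is measured; in the novelty-search tradition this line of work builds on, a behavior's novelty is typically its mean distance to the nearest behaviors in an archive. A minimal sketch under that assumption:

```python
import numpy as np

def novelty_score(behavior: np.ndarray, archive: np.ndarray, k: int = 15) -> float:
    """Classic novelty-search score: mean Euclidean distance from a
    candidate behavior descriptor to its k nearest archived behaviors."""
    if archive.size == 0:
        return float("inf")           # everything is novel at the start
    dists = np.linalg.norm(archive - behavior, axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())
```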
arXiv Detail & Related papers (2022-06-27T13:49:41Z)
- Q-Mixing Network for Multi-Agent Pathfinding in Partially Observable Grid Environments [62.997667081978825]
We consider the problem of multi-agent navigation in partially observable grid environments.
We suggest a reinforcement learning approach in which the agents first learn policies that map observations to actions and then follow these policies to reach their goals.
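As a concrete, deliberately simple reading of "policies that map observations to actions", the sketch below learns a tabular policy keyed by an agent's egocentric grid observation; it is a hypothetical stand-in, far simpler than the paper's Q-mixing network.

```python
import numpy as np
from collections import defaultdict

class LocalObsQPolicy:
    """Tabular Q-learning over local grid observations (illustration only)."""

    def __init__(self, n_actions: int, lr: float = 0.1, gamma: float = 0.95):
        self.q = defaultdict(lambda: np.zeros(n_actions))
        self.n_actions, self.lr, self.gamma = n_actions, lr, gamma

    def act(self, obs: np.ndarray, eps: float = 0.1) -> int:
        if np.random.rand() < eps:                    # explore
            return np.random.randint(self.n_actions)
        return int(np.argmax(self.q[obs.tobytes()])) # exploit

    def update(self, obs, action, reward, next_obs) -> None:
        key, nkey = obs.tobytes(), next_obs.tobytes()
        target = reward + self.gamma * self.q[nkey].max()
        self.q[key][action] += self.lr * (target - self.q[key][action])
```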
arXiv Detail & Related papers (2021-08-13T09:44:47Z)
- Cooperative Exploration for Multi-Agent Deep Reinforcement Learning [127.4746863307944]
We propose cooperative multi-agent exploration (CMAE) for deep reinforcement learning.
The goal is selected from multiple projected state spaces via a normalized entropy-based technique.
We demonstrate that CMAE consistently outperforms baselines on various tasks.
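The "normalized entropy-based technique" is only named, not described; a plausible reading, sketched here with hypothetical names, is to measure how uniformly each projected (restricted) state space has been visited and draw goals from the least-covered one.

```python
import numpy as np

def pick_goal_space(counts_per_space: list[np.ndarray]) -> int:
    """Return the index of the projected state space with the lowest
    normalized visit-count entropy, i.e., the least uniformly explored one.
    Assumes each space has at least two cells."""
    scores = []
    for counts in counts_per_space:
        p = counts / counts.sum()
        h = -np.sum(p[p > 0] * np.log(p[p > 0]))   # Shannon entropy of visits
        scores.append(h / np.log(len(counts)))     # normalize by max entropy
    return int(np.argmin(scores))
```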
arXiv Detail & Related papers (2021-07-23T20:06:32Z)
- Discrete-Time Mean Field Control with Environment States [25.44061731738579]
Mean field control and mean field games have been established as a tractable solution for large-scale multi-agent problems with many agents.
We rigorously establish approximate optimality as the number of agents grows in the finite agent case.
We find that a dynamic programming principle holds, resulting in the existence of an optimal stationary policy.
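To make the dynamic programming principle concrete: once the environment state and a (discretized) mean field are folded into a single finite state, standard value iteration produces an optimal stationary policy. The sketch below shows that generic computation; the discretization is an assumption of this illustration, and the paper's setting is more general.

```python
import numpy as np

def value_iteration(P: np.ndarray, r: np.ndarray, gamma: float = 0.95,
                    tol: float = 1e-8):
    """Generic value iteration on a finite MDP whose states could encode
    (environment state, discretized mean field) pairs.
    P: (S, A, S) transition kernel, r: (S, A) rewards."""
    S, A, _ = P.shape
    v = np.zeros(S)
    while True:
        q = r + gamma * (P @ v)        # Bellman backup, shape (S, A)
        v_new = q.max(axis=1)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, q.argmax(axis=1)   # value and stationary policy
        v = v_new
```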
arXiv Detail & Related papers (2021-04-30T10:58:01Z)
- Discovery of Options via Meta-Learned Subgoals [59.2160583043938]
Temporal abstractions in the form of options have been shown to help reinforcement learning (RL) agents learn faster.
We introduce a novel meta-gradient approach for discovering useful options in multi-task RL environments.
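A meta-gradient here means differentiating task performance through the agent's own option-learning update. The toy below makes that two-level loop explicit, using a quadratic stand-in for the inner objective and finite differences for the meta-gradient; every function is hypothetical and far simpler than the paper's method.

```python
import numpy as np

def inner_update(theta, phi, lr=0.5):
    """One inner 'option learning' step: pull theta toward subgoal phi
    (gradient step on the stand-in loss ||theta - phi||^2)."""
    return theta - lr * 2.0 * (theta - phi)

def task_loss(theta, task_goal):
    return float(np.sum((theta - task_goal) ** 2))

def meta_gradient(phi, theta, task_goal, eps=1e-5):
    """Finite-difference gradient of the *post-update* task loss with
    respect to the subgoal parameters phi -- the meta-gradient."""
    grad = np.zeros_like(phi)
    for i in range(phi.size):
        e = np.zeros_like(phi)
        e[i] = eps
        lp = task_loss(inner_update(theta, phi + e), task_goal)
        lm = task_loss(inner_update(theta, phi - e), task_goal)
        grad[i] = (lp - lm) / (2 * eps)
    return grad
```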
arXiv Detail & Related papers (2021-02-12T19:50:40Z)
- Task-Optimal Exploration in Linear Dynamical Systems [29.552894877883883]
We study task-guided exploration and determine what precisely an agent must learn about their environment in order to complete a task.
We provide instance- and task-dependent lower bounds which explicitly quantify the difficulty of completing a task of interest.
We show that the proposed algorithm optimally explores the environment, collecting precisely the information needed to complete the task, and provide finite-time bounds guaranteeing that it achieves the instance- and task-optimal sample complexity.
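In a linear system x' = Ax + Bu + w, "what an agent must learn" boils down to the parts of (A, B) that matter for the task. The sketch below shows the plain least-squares identification step such exploration feeds; the task-dependent targeting characterized by the paper's bounds is not captured here.

```python
import numpy as np

def identify_linear_system(X: np.ndarray, U: np.ndarray, X_next: np.ndarray):
    """Least-squares estimate of (A, B) in x' = A x + B u + w.
    X: (T, n) states, U: (T, m) inputs, X_next: (T, n) next states."""
    Z = np.hstack([X, U])                               # regressors, (T, n + m)
    theta, *_ = np.linalg.lstsq(Z, X_next, rcond=None)  # theta = [A^T; B^T]
    n = X.shape[1]
    return theta[:n].T, theta[n:].T                     # A_hat, B_hat
```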
arXiv Detail & Related papers (2021-02-10T01:42:22Z)
- Geometric Entropic Exploration [52.67987687712534]
We introduce a new algorithm that maximises the geometry-aware Shannon entropy of state-visits in both discrete and continuous domains.
Our key theoretical contribution is casting geometry-aware MSVE exploration as a tractable problem of optimising a simple and novel noise-contrastive objective function.
In our experiments, we show the efficiency of GEM in solving several RL problems with sparse rewards, compared against other deep RL exploration approaches.
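GEM's noise-contrastive objective is not reproduced here, but the underlying trick can be sketched: train a classifier to separate visited states from samples of a known noise distribution, read a log-density estimate off its logits, and use an entropy-style bonus such as -log p̂(s). All names below are hypothetical, and this is an illustrative stand-in rather than GEM's objective.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_log_density_ratio(visited: np.ndarray, noise: np.ndarray):
    """Noise-contrastive sketch: the trained classifier's logit estimates
    log p_data(s) - log p_noise(s) + log(n_visited / n_noise)."""
    X = np.vstack([visited, noise])
    y = np.concatenate([np.ones(len(visited)), np.zeros(len(noise))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    return lambda s: clf.decision_function(np.atleast_2d(s))

# Intrinsic reward: rare states (low estimated density) score higher.
rng = np.random.default_rng(0)
visited = rng.normal(size=(1000, 2))
noise = rng.uniform(-4, 4, size=(1000, 2))         # known noise density
logit = fit_log_density_ratio(visited, noise)
bonus = -logit(np.array([3.5, 3.5]))               # unvisited corner -> big bonus
```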
arXiv Detail & Related papers (2021-01-06T14:15:07Z)
- A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning [47.154539984501895]
We propose a novel meta-multiagent policy gradient theorem that accounts for the non-stationary policy dynamics inherent to multiagent learning settings.
This is achieved by modeling our gradient updates to consider both an agent's own non-stationary policy dynamics and the non-stationary policy dynamics of other agents in the environment.
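To make "accounting for other agents' learning" concrete, the toy below computes agent 1's gradient in a bilinear two-player game while anticipating agent 2's own gradient step, in the spirit of (though not identical to) the paper's meta-multiagent policy gradient theorem; the game and names are assumptions.

```python
import numpy as np

def lookahead_gradient(theta1, theta2, M1, M2, opp_lr=0.1):
    """Gradient of V1 = theta1^T M1 theta2' for agent 1, where theta2'
    is agent 2's parameter vector *after* its anticipated gradient step
    on its own payoff V2 = theta1^T M2 theta2 (toy bilinear game)."""
    grad2 = M2.T @ theta1                    # dV2/dtheta2: agent 2's next move
    theta2_next = theta2 + opp_lr * grad2
    # d/dtheta1 [theta1^T M1 theta2 + opp_lr * theta1^T (M1 M2^T) theta1]
    return M1 @ theta2_next + opp_lr * (M2 @ M1.T) @ theta1
```

The naive gradient would stop at `M1 @ theta2`; the extra term is precisely the correction for the other agent's non-stationary update.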
arXiv Detail & Related papers (2020-10-31T22:50:21Z)