Related papers: Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration

URL: http://arxiv.org/abs/2509.20648v2
Date: Tue, 21 Oct 2025 17:58:16 GMT
Title: Wonder Wins Ways: Curiosity-Driven Exploration through Multi-Agent Contextual Calibration
Authors: Yiyuan Pan, Zhe Liu, Hesheng Wang
Abstract summary: We introduce CERMIC, a principled framework that empowers agents to filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2.
Score: 21.410144944756308
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Autonomous exploration in complex multi-agent reinforcement learning (MARL) with sparse rewards critically depends on providing agents with effective intrinsic motivation. While artificial curiosity offers a powerful self-supervised signal, it often confuses environmental stochasticity with meaningful novelty. Moreover, existing curiosity mechanisms exhibit a uniform novelty bias, treating all unexpected observations equally. However, the novelty of peer behavior, which encodes latent task dynamics, is often overlooked, resulting in suboptimal exploration in decentralized, communication-free MARL settings. To this end, inspired by how human children adaptively calibrate their own exploratory behaviors by observing peers, we propose a novel approach to enhance multi-agent exploration. We introduce CERMIC, a principled framework that empowers agents to robustly filter noisy surprise signals and guide exploration by dynamically calibrating their intrinsic curiosity with inferred multi-agent context. Additionally, CERMIC generates theoretically grounded intrinsic rewards, encouraging agents to explore state transitions with high information gain. We evaluate CERMIC on benchmark suites including VMAS, Meltingpot, and SMACv2. Empirical results demonstrate that exploration with CERMIC significantly outperforms SoTA algorithms in sparse-reward environments.

Related papers

Beyond Monolithic Architectures: A Multi-Agent Search and Knowledge Optimization Framework for Agentic Search [56.78490647843876]
Agentic search has emerged as a promising paradigm for complex information seeking by enabling Large Language Models (LLMs) to interleave reasoning with tool use. We propose M-ASK, a framework that explicitly decouples agentic search into two complementary roles: Search Behavior Agents, which plan and execute search actions, and Knowledge Management Agents, which aggregate, filter, and maintain a compact internal context.
arXiv Detail & Related papers (2026-01-08T08:13:27Z)
SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning [57.083359974905655]
SenseNova-MARS is a novel Multimodal Agentic Reasoning and Search framework. It dynamically integrates the image search, text search, and image crop tools to tackle knowledge-intensive visual understanding challenges. SenseNova-MARS achieves state-of-the-art performance on open-source search and fine-grained image understanding benchmarks.
arXiv Detail & Related papers (2025-12-30T16:31:45Z)
RE-Searcher: Robust Agentic Search with Goal-oriented Planning and Self-reflection [55.125987985864896]
We present a systematic analysis that quantifies how environmental complexity induces fragile search behaviors. We propose a simple yet effective approach to instantiate a search agent, RE-Searcher. This combination of goal-oriented planning and self-reflection enables RE-Searcher to resist spurious cues in complex search environments.
arXiv Detail & Related papers (2025-09-30T10:25:27Z)
Interpretable Learning Dynamics in Unsupervised Reinforcement Learning [0.10832949790701804]
We present an interpretability framework for unsupervised reinforcement learning (URL) agents. We analyze five agents (DQN, RND, ICM, PPO, and a Transformer-RND variant) trained on procedurally generated environments.
arXiv Detail & Related papers (2025-05-06T19:57:09Z)
Deep Reinforcement Learning with Hybrid Intrinsic Reward Model [50.53705050673944]
Intrinsic reward shaping has emerged as a prevalent approach to solving hard-exploration and sparse-rewards environments. We introduce HIRE (Hybrid Intrinsic REward), a framework for creating hybrid intrinsic rewards through deliberate fusion strategies.
arXiv Detail & Related papers (2025-01-22T04:22:13Z)
Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL). Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms. It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z)
Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work. In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more. However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z)
DCIR: Dynamic Consistency Intrinsic Reward for Multi-Agent Reinforcement Learning [84.22561239481901]
We propose a new approach that enables agents to learn whether their behaviors should be consistent with those of other agents. We evaluate DCIR in multiple environments including Multi-agent Particle, Google Research Football and StarCraft II Micromanagement.
arXiv Detail & Related papers (2023-12-10T06:03:57Z)
Strangeness-driven Exploration in Multi-Agent Reinforcement Learning [0.0]
We introduce a new strangeness-based exploration method that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithm. The exploration bonus is obtained from the strangeness, and the proposed exploration method is not much affected by transitions commonly observed in MARL tasks.
arXiv Detail & Related papers (2022-12-27T11:08:49Z)
Curiosity-Driven Multi-Agent Exploration with Mixed Objectives [7.247148291603988]
Intrinsic rewards have been increasingly used to mitigate the sparse reward problem in single-agent reinforcement learning. Curiosity-driven exploration is a simple yet efficient approach that quantifies novelty as the prediction error of the agent's curiosity module. We show here, however, that naively using this curiosity-driven approach to guide exploration in sparse reward cooperative multi-agent environments does not consistently lead to improved results.
arXiv Detail & Related papers (2022-10-29T02:45:38Z)
Rewarding Episodic Visitation Discrepancy for Exploration in Reinforcement Learning [64.8463574294237]
We propose Rewarding Episodic Visitation Discrepancy (REVD) as an efficient and quantified exploration method. REVD provides intrinsic rewards by evaluating the Rényi divergence-based visitation discrepancy between episodes. It is tested on PyBullet Robotics Environments and Atari games.
arXiv Detail & Related papers (2022-09-19T08:42:46Z)
Episodic Multi-agent Reinforcement Learning with Curiosity-Driven Exploration [40.87053312548429]
We introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training.
arXiv Detail & Related papers (2021-11-22T07:34:47Z)
Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on the conditional variational inference to model the multimodality and stochasticity. We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration. Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning [53.73686229912562]
We propose a novel MARL approach called Universal Value Exploration (UneVEn). UneVEn learns a set of related tasks simultaneously with a linear decomposition of universal successor features. Empirical results on a set of exploration games, challenging cooperative predator-prey tasks requiring significant coordination among agents, and StarCraft II micromanagement benchmarks show that UneVEn can solve tasks where other state-of-the-art MARL methods fail.
arXiv Detail & Related papers (2020-10-06T19:08:47Z)