Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration
- URL: http://arxiv.org/abs/2603.03595v1
- Date: Wed, 04 Mar 2026 00:00:34 GMT
- Title: Hybrid Belief Reinforcement Learning for Efficient Coordinated Spatial Exploration
- Authors: Danish Rizvi, David Boyle
- Abstract summary: Pure model-based approaches provide structured uncertainty estimates but lack adaptive policy learning. This paper presents a hybrid belief-reinforcement learning framework to address this gap. Results show 10.8% higher cumulative reward and 38% faster convergence over baselines.
- Score: 3.0222726254970174
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Coordinating multiple autonomous agents to explore and serve spatially heterogeneous demand requires jointly learning unknown spatial patterns and planning trajectories that maximize task performance. Pure model-based approaches provide structured uncertainty estimates but lack adaptive policy learning, while deep reinforcement learning often suffers from poor sample efficiency when spatial priors are absent. This paper presents a hybrid belief-reinforcement learning (HBRL) framework to address this gap. In the first phase, agents construct spatial beliefs using a Log-Gaussian Cox Process (LGCP) and execute information-driven trajectories guided by a Pathwise Mutual Information (PathMI) planner with multi-step lookahead. In the second phase, trajectory control is transferred to a Soft Actor-Critic (SAC) agent, warm-started through dual-channel knowledge transfer: belief state initialization supplies spatial uncertainty, and replay buffer seeding provides demonstration trajectories generated during LGCP exploration. A variance-normalized overlap penalty enables coordinated coverage through shared belief state, permitting cooperative sensing in high-uncertainty regions while discouraging redundant coverage in well-explored areas. The framework is evaluated on a multi-UAV wireless service provisioning task. Results show 10.8% higher cumulative reward and 38% faster convergence over baselines, with ablation studies confirming that dual-channel transfer outperforms either channel alone.
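The abstract names a variance-normalized overlap penalty but does not give its formula. The following is a minimal sketch of one possible reading, not the authors' definition: each grid cell covered by more than one agent incurs a cost that shrinks as the shared belief variance at that cell grows. The function name `overlap_penalty`, the dictionary-based belief map, the `weight` parameter, and the normalization by the maximum variance are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, int]


def overlap_penalty(
    footprints: Sequence[Sequence[Cell]],
    variance: Dict[Cell, float],
    weight: float = 1.0,
    eps: float = 1e-9,
) -> float:
    """Hypothetical variance-normalized overlap penalty.

    footprints: cells sensed by each agent at the current step.
    variance:   shared belief variance per cell (assumed non-negative).
    """
    # Count how many distinct agents cover each cell.
    counts: Dict[Cell, int] = {}
    for cells in footprints:
        for cell in set(cells):
            counts[cell] = counts.get(cell, 0) + 1

    max_var = max(variance.values())
    penalty = 0.0
    for cell, n in counts.items():
        if n > 1:
            # Assumed normalization: near 1 in well-explored cells,
            # near 0 in the most uncertain cell.
            certainty = 1.0 - variance[cell] / (max_var + eps)
            penalty += weight * (n - 1) * certainty
    return penalty


# Two agents overlapping on a low-variance cell versus a high-variance cell.
belief_var = {(0, 0): 0.1, (0, 1): 1.0}
low_var_overlap = overlap_penalty([[(0, 0)], [(0, 0)]], belief_var)
high_var_overlap = overlap_penalty([[(0, 1)], [(0, 1)]], belief_var)
no_overlap = overlap_penalty([[(0, 0)], [(0, 1)]], belief_var)
```

Under these assumptions, overlapping on the well-explored cell costs far more than overlapping on the uncertain one, which matches the behaviour the abstract describes: cooperative sensing is tolerated in high-uncertainty regions and redundant coverage is discouraged elsewhere.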
Related papers
- SENTINEL: Stagewise Integrity Verification for Pipeline Parallel Decentralized Training [54.8494905524997]
Decentralized training introduces critical security risks when executed across untrusted, geographically distributed nodes. We propose SENTINEL, a verification mechanism for pipeline parallelism (PP) training without duplication. Experiments demonstrate successful training of up to 4B-parameter LLMs across untrusted distributed environments with up to 176 workers while maintaining model convergence and performance.
arXiv Detail & Related papers (2026-03-03T23:51:10Z)
- TopoCurate: Modeling Interaction Topology for Tool-Use Agent Training [53.93696896939915]
Training tool-use agents typically relies on Supervised Fine-Tuning (SFT) on successful trajectories and Reinforcement Learning (RL) on pass-rate-selected tasks. We propose TopoCurate, an interaction-aware framework that projects multi-trial rollouts from the same task into a unified semantic quotient topology. TopoCurate achieves consistent gains of 4.2% (SFT) and 6.9% (RL) over state-of-the-art baselines.
arXiv Detail & Related papers (2026-03-02T10:38:54Z)
- Controllable Exploration in Hybrid-Policy RLVR for Multi-Modal Reasoning [88.42566960813438]
CalibRL is a hybrid-policy RLVR framework that supports controllable exploration with expert guidance. CalibRL increases policy entropy in a guided manner and clarifies the target distribution. Experiments across eight benchmarks, including both in-domain and out-of-domain settings, demonstrate consistent improvements.
arXiv Detail & Related papers (2026-02-22T07:23:36Z)
- Search-R2: Enhancing Search-Integrated Reasoning via Actor-Refiner Collaboration [49.9937230730202]
We propose Search-R2, a novel Actor-Refiner collaboration framework that enhances reasoning through targeted intervention. Our approach decomposes the generation process into an Actor, which produces initial reasoning trajectories. We show that Search-R2 consistently outperforms strong RAG and RL-based baselines across model scales.
arXiv Detail & Related papers (2026-02-03T15:32:09Z)
- Toward Stable Semi-Supervised Remote Sensing Segmentation via Co-Guidance and Co-Fusion [31.189038928192648]
Co2S is a semi-supervised RS segmentation framework that fuses priors from vision-language models and self-supervised models. An explicit-implicit semantic co-guidance mechanism is introduced that utilizes text embeddings and learnable queries. Experiments on six popular datasets demonstrate the superiority of the proposed method.
arXiv Detail & Related papers (2025-12-28T18:24:19Z)
- Distributed Area Coverage with High Altitude Balloons Using Multi-Agent Reinforcement Learning [0.0]
High Altitude Balloons (HABs) can leverage stratospheric wind layers for limited horizontal control, enabling applications in reconnaissance, environmental monitoring, and communications networks. Existing multi-agent HAB coordination approaches use deterministic methods like Voronoi partitioning and extremum seeking control for large global constellations. This work presents the first systematic application of multi-agent reinforcement learning (MARL) to HAB coordination for distributed area coverage.
arXiv Detail & Related papers (2025-10-04T14:39:45Z)
- Stochastic Path Planning in Correlated Obstacle Fields [1.8184089804625951]
We introduce the Correlated Obstacle Scene (SCOS) problem, a navigation setting with spatially correlated obstacles of uncertain status. We develop Bayesian belief updates that refine blockage probabilities, and use the posteriors to reduce search space for efficiency. This framework addresses navigation challenges in environments with adversarial interruptions or clustered natural hazards.
arXiv Detail & Related papers (2025-09-23T20:30:35Z)
- Sharp Gaussian approximations for Decentralized Federated Learning [1.2489632787815885]
Federated Learning has gained traction in privacy-sensitive collaborative environments, with local SGD emerging as a key optimization method. We present two generalized Gaussian approximation results for local SGD and explore their implications. The time-uniform approximations support bootstrap-based tests for detecting adversarial attacks.
arXiv Detail & Related papers (2025-05-12T23:40:13Z)
- Semi-supervised Domain Adaptive Structure Learning [72.01544419893628]
Semi-supervised domain adaptation (SSDA) is a challenging problem requiring methods to overcome both 1) overfitting towards poorly annotated data and 2) distribution shift across domains.
We introduce an adaptive structure learning method to regularize the cooperation of SSL and DA.
arXiv Detail & Related papers (2021-12-12T06:11:16Z)
- Unpaired Adversarial Learning for Single Image Deraining with Rain-Space Contrastive Constraints [61.40893559933964]
We develop an effective unpaired SID method, named CDR-GAN, which explores mutual properties of the unpaired exemplars through contrastive learning in a GAN framework.
Our method performs favorably against existing unpaired deraining approaches on both synthetic and real-world datasets, and even outperforms several fully-supervised or semi-supervised models.
arXiv Detail & Related papers (2021-09-07T10:00:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.