Online Feedback Efficient Active Target Discovery in Partially Observable Environments
- URL: http://arxiv.org/abs/2505.06535v2
- Date: Sun, 19 Oct 2025 01:44:09 GMT
- Title: Online Feedback Efficient Active Target Discovery in Partially Observable Environments
- Authors: Anindya Sarkar, Binglin Ji, Yevgeniy Vorobeychik
- Abstract summary: Diffusion-guided Active Target Discovery (DiffATD) is a novel method that leverages diffusion dynamics for active target discovery. DiffATD enables efficient target discovery in a partially observable environment within a fixed sampling budget. We show that DiffATD performs significantly better than baselines and competitively with supervised methods that operate under full environmental observability.
- Score: 26.488250231429774
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: In various scientific and engineering domains where data acquisition is costly, such as medical imaging, environmental monitoring, or remote sensing, strategic sampling from unobserved regions, guided by prior observations, is essential to maximize target discovery within a limited sampling budget. In this work, we introduce Diffusion-guided Active Target Discovery (DiffATD), a novel method that leverages diffusion dynamics for active target discovery. DiffATD maintains a belief distribution over each unobserved state in the environment, using this distribution to dynamically balance exploration and exploitation. Exploration reduces uncertainty by sampling regions with the highest expected entropy, while exploitation targets areas with the highest likelihood of discovering the target, indicated by the belief distribution and an incrementally trained reward model designed to learn the characteristics of the target. DiffATD enables efficient target discovery in a partially observable environment within a fixed sampling budget, all without relying on any prior supervised training. Furthermore, DiffATD offers interpretability, unlike existing black-box policies that require extensive supervised training. Through extensive experiments and ablation studies across diverse domains, including medical imaging, species discovery, and remote sensing, we show that DiffATD performs significantly better than baselines and competitively with supervised methods that operate under full environmental observability.
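The exploration-exploitation rule described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: `belief`, `reward_score`, and `explore_weight` are hypothetical stand-ins for DiffATD's diffusion-derived belief distribution, its incrementally trained reward model, and its exploration-exploitation balance.

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a Bernoulli belief p(target present)."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def select_next_sample(belief, reward_score, observed, explore_weight=0.5):
    """Pick the next cell to measure: a weighted mix of an exploration
    term (belief entropy, largest where we are most uncertain) and an
    exploitation term (belief times a learned reward score, largest
    where a target is most likely). Already-observed cells are masked."""
    score = (explore_weight * entropy(belief)
             + (1 - explore_weight) * belief * reward_score)
    score[observed] = -np.inf
    return np.unravel_index(np.argmax(score), score.shape)

# Toy 4x4 environment with one cell already observed.
rng = np.random.default_rng(0)
belief = rng.uniform(size=(4, 4))        # stand-in for the diffusion-derived belief
reward_score = rng.uniform(size=(4, 4))  # stand-in for the reward model's output
observed = np.zeros((4, 4), dtype=bool)
observed[0, 0] = True

ij = select_next_sample(belief, reward_score, observed)
```

Sweeping `explore_weight` from 1 toward 0 over the sampling budget would shift the policy from pure uncertainty reduction to pure target exploitation, mirroring the dynamic balance the abstract describes.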
Related papers
- Active Target Discovery under Uninformative Prior: The Power of Permanent and Transient Memory [26.488250231429774]
In many scientific and engineering fields, where acquiring high-quality data is expensive, strategic sampling of unobserved regions is crucial for maximizing discovery rates within a constrained budget. We propose a novel approach that enables effective active target discovery even in settings with uninformative priors. Unlike black-box policies, our approach is inherently interpretable, providing clear insights into decision-making.
arXiv Detail & Related papers (2025-10-19T00:42:56Z) - Diversity-Incentivized Exploration for Versatile Reasoning [63.653348177250756]
We propose DIVER (Diversity-Incentivized Exploration for Versatile Reasoning), an innovative framework that highlights the pivotal role of global sequence-level diversity in incentivizing deep exploration for versatile reasoning.
arXiv Detail & Related papers (2025-09-30T13:11:46Z) - Goal Discovery with Causal Capacity for Efficient Reinforcement Learning [85.28685202281918]
Causal inference is crucial for humans to explore the world. We propose a novel Goal Discovery with Causal Capacity framework for efficient environment exploration.
arXiv Detail & Related papers (2025-08-13T08:54:56Z) - On Efficient Bayesian Exploration in Model-Based Reinforcement Learning [0.24578723416255752]
We address the challenge of data-efficient exploration in reinforcement learning by examining existing principled, information-theoretic approaches to intrinsic motivation. We prove that exploration bonuses naturally signal epistemic information gains and converge to zero once the agent becomes sufficiently certain about the environment's dynamics and rewards. We then outline a general framework, Predictive Trajectory Sampling with Bayesian Exploration (PTS-BE), which integrates model-based planning with information-theoretic bonuses to achieve sample-efficient deep exploration.
arXiv Detail & Related papers (2025-07-03T14:03:47Z) - Consistent World Models via Foresight Diffusion [56.45012929930605]
We argue that a key bottleneck in learning consistent diffusion-based world models lies in suboptimal predictive ability. We propose Foresight Diffusion (ForeDiff), a diffusion-based world modeling framework that enhances consistency by decoupling condition understanding from target denoising.
arXiv Detail & Related papers (2025-05-22T10:01:59Z) - Exploration by Random Distribution Distillation [28.675586715243437]
We propose a novel method called Random Distribution Distillation (RDD). RDD samples the output of a target network from a normal distribution. We demonstrate that RDD effectively unifies both count-based and prediction-error approaches.
arXiv Detail & Related papers (2025-05-16T09:38:21Z) - Exploratory Diffusion Model for Unsupervised Reinforcement Learning [28.413426177336703]
Unsupervised reinforcement learning (URL) aims to pre-train agents by exploring diverse states or skills in reward-free environments. Existing methods design intrinsic rewards to model the explored data and encourage further exploration. We propose the Exploratory Diffusion Model (ExDM), which leverages the strong expressive ability of diffusion models to fit the explored data.
arXiv Detail & Related papers (2025-02-11T05:48:51Z) - Action abstractions for amortized sampling [49.384037138511246]
We propose an approach to incorporate the discovery of action abstractions, or high-level actions, into the policy optimization process.
Our approach involves iteratively extracting action subsequences commonly used across many high-reward trajectories and 'chunking' them into a single action that is added to the action space.
arXiv Detail & Related papers (2024-10-19T19:22:50Z) - Random Latent Exploration for Deep Reinforcement Learning [71.88709402926415]
We introduce Random Latent Exploration (RLE), a simple yet effective exploration strategy in reinforcement learning (RL). On average, RLE outperforms noise-based methods, which perturb the agent's actions, and bonus-based exploration, which rewards the agent for attempting novel behaviors. RLE is as simple as noise-based methods, as it avoids complex bonus calculations but retains the deep exploration benefits of bonus-based methods.
arXiv Detail & Related papers (2024-07-18T17:55:22Z) - Variable-Agnostic Causal Exploration for Reinforcement Learning [56.52768265734155]
We introduce a novel framework, Variable-Agnostic Causal Exploration for Reinforcement Learning (VACERL).
Our approach automatically identifies crucial observation-action steps associated with key variables using attention mechanisms.
It constructs the causal graph connecting these steps, which guides the agent towards observation-action pairs with greater causal influence on task completion.
arXiv Detail & Related papers (2024-07-17T09:45:27Z) - Curiosity & Entropy Driven Unsupervised RL in Multiple Environments [0.0]
We propose and experiment with five new modifications to the original work.
In high-dimensional environments, curiosity-driven exploration enhances learning by encouraging the agent to seek diverse experiences and explore the unknown more.
However, its benefits are limited in low-dimensional and simpler environments where exploration possibilities are constrained and there is little that is truly unknown to the agent.
arXiv Detail & Related papers (2024-01-08T19:25:40Z) - Diffusion Models for Multi-target Adversarial Tracking [0.49157446832511503]
Target tracking plays a crucial role in real-world scenarios, particularly in drug-trafficking interdiction.
As unmanned drones proliferate, accurate autonomous target estimation is even more crucial for security and safety.
This paper presents Constrained Agent-based Diffusion for Enhanced Multi-Agent Tracking (CADENCE), an approach aimed at generating comprehensive predictions of adversary locations.
arXiv Detail & Related papers (2023-07-12T15:34:39Z) - Reinforcement Learning for Agile Active Target Sensing with a UAV [10.070339628481445]
This paper develops a deep reinforcement learning approach to plan informative trajectories.
It exploits its current belief of the target states and incorporates inaccurate sensor models for high-fidelity classification.
A unique characteristic of our approach is that it is robust to varying amounts of deviations from the true target distribution.
arXiv Detail & Related papers (2022-12-16T01:01:17Z) - Active Inference and Reinforcement Learning: A unified inference on continuous state and action spaces under partial observability [19.56438470022024]
Many real-world problems involve partial observations, formulated as partially observable Markov decision processes (POMDPs).
Previous studies have tackled RL in POMDPs by either incorporating the memory of past actions and observations or by inferring the true state of the environment.
We propose a unified principle that establishes a theoretical connection between Active Inference (AIF) and Reinforcement Learning (RL).
Experimental results demonstrate the superior learning capabilities of our method in solving continuous space partially observable tasks.
arXiv Detail & Related papers (2022-12-15T16:28:06Z) - Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing for specifying goals with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Information is Power: Intrinsic Control via Information Capture [110.3143711650806]
We argue that a compact and general learning objective is to minimize the entropy of the agent's state visitation estimated using a latent state-space model.
This objective induces an agent to both gather information about its environment, corresponding to reducing uncertainty, and to gain control over its environment, corresponding to reducing the unpredictability of future world states.
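As a minimal illustration of the entropy objective above, assuming a small discrete state space and empirical visitation counts (the paper instead estimates visitation with a latent state-space model):

```python
import numpy as np

def visitation_entropy(counts):
    """Entropy of the empirical state-visitation distribution.
    An agent minimizing this quantity is pushed toward predictable,
    controlled futures: a few states visited often rather than many
    states visited uniformly."""
    p = counts / counts.sum()
    p = p[p > 0]  # 0 * log(0) contributes nothing
    return float(-(p * np.log(p)).sum())

spread = np.array([1, 1, 1, 1])   # visits spread over four states
peaked = np.array([4, 0, 0, 0])   # all visits concentrated in one state
```

Here `visitation_entropy(spread)` equals log 4 while `visitation_entropy(peaked)` is zero, so minimizing this objective rewards the agent for making its own future state distribution concentrated, i.e. predictable and controlled.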
arXiv Detail & Related papers (2021-12-07T18:50:42Z) - Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z) - Variational Dynamic for Self-Supervised Exploration in Deep Reinforcement Learning [12.76337275628074]
In this work, we propose a variational dynamic model based on conditional variational inference to model the multimodality and stochasticity of environment transitions.
We derive an upper bound of the negative log-likelihood of the environmental transition and use such an upper bound as the intrinsic reward for exploration.
Our method outperforms several state-of-the-art environment model-based exploration approaches.
arXiv Detail & Related papers (2020-10-17T09:54:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this information (including all listed content) and is not responsible for any consequences.