Surveillance Evasion Through Bayesian Reinforcement Learning
- URL: http://arxiv.org/abs/2109.14811v1
- Date: Thu, 30 Sep 2021 02:29:21 GMT
- Title: Surveillance Evasion Through Bayesian Reinforcement Learning
- Authors: Dongping Qi, David Bindel, Alexander Vladimirsky
- Abstract summary: We consider a 2D continuous path planning problem with a completely unknown intensity of random termination.
The Observers' surveillance intensity is a priori unknown and has to be learned through repetitive path planning.
- Score: 78.79938727251594
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a 2D continuous path planning problem with a completely unknown intensity of random termination: an Evader is trying to escape a domain while minimizing the cumulative risk of detection (termination) by adversarial Observers. Those Observers' surveillance intensity is a priori unknown and has to be learned through repetitive path planning. We propose a new algorithm that utilizes Gaussian process regression to model the unknown surveillance intensity and relies on a confidence bound technique to promote strategic exploration. We illustrate our method through several examples and confirm the convergence of averaged regret experimentally.
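The abstract names the two ingredients (Gaussian process regression for the unknown intensity field, and a confidence bound to drive exploration) without giving code. Below is a minimal Python sketch of that combination; the RBF kernel, the lower-confidence-bound form mu - beta*sigma, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code): model an unknown scalar
# "surveillance intensity" field with GP regression, and plan against an
# optimistic (lower-confidence-bound) estimate to promote exploration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def true_intensity(x):
    # Hypothetical ground-truth field, used only to generate observations.
    return 1.0 + np.exp(-10.0 * np.sum((x - 0.5) ** 2, axis=-1))

# Observations gathered along previously planned paths (here: random points).
X_obs = rng.uniform(0.0, 1.0, size=(30, 2))
y_obs = true_intensity(X_obs) + 0.05 * rng.standard_normal(30)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2), alpha=0.05**2)
gp.fit(X_obs, y_obs)

def optimistic_intensity(x_query, beta=2.0):
    """Lower confidence bound mu - beta*sigma, clipped at zero: the planner
    treats under-explored regions as cheap, which encourages visiting them."""
    mu, sigma = gp.predict(x_query, return_std=True)
    return np.maximum(mu - beta * sigma, 0.0)

# A path planner would integrate this optimistic intensity along candidate
# paths and pick the minimizer; after each traversal, new observations are
# appended and the GP is refit.
grid = np.stack(np.meshgrid(np.linspace(0, 1, 50),
                            np.linspace(0, 1, 50)), axis=-1).reshape(-1, 2)
print(optimistic_intensity(grid).min())
```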
Related papers
- Distributionally Robust Inverse Reinforcement Learning for Identifying Multi-Agent Coordinated Sensing [13.440621354486906]
We derive a minimax distributionally robust inverse reinforcement learning (IRL) algorithm to reconstruct the utility functions of a multi-agent sensing system.
We prove the equivalence between this robust estimation and a semi-infinite optimization reformulation, and we propose a consistent algorithm to compute solutions.
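As a toy illustration of the minimax structure only (the paper's semi-infinite IRL program is far more involved), the sketch below picks a scalar parameter minimizing the worst-case loss over a small, hypothetical ambiguity set of reweighted empirical distributions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy distributionally robust estimation (illustrative; not the paper's
# algorithm): choose theta minimizing the worst-case weighted squared loss
# over a finite ambiguity set of reweightings of the empirical distribution.
data = rng.normal(1.0, 1.0, size=50)
w_up = np.linspace(0.5, 1.5, 50)
w_up /= w_up.sum()
weights_set = [np.ones(50) / 50, w_up, w_up[::-1]]

def worst_case_loss(theta):
    return max(np.sum(w * (data - theta) ** 2) for w in weights_set)

thetas = np.linspace(-1.0, 3.0, 401)
theta_star = min(thetas, key=worst_case_loss)
print(f"minimax estimate: {theta_star:.3f}")
```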
arXiv Detail & Related papers (2024-09-22T17:44:32Z)
- Provably Efficient Partially Observable Risk-Sensitive Reinforcement Learning with Hindsight Observation [35.278669159850146]
We introduce a novel formulation that integrates hindsight observations into a Partially Observable Markov Decision Process (POMDP) framework.
We develop the first provably efficient RL algorithm tailored for this setting.
These techniques are of particular interest to the theoretical study of reinforcement learning.
arXiv Detail & Related papers (2024-02-28T08:24:06Z)
- Safeguarded Progress in Reinforcement Learning: Safe Bayesian Exploration for Control Policy Synthesis [63.532413807686524]
This paper addresses the problem of maintaining safety during training in Reinforcement Learning (RL).
We propose a new architecture that handles the trade-off between efficient progress and safety during exploration.
arXiv Detail & Related papers (2023-12-18T16:09:43Z)
- Contrastive Pseudo Learning for Open-World DeepFake Attribution [67.58954345538547]
We introduce a new benchmark called Open-World DeepFake (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios.
We propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, and 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-label noise caused by similar methods in the unlabeled set.
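The summary only names the confidence-based soft pseudo-label strategy; a generic version of such a strategy (temperature-sharpened soft labels kept only above a confidence threshold; all names and defaults here are hypothetical, not the CPL authors' exact rule) might look like this:

```python
import torch
import torch.nn.functional as F

def soft_pseudo_labels(logits, threshold=0.9, temperature=0.5):
    """Generic confidence-based soft pseudo-labeling (illustrative).
    Sharpen predictions with a temperature, then keep only the samples
    whose maximum class probability exceeds the confidence threshold."""
    probs = F.softmax(logits / temperature, dim=-1)  # sharpened soft labels
    confidence, _ = probs.max(dim=-1)                # per-sample confidence
    mask = confidence >= threshold                   # drop low-confidence samples
    return probs[mask], mask

# Usage: logits from an attribution head on unlabeled fake faces.
logits = torch.randn(8, 5)  # 8 samples, 5 hypothetical attribution classes
labels, keep = soft_pseudo_labels(logits)
print(keep.sum().item(), "of 8 samples kept")
```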
arXiv Detail & Related papers (2023-09-20T08:29:22Z)
- Detecting Adversarial Directions in Deep Reinforcement Learning to Make Robust Decisions [8.173034693197351]
We propose a novel method to detect the presence of non-robust directions in MDPs.
Our method provides a theoretical basis for the fundamental cut-off between safe observations and adversarial observations.
Most significantly, we demonstrate the effectiveness of our approach even in the setting where non-robust directions are explicitly optimized to circumvent our proposed method.
arXiv Detail & Related papers (2023-06-09T13:11:05Z)
- Approximate Shielding of Atari Agents for Safe Exploration [83.55437924143615]
We propose a principled algorithm for safe exploration based on the concept of shielding.
We present preliminary results that show our approximate shielding algorithm effectively reduces the rate of safety violations.
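Shielding in safe RL generally means a runtime monitor that overrides any action whose predicted outcome violates a safety constraint. The sketch below is a generic shield of that kind, not the paper's approximate shielding algorithm; the safety predicate and fallback list are assumptions for illustration.

```python
def shielded_action(state, proposed_action, is_safe, fallback_actions):
    """Generic action shield (illustrative): pass the agent's action through
    if the safety predicate accepts it; otherwise fall back to the first
    alternative action the predicate deems safe."""
    if is_safe(state, proposed_action):
        return proposed_action
    for a in fallback_actions:
        if is_safe(state, a):
            return a
    raise RuntimeError("no safe action available in this state")

# Hypothetical 1D example: keep the next position inside [-1, 1].
def is_safe(state, action):
    return abs(state + action) <= 1.0

print(shielded_action(0.9, 0.5, is_safe, fallback_actions=[0.0, -0.5]))
```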
arXiv Detail & Related papers (2023-04-21T16:19:54Z)
- Safe Exploration Method for Reinforcement Learning under Existence of Disturbance [1.1470070927586016]
We deal with a safe exploration problem in reinforcement learning in the presence of disturbance.
We propose a safe exploration method that uses partial prior knowledge of a controlled object and disturbance.
We illustrate the validity and effectiveness of the proposed method through numerical simulations of an inverted pendulum and a four-bar parallel link robot manipulator.
arXiv Detail & Related papers (2022-09-30T13:00:33Z)
- ADER: Adapting between Exploration and Robustness for Actor-Critic Methods [8.750251598581102]
We show that TD3's performance lags behind that of vanilla actor-critic methods in some primitive environments.
We propose a novel algorithm, ADER, that ADapts between Exploration and Robustness to address this problem.
Experiments in several challenging environments demonstrate the superiority of the proposed method in continuous control tasks.
arXiv Detail & Related papers (2021-09-08T05:48:39Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
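The summary describes inducing a distribution over temporal-difference errors; one common stand-in (an assumption here, not necessarily the paper's construction) is an ensemble of value estimates whose spread of TD errors serves as the exploration signal:

```python
import numpy as np

rng = np.random.default_rng(1)

def td_errors(values, reward, next_values, gamma=0.99):
    """TD errors r + gamma*V(s') - V(s), one per ensemble member."""
    return reward + gamma * next_values - values

# Hypothetical ensemble of 5 value estimates for a state and its successor.
v_s  = rng.normal(1.0, 0.3, size=5)
v_sp = rng.normal(1.2, 0.3, size=5)
deltas = td_errors(v_s, reward=0.1, next_values=v_sp)

# Disagreement in TD errors across the ensemble -> exploration bonus.
print(f"TD-error spread (exploration signal): {deltas.std():.3f}")
```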
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
- Reparameterized Variational Divergence Minimization for Stable Imitation [57.06909373038396]
We study the extent to which variations in the choice of probabilistic divergence may yield more performant imitation-learning-from-observation (ILO) algorithms.
We contribute a reparameterization trick for adversarial imitation learning to alleviate the challenges of the promising $f$-divergence minimization framework.
Empirically, we demonstrate that our design choices allow for ILO algorithms that outperform baseline approaches and more closely match expert performance in low-dimensional continuous-control tasks.
arXiv Detail & Related papers (2020-06-18T19:04:09Z)
This list is automatically generated from the titles and abstracts of the papers on this site.