Covert Planning against Imperfect Observers
- URL: http://arxiv.org/abs/2310.16791v2
- Date: Wed, 1 Nov 2023 17:44:46 GMT
- Title: Covert Planning against Imperfect Observers
- Authors: Haoxiang Ma, Chongyang Shi, Shuo Han, Michael R. Dorothy, and Jie Fu
- Abstract summary: Covert planning refers to a class of constrained planning problems where an agent aims to accomplish a task with minimal information leaked to a passive observer to avoid detection.
This paper studies how covert planning can leverage the coupling of dynamics and the observer's imperfect observation to achieve optimal performance without being detected.
- Score: 29.610121527096286
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Covert planning refers to a class of constrained planning problems where an
agent aims to accomplish a task with minimal information leaked to a passive
observer to avoid detection. However, existing methods of covert planning often
consider deterministic environments or do not exploit the observer's imperfect
information. This paper studies how covert planning can leverage the coupling
of stochastic dynamics and the observer's imperfect observation to achieve
optimal task performance without being detected. Specifically, we employ a
Markov decision process to model the interaction between the agent and its
stochastic environment, and a partial observation function to capture the
leaked information to a passive observer. Assuming the observer employs
hypothesis testing to detect if the observation deviates from a nominal policy,
the covert planning agent aims to maximize the total discounted reward while
keeping the probability of being detected as an adversary below a given
threshold. We prove that finite-memory policies are more powerful than
Markovian policies in covert planning. Then, we develop a primal-dual proximal
policy gradient method with a two-time-scale update to compute a (locally)
optimal covert policy. We demonstrate the effectiveness of our methods using a
stochastic gridworld example. Our experimental results illustrate that the
proposed method computes a policy that maximizes the adversary's expected
reward without violating the detection constraint, and empirically demonstrate
how environmental noise can influence the performance of the covert policies.
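
The abstract describes a primal-dual proximal policy gradient method with a two-time-scale update for a constrained objective (maximize discounted reward subject to a detection-probability bound). The loop below is a minimal, hypothetical Lagrangian primal-dual sketch in that spirit only: the random tabular MDP, the REINFORCE-style gradient estimate, the stand-in "suspicious action" detection statistic, and the step sizes are all invented for illustration, and it is not the authors' algorithm, their proximal update, or their hypothesis-testing detector.

```python
# Minimal, hypothetical two-time-scale primal-dual policy-gradient loop for
#   maximize E[discounted reward]  subject to  Pr(detection) <= epsilon.
# Toy MDP, detection statistic, and step sizes are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

n_states, n_actions, gamma, horizon = 4, 2, 0.95, 30
epsilon = 0.1                                    # detection threshold
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a]: next-state dist.
R = rng.uniform(size=(n_states, n_actions))                        # reward table

theta = np.zeros((n_states, n_actions))          # softmax (Markovian) policy parameters
lam = 0.0                                        # Lagrange multiplier

def policy(theta, s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def rollout(theta):
    """One episode: discounted reward, a crude detection statistic, and the
    accumulated score function d log pi / d theta."""
    s, ret, detect, score = 0, 0.0, 0.0, np.zeros_like(theta)
    for t in range(horizon):
        p = policy(theta, s)
        a = rng.choice(n_actions, p=p)
        g = -p
        g[a] += 1.0                              # gradient of log pi(a|s) wrt theta[s]
        score[s] += g
        ret += gamma ** t * R[s, a]
        detect += (a == 1) / horizon             # placeholder "suspicious action" frequency
        s = rng.choice(n_states, p=P[s, a])
    return ret, detect, score

alpha, beta = 0.05, 0.005                        # fast (policy) and slow (multiplier) rates
for _ in range(2000):
    ret, detect, score = rollout(theta)
    theta += alpha * (ret - lam * detect) * score      # ascend the Lagrangian in theta
    lam = max(0.0, lam + beta * (detect - epsilon))    # projected ascent in the dual variable
```

The multiplier moves on the slower time scale, so the policy parameters approximately track the current Lagrangian between dual updates; with a finite-memory (rather than Markovian) policy class, as the paper advocates, the parameters would additionally condition on a bounded observation history.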
Related papers
- How to Exhibit More Predictable Behaviors [3.5248694676821484]
This paper looks at predictability problems wherein an agent must choose its strategy in order to optimize predictions that an external observer could make.
We take into account uncertainty in the environment dynamics and in the observed agent's policy.
We propose action and state predictability performance criteria through reward functions built on the observer's belief about the agent policy.
arXiv Detail & Related papers (2024-04-17T12:06:17Z)
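
The predictability criteria in the entry above are built on reward functions defined from the observer's belief about the agent's policy. The snippet below is a minimal, hypothetical illustration of one such reward, paying the agent the log-probability that the observer's predicted action distribution assigns to the action actually taken; the function name and reward shape are assumptions, not the paper's criteria.

```python
# Hypothetical predictability-style reward: higher when the agent takes an
# action the observer's belief already predicts.  Illustrative only.
import numpy as np

def predictability_reward(observer_action_belief, action):
    """observer_action_belief: the observer's predicted distribution over the
    agent's next action; returns the log-probability of the action taken."""
    return float(np.log(observer_action_belief[action] + 1e-12))

print(predictability_reward(np.array([0.7, 0.2, 0.1]), action=0))  # predictable choice
```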
- Distributional Method for Risk Averse Reinforcement Learning [0.0]
We introduce a distributional method for learning the optimal policy in a risk-averse Markov decision process.
We assume sequential observations of states, actions, and costs and assess the performance of a policy using dynamic risk measures.
arXiv Detail & Related papers (2023-02-27T19:48:42Z)
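
The entry above assesses policies with dynamic risk measures. For reference only, the sketch below shows a static CVaR on sampled costs, a common one-step building block that dynamic (nested) measures compose over time; the risk level and sample-based evaluation are assumptions, not the paper's method.

```python
# Hypothetical sketch: conditional value-at-risk (CVaR) of sampled costs,
# i.e. the average of the worst alpha-fraction of outcomes.
import numpy as np

def cvar(costs, alpha=0.1):
    costs = np.sort(np.asarray(costs, dtype=float))[::-1]   # worst (largest) first
    k = max(1, int(np.ceil(alpha * len(costs))))
    return float(costs[:k].mean())

print(cvar([1.0, 2.0, 10.0, 3.0, 2.5], alpha=0.2))   # dominated by the worst outcome
```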
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Deceptive Decision-Making Under Uncertainty [25.197098169762356]
We study the design of autonomous agents that are capable of deceiving outside observers about their intentions while carrying out tasks.
By modeling the agent's behavior as a Markov decision process, we consider a setting where the agent aims to reach one of multiple potential goals.
We propose a novel approach to model observer predictions based on the principle of maximum entropy and to efficiently generate deceptive strategies.
arXiv Detail & Related papers (2021-09-14T14:56:23Z)
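
The entry above models observer predictions with the principle of maximum entropy. A common concrete form is a softmax (Boltzmann) posterior over candidate goals; the sketch below is a hypothetical illustration of that form, with the cost-based scoring and the temperature beta as assumptions rather than the paper's exact model.

```python
# Hypothetical maximum-entropy style goal prediction: goals that the observed
# trajectory prefix deviates from the least receive the most probability.
import numpy as np

def maxent_goal_posterior(extra_cost_per_goal, beta=1.0):
    """extra_cost_per_goal[g]: additional cost the observed prefix incurs if
    the agent were actually heading to goal g (lower = better fit)."""
    scores = -beta * np.asarray(extra_cost_per_goal, dtype=float)
    scores -= scores.max()                 # numerical stability
    p = np.exp(scores)
    return p / p.sum()

print(maxent_goal_posterior([0.2, 1.5, 3.0]))   # goal 0 predicted most likely
```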
- Learning Uncertainty For Safety-Oriented Semantic Segmentation In Autonomous Driving [77.39239190539871]
We show how uncertainty estimation can be leveraged to enable safety-critical image segmentation in autonomous driving.
We introduce a new uncertainty measure based on disagreeing predictions as measured by a dissimilarity function.
We show experimentally that our proposed approach is much less computationally intensive at inference time than competing methods.
arXiv Detail & Related papers (2021-05-28T09:23:05Z)
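
The uncertainty measure in the entry above is built from disagreeing predictions scored by a dissimilarity function. The sketch below is a minimal, hypothetical version for a single input, using mean pairwise total-variation distance between ensemble members; the specific dissimilarity and the ensemble setup are assumptions, not the paper's measure.

```python
# Hypothetical disagreement-based uncertainty for one input: the more the
# predictors' class distributions differ, the higher the score.
import numpy as np

def disagreement_uncertainty(probs, dissimilarity=None):
    """probs: (n_models, n_classes) class probabilities from each predictor."""
    if dissimilarity is None:
        dissimilarity = lambda p, q: 0.5 * float(np.abs(p - q).sum())  # total variation
    pairs = [(i, j) for i in range(len(probs)) for j in range(i + 1, len(probs))]
    return float(np.mean([dissimilarity(probs[i], probs[j]) for i, j in pairs]))

probs = np.array([[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]])
print(disagreement_uncertainty(probs))   # larger when the models disagree
```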
- Minimax Off-Policy Evaluation for Multi-Armed Bandits [58.7013651350436]
We study the problem of off-policy evaluation in the multi-armed bandit model with bounded rewards.
We develop minimax rate-optimal procedures under three settings.
arXiv Detail & Related papers (2021-01-19T18:55:29Z)
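
The entry above develops minimax rate-optimal procedures for off-policy evaluation in bandits with bounded rewards. For orientation only, the basic importance-sampling estimator below shows what such an evaluation computes from logged bandit data; it is a standard baseline, not the paper's estimator.

```python
# Hypothetical sketch: importance-sampling estimate of a target policy's
# expected reward from logged multi-armed bandit data.
import numpy as np

def is_estimate(actions, rewards, behavior_probs, target_probs):
    """actions, rewards: logged pulls; *_probs: per-arm action probabilities
    of the logging (behavior) policy and the policy being evaluated."""
    actions = np.asarray(actions)
    w = np.asarray(target_probs)[actions] / np.asarray(behavior_probs)[actions]
    return float(np.mean(w * np.asarray(rewards)))

print(is_estimate([0, 1, 0], [1.0, 0.0, 0.5],
                  behavior_probs=[0.5, 0.5], target_probs=[0.9, 0.1]))
```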
- Reliable Off-policy Evaluation for Reinforcement Learning [53.486680020852724]
In a sequential decision-making problem, off-policy evaluation estimates the expected cumulative reward of a target policy.
We propose a novel framework that provides robust and optimistic cumulative reward estimates using one or more logged datasets.
arXiv Detail & Related papers (2020-11-08T23:16:19Z)
- Temporal Difference Uncertainties as a Signal for Exploration [76.6341354269013]
An effective approach to exploration in reinforcement learning is to rely on an agent's uncertainty over the optimal policy.
In this paper, we highlight that value estimates are easily biased and temporally inconsistent.
We propose a novel method for estimating uncertainty over the value function that relies on inducing a distribution over temporal difference errors.
arXiv Detail & Related papers (2020-10-05T18:11:22Z)
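
The entry above estimates value uncertainty by inducing a distribution over temporal-difference errors. One hypothetical way to realize that idea, shown below, is to keep an ensemble of value estimates and use the spread of their TD errors on a transition as the signal; the ensemble construction is an assumption, not the paper's estimator.

```python
# Hypothetical sketch: spread of TD errors across an ensemble of value tables
# as an exploration/uncertainty signal for a single transition (s, r, s').
import numpy as np

def td_error_spread(v_ensemble, s, reward, s_next, gamma=0.99):
    """v_ensemble: (n_members, n_states) array of state-value estimates."""
    deltas = reward + gamma * v_ensemble[:, s_next] - v_ensemble[:, s]
    return float(np.std(deltas))   # larger spread -> higher uncertainty

v = np.array([[0.0, 1.0], [0.5, 2.0], [0.2, 0.8]])
print(td_error_spread(v, s=0, reward=0.1, s_next=1))
```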
- Minimax-Optimal Off-Policy Evaluation with Linear Function Approximation [49.502277468627035]
This paper studies the statistical theory of batch data reinforcement learning with function approximation.
Consider the off-policy evaluation problem, which is to estimate the cumulative value of a new target policy from logged history.
arXiv Detail & Related papers (2020-02-21T19:20:57Z)
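
The entry above analyzes off-policy evaluation with linear function approximation. As a minimal, hypothetical reference point (a standard LSTD-Q baseline, not the minimax-optimal estimator analyzed in that paper), the value of a target policy can be estimated from logged transitions as follows.

```python
# Hypothetical sketch: LSTD-Q policy evaluation with linear features.
import numpy as np

def lstd_q_weights(phi, phi_next_target, rewards, gamma=0.99, reg=1e-3):
    """phi: (n, d) features of logged (s, a) pairs;
    phi_next_target: (n, d) features of (s', a') with a' drawn from the target
    policy; returns weights w such that Q(s, a) ~= phi(s, a) @ w."""
    A = phi.T @ (phi - gamma * phi_next_target) + reg * np.eye(phi.shape[1])
    b = phi.T @ np.asarray(rewards, dtype=float)
    return np.linalg.solve(A, b)
```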