Smoother Entropy for Active State Trajectory Estimation and Obfuscation in POMDPs
- URL: http://arxiv.org/abs/2108.10227v1
- Date: Thu, 19 Aug 2021 00:05:55 GMT
- Title: Smoother Entropy for Active State Trajectory Estimation and Obfuscation in POMDPs
- Authors: Timothy L. Molloy and Girish N. Nair
- Abstract summary: optimisation of the smoother entropy leads to superior trajectory estimation and obfuscation compared to alternative approaches.
We identify belief-state MDP reformulations of both active estimation and obfuscation with concave cost and cost-to-go functions.
- Score: 3.42658286826597
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of controlling a partially observed Markov decision
process (POMDP) to either aid or hinder the estimation of its state trajectory
by optimising the conditional entropy of the state trajectory given
measurements and controls, a quantity we dub the smoother entropy. Our
consideration of the smoother entropy contrasts with previous active state
estimation and obfuscation approaches that instead resort to measures of
marginal (or instantaneous) state uncertainty due to tractability concerns. By
establishing novel expressions of the smoother entropy in terms of the usual
POMDP belief state, we show that our active estimation and obfuscation problems
can be reformulated as Markov decision processes (MDPs) that are fully observed
in the belief state. Surprisingly, we identify belief-state MDP reformulations
of both active estimation and obfuscation with concave cost and cost-to-go
functions, which enables the use of standard POMDP techniques to construct
tractable bounded-error (approximate) solutions. We show in simulations that
optimisation of the smoother entropy leads to superior trajectory estimation
and obfuscation compared to alternative approaches.
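For concreteness, the smoother entropy referred to above is the conditional entropy of the entire state trajectory given all measurements and controls. The notation below (states X_{0:T}, measurements Y_{0:T}, controls U_{0:T-1} over a horizon T) is assumed for illustration and may differ from the paper's exact conventions:

```latex
% Smoother entropy: conditional entropy of the state trajectory given
% the measurements and controls (index ranges assumed for illustration).
\[
  H\bigl(X_{0:T} \mid Y_{0:T}, U_{0:T-1}\bigr)
  = -\,\mathbb{E}\bigl[\log p(X_{0:T} \mid Y_{0:T}, U_{0:T-1})\bigr].
\]
% Active estimation chooses a control policy to minimise this quantity;
% active obfuscation chooses a policy to maximise it.
```

Per the abstract, the key technical step is expressing this trajectory-level quantity in terms of the usual POMDP belief state, which yields the belief-state MDP reformulations with concave cost and cost-to-go functions.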
Related papers
- Observation Adaptation via Annealed Importance Resampling for Partially Observable Markov Decision Processes [4.830416359005018]
Partially observable Markov decision processes (POMDPs) are a general mathematical model for sequential decision-making in environments under state uncertainty.
Online solvers typically use bootstrap particle filters based on importance resampling for updating the belief distribution.
We propose an approach that constructs a sequence of bridge distributions between the state-transition and optimal distributions through iterative Monte Carlo steps.
arXiv Detail & Related papers (2025-03-25T03:05:00Z)
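For context on the entry above: a bootstrap particle filter approximates the POMDP belief with samples, propagating them through the state-transition model and resampling according to the observation likelihood. The sketch below is a generic single-step update under assumed interfaces (transition_sample, observation_likelihood); it is not the bridge-distribution scheme proposed in the paper.

```python
import numpy as np

def bootstrap_pf_update(particles, action, observation,
                        transition_sample, observation_likelihood, rng):
    """One bootstrap particle-filter belief update (generic sketch).

    particles: (N, d) array approximating the current belief.
    transition_sample(particles, action, rng): samples next states (assumed interface).
    observation_likelihood(observation, states): p(y | x) per state (assumed interface).
    """
    # Propagate each particle through the state-transition model (the proposal).
    predicted = transition_sample(particles, action, rng)

    # Weight particles by how well they explain the new observation.
    weights = np.asarray(observation_likelihood(observation, predicted), dtype=float)
    weights /= weights.sum()

    # Importance resampling: redraw particles with replacement by weight.
    idx = rng.choice(len(predicted), size=len(predicted), p=weights)
    return predicted[idx]
```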
- Asymptotically Optimal Change Detection for Unnormalized Pre- and Post-Change Distributions [65.38208224389027]
This paper addresses the problem of detecting changes when only unnormalized pre- and post-change distributions are accessible.
Our approach is based on the estimation of the Cumulative Sum statistics, which is known to produce optimal performance.
arXiv Detail & Related papers (2024-10-18T17:13:29Z)
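For reference alongside the entry above, the classical CUSUM statistic accumulates log-likelihood ratios between the post- and pre-change models and declares a change once the statistic crosses a threshold. The sketch below is the textbook recursion with a known log-ratio function (an assumption); the paper addresses the harder setting where the distributions are only available unnormalized.

```python
def cusum_detect(samples, log_ratio, threshold):
    """Classical CUSUM change detection (textbook recursion, not the paper's method).

    samples: iterable of observations.
    log_ratio(x): log p_post(x) - log p_pre(x), assumed known here.
    Returns the first index at which the statistic exceeds the threshold, or None.
    """
    s = 0.0
    for t, x in enumerate(samples):
        # CUSUM recursion: clamp at zero so evidence against a change is forgotten.
        s = max(0.0, s + log_ratio(x))
        if s > threshold:
            return t  # change declared at time t
    return None
```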
- Entropic Matching for Expectation Propagation of Markov Jump Processes [38.60042579423602]
We propose a new tractable inference scheme based on an entropic matching framework.
We demonstrate the effectiveness of our method by providing closed-form results for a simple family of approximate distributions.
We derive expressions for point estimation of the underlying parameters using an approximate expectation procedure.
arXiv Detail & Related papers (2023-09-27T12:07:21Z)
- Computationally Efficient PAC RL in POMDPs with Latent Determinism and Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Finite-Time Analysis of Natural Actor-Critic for POMDPs [29.978816372127085]
We consider the reinforcement learning problem for partially observed Markov decision processes (POMDPs) with large or even countably infinite state spaces.
We consider a natural actor-critic method that employs a finite internal memory for policy parameterization.
We show that this error can be made small in the case of sliding-block controllers by using larger block sizes.
arXiv Detail & Related papers (2022-02-20T07:42:00Z)
- Entropy-Regularized Partially Observed Markov Decision Processes [3.42658286826597]
We investigate partially observed Markov decision processes (POMDPs) with cost functions regularized by entropy terms describing state, observation, and control uncertainty.
Standard POMDP techniques are shown to offer bounded-error solutions to entropy-regularized POMDPs.
Our joint-entropy result is particularly surprising since it constitutes a novel, tractable formulation of active state estimation.
arXiv Detail & Related papers (2021-12-22T22:44:44Z)
- Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity [2.685668802278156]
We show that Q-learning for standard Borel MDPs via quantization of states and actions converges to a limit.
Our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
arXiv Detail & Related papers (2021-11-12T15:47:10Z)
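To illustrate the quantization idea in the entry above, a continuous state space can be binned onto a finite grid and tabular Q-learning run on the resulting indices. The sketch below is a generic illustration with assumed environment callables (env_reset, env_step) and a scalar state; it is not the paper's construction or its convergence analysis.

```python
import numpy as np

def quantize(x, low, high, bins):
    # Map a continuous scalar to one of `bins` grid cells.
    x = float(np.clip(x, low, high))
    return min(int((x - low) / (high - low) * bins), bins - 1)

def quantized_q_learning(env_reset, env_step, episodes=500, bins=20,
                         state_range=(-1.0, 1.0), n_actions=3,
                         alpha=0.1, gamma=0.95, eps=0.1, seed=0):
    """Tabular Q-learning on a quantized continuous state space (sketch).

    env_reset() -> state and env_step(state, action) -> (next_state, reward, done)
    are assumed interfaces for a continuous-state environment.
    """
    rng = np.random.default_rng(seed)
    q = np.zeros((bins, n_actions))
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            i = quantize(s, *state_range, bins)
            # Epsilon-greedy action selection on the quantized state.
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(np.argmax(q[i]))
            s2, r, done = env_step(s, a)
            j = quantize(s2, *state_range, bins)
            # Standard tabular Q-learning update over grid-cell indices.
            q[i, a] += alpha * (r + gamma * (0.0 if done else np.max(q[j])) - q[i, a])
            s = s2
    return q
```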
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Near Optimality of Finite Memory Feedback Policies in Partially Observed Markov Decision Processes [0.0]
We study a planning problem for POMDPs where the system dynamics and measurement channel model are assumed to be known.
We find optimal policies for the approximate belief model under mild non-linear filter stability conditions.
We also establish a rate of convergence result which relates the finite window memory size and the approximation error bound.
arXiv Detail & Related papers (2020-10-15T00:37:51Z)
- Batch Stationary Distribution Estimation [98.18201132095066]
We consider the problem of approximating the stationary distribution of an ergodic Markov chain given a set of sampled transitions.
We propose a consistent estimator that is based on recovering a correction ratio function over the given data.
arXiv Detail & Related papers (2020-03-02T09:10:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.