Structural Estimation of Partially Observable Markov Decision Processes
- URL: http://arxiv.org/abs/2008.00500v3
- Date: Tue, 28 Dec 2021 18:58:40 GMT
- Title: Structural Estimation of Partially Observable Markov Decision Processes
- Authors: Yanling Chang and Alfredo Garcia and Zhide Wang and Lu Sun
- Abstract summary: We consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process.
We illustrate the estimation methodology with an application to optimal equipment replacement.
- Score: 3.1614382994158956
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In many practical settings control decisions must be made under
partial/imperfect information about the evolution of a relevant state variable.
Partially Observable Markov Decision Processes (POMDPs) provide a relatively
well-developed framework for modeling and analyzing such problems. In this
paper we consider the structural estimation of the primitives of a POMDP model
based upon the observable history of the process. We analyze the structural
properties of a POMDP model with random rewards and specify conditions under
which the model is identifiable without knowledge of the state dynamics. We
consider a soft policy gradient algorithm to compute a maximum likelihood
estimator and provide a finite-time characterization of convergence to a
stationary point. We illustrate the estimation methodology with an application
to optimal equipment replacement. In this context, replacement decisions must
be made under partial/imperfect information about the true state (i.e., the
condition of the equipment). We use synthetic and real data to highlight the robustness
of the proposed methodology and characterize the potential for misspecification
when partial state observability is ignored.
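
The likelihood being maximized can be pictured with a standard belief-state recursion. The sketch below assumes a discrete POMDP with finite state, action, and observation spaces and a softmax ("soft") action rule over belief-weighted values; all names (T, O, Q, beta) are illustrative stand-ins, not the authors' notation or algorithm.

```python
import numpy as np

def log_likelihood(T, O, Q, beta, history, n_states):
    """Log-likelihood of an observed (action, observation) history.

    T[a]: (n_states, n_states) transition matrix under action a.
    O[a]: (n_states, n_obs) observation probabilities given next state.
    Q:    (n_states, n_actions) stand-in values defining a softmax policy.
    beta: inverse temperature of the soft (Boltzmann) policy.
    """
    belief = np.full(n_states, 1.0 / n_states)   # uniform prior belief
    ll = 0.0
    for a, o in history:
        # Soft policy: likelihood of the action actually taken.
        logits = beta * (belief @ Q)
        logits -= logits.max()                   # numerical stability
        ll += logits[a] - np.log(np.exp(logits).sum())
        # Bayes filter: predict with T[a], correct with observation o.
        pred = belief @ T[a]
        joint = pred * O[a][:, o]
        p_obs = joint.sum()                      # P(o | past history)
        ll += np.log(p_obs)
        belief = joint / p_obs                   # updated belief state
    return ll
```

Gradient ascent on an objective of this shape (e.g., via automatic differentiation) is the general form of the MLE computation; the paper's contribution includes a finite-time convergence analysis for a soft policy gradient scheme of this kind.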
Related papers
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop a model-based imitation of state-only sequences using a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems [0.7046417074932257]
We present a framework to estimate POMDP transition and observation model parameters directly from available data.
We then form and solve the POMDP problem by exploiting the inferred distributions.
We successfully apply our approach to maintenance planning for railway track assets.
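
A natural building block for estimating such matrices from inspection data is the conjugate Dirichlet-multinomial update; the sketch below is a generic illustration of that idea, not the paper's actual inference procedure.

```python
import numpy as np

def dirichlet_posterior_mean(pairs, n_states, prior=1.0):
    """Posterior-mean transition matrix from observed (state, next_state)
    pairs under a symmetric Dirichlet prior. Rows without data fall
    back to the prior (uniform when prior=1.0)."""
    counts = np.full((n_states, n_states), prior)
    for s, s_next in pairs:
        counts[s, s_next] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)

# Hypothetical usage: consecutive condition ratings of a track segment.
T_hat = dirichlet_posterior_mean([(0, 0), (0, 1), (1, 1), (1, 2)], n_states=3)
```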
arXiv Detail & Related papers (2022-12-15T16:09:47Z)
- Distributed Bayesian Learning of Dynamic States [65.7870637855531]
The proposed algorithm performs distributed Bayesian filtering for finite-state hidden Markov models.
It can be used for sequential state estimation, as well as for modeling opinion formation over social networks under dynamic environments.
arXiv Detail & Related papers (2022-12-05T19:40:17Z)
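The single-agent core of such a filter is the classic forward recursion for hidden Markov models; a minimal sketch follows (the distributed version in the paper combines such updates across a network of agents).

```python
import numpy as np

def forward_filter(prior, T, E, observations):
    """Exact Bayesian filtering for a finite-state HMM.

    prior: (n,) initial state distribution.
    T:     (n, n) transition matrix, rows summing to 1.
    E:     (n, m) emission matrix, E[s, o] = P(obs=o | state=s).
    Yields P(state_t | obs_1..t) after each observation.
    """
    belief = np.asarray(prior, dtype=float)
    for o in observations:
        belief = (belief @ T) * E[:, o]   # predict, then correct
        belief /= belief.sum()            # renormalize
        yield belief.copy()
```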
- Numerically Stable Sparse Gaussian Processes via Minimum Separation using Cover Trees [57.67528738886731]
We study the numerical stability of scalable sparse approximations based on inducing points.
For low-dimensional tasks such as geospatial modeling, we propose an automated method for computing inducing points satisfying these conditions.
arXiv Detail & Related papers (2022-10-14T15:20:17Z)
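The stability conditions revolve around keeping inducing points sufficiently far apart. A naive greedy thinning pass conveys the geometric idea; the paper's cover-tree construction achieves this far more efficiently and with guarantees this sketch does not provide.

```python
import numpy as np

def min_separation_subset(X, min_dist):
    """Greedily pick rows of X so every selected pair is at least
    min_dist apart (Euclidean). An O(n * k) illustration only."""
    selected = []
    for x in X:
        if all(np.linalg.norm(x - z) >= min_dist for z in selected):
            selected.append(x)
    return np.array(selected)

# Hypothetical usage: thin candidate locations before using them
# as inducing points in a sparse Gaussian process.
Z = min_separation_subset(np.random.rand(500, 2), min_dist=0.1)
```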
- Proximal Reinforcement Learning: Efficient Off-Policy Evaluation in Partially Observed Markov Decision Processes [65.91730154730905]
In applications of offline reinforcement learning to observational data, such as in healthcare or education, a general concern is that observed actions might be affected by unobserved factors.
Here we tackle this by considering off-policy evaluation in a partially observed Markov decision process (POMDP).
We extend the framework of proximal causal inference to our POMDP setting, providing a variety of settings where identification is made possible.
arXiv Detail & Related papers (2021-10-28T17:46:14Z)
- Sequential Stochastic Optimization in Separable Learning Environments [0.0]
We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts.
These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process.
We model this broad class of problems as a partially observed Markov decision process (POMDP).
arXiv Detail & Related papers (2021-08-21T21:29:04Z)
- Lifted Model Checking for Relational MDPs [12.574454799055026]
pCTL-REBEL is a lifted model checking approach for verifying pCTL properties on relational MDPs.
We show that pCTL model checking is decidable for relational MDPs, even over possibly infinite domains.
arXiv Detail & Related papers (2021-06-22T13:12:36Z)
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
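The trace-inspection idea can be pictured as scanning executed (belief, action) pairs against analyst-supplied rules; everything below (the rule format and names) is a hypothetical illustration, not the paper's rule language.

```python
def unexpected_decisions(trace, rules):
    """Flag steps whose executed action violates an expectation rule.

    trace: iterable of (step, belief, action) tuples from POMCP runs.
    rules: list of (applies, allowed_actions) pairs, where applies is
           a predicate on the belief.
    """
    flagged = []
    for step, belief, action in trace:
        for applies, allowed in rules:
            if applies(belief) and action not in allowed:
                flagged.append((step, action))
    return flagged

# Hypothetical Tiger-style rule: when confident the tiger is behind
# the left door, the expected actions are open-right or listen again.
rules = [(lambda b: b["tiger_left"] > 0.9, {"open_right", "listen"})]
```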
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
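The engine behind this approach is Stein variational gradient descent (SVGD), which nudges a set of particles toward a target distribution while a kernel term keeps them spread out. Below is a generic RBF-kernel SVGD step, not the paper's MPC-specific posterior over control sequences.

```python
import numpy as np

def svgd_step(particles, grad_log_p, step_size=0.1, bandwidth=1.0):
    """One SVGD update on (n, d) particles. grad_log_p maps the
    particle array to the target's score at each particle."""
    diffs = particles[:, None, :] - particles[None, :, :]     # (n, n, d)
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2 * bandwidth ** 2))
    scores = grad_log_p(particles)                            # (n, d)
    # Kernel-weighted attraction toward high density, plus a
    # repulsion term (the kernel gradient) that preserves diversity.
    repulsion = np.sum(K[:, :, None] * diffs, axis=1) / bandwidth ** 2
    phi = (K @ scores + repulsion) / particles.shape[0]
    return particles + step_size * phi
```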
- Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes [36.07746952116073]
We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP).
We show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula.
We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
arXiv Detail & Related papers (2020-01-11T23:09:25Z)
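A standard point-based backup illustrates how the value function is improved at a finite set of belief points. The sketch below maximizes expected discounted reward; the paper's variant instead bounds the probability of satisfying a logical formula, so treat this purely as background.

```python
import numpy as np

def point_based_backup(b, alphas, R, T, O, gamma=0.95):
    """One PBVI-style backup at belief point b (n_states,).

    alphas: list of (n_states,) alpha-vectors (current value function).
    R[a]: (n_states,) reward; T[a]: (n, n) transitions;
    O[a]: (n, n_obs) observation probabilities given the next state.
    Returns the best new alpha-vector at b and its greedy action.
    """
    best_val, best = -np.inf, None
    for a in range(len(R)):
        g = R[a].astype(float).copy()
        for o in range(O[a].shape[1]):
            # Back-project each alpha through action a, observation o,
            # keeping the candidate that scores highest at this belief.
            cands = [T[a] @ (O[a][:, o] * alpha) for alpha in alphas]
            g += gamma * max(cands, key=lambda v: float(b @ v))
        if b @ g > best_val:
            best_val, best = b @ g, (g, a)
    return best
```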
- Value of structural health information in partially observable stochastic environments [0.0]
We introduce and study the theoretical and computational foundations of the Value of Information (VoI) and the Value of Structural Health Monitoring (VoSHM).
It is shown that a POMDP policy inherently leverages the notion of VoI to guide observational actions in an optimal way at every decision step.
arXiv Detail & Related papers (2019-12-28T22:18:48Z)
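The one-step (myopic) version of VoI is easy to state: the expected best reward after seeing an observation, minus the best reward obtainable without it. A minimal sketch with hypothetical names follows; the paper develops these quantities over full POMDP planning horizons.

```python
import numpy as np

def myopic_voi(belief, R, E):
    """One-step Value of Information.

    belief: (n_states,) prior over the structure's condition.
    R: (n_states, n_actions) expected rewards.
    E: (n_states, n_obs) observation model, E[s, o] = P(o | s).
    """
    prior_value = (belief @ R).max()          # act now, no observation
    p_obs = belief @ E                        # (n_obs,) marginal
    value_with_obs = 0.0
    for o in range(E.shape[1]):
        if p_obs[o] > 0:
            posterior = belief * E[:, o] / p_obs[o]   # Bayes update
            value_with_obs += p_obs[o] * (posterior @ R).max()
    return value_with_obs - prior_value       # nonnegative by convexity
```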