Sequential Stochastic Optimization in Separable Learning Environments
- URL: http://arxiv.org/abs/2108.09585v1
- Date: Sat, 21 Aug 2021 21:29:04 GMT
- Title: Sequential Stochastic Optimization in Separable Learning Environments
- Authors: R. Reid Bishop and Chelsea C. White III
- Abstract summary: We consider a class of sequential decision-making problems under uncertainty that can encompass various types of supervised learning concepts.
These problems have a completely observed state process and a partially observed modulation process, where the state process is affected by the modulation process only through an observation process.
We model this broad class of problems as a partially observed Markov decision process (POMDP).
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider a class of sequential decision-making problems under uncertainty
that can encompass various types of supervised learning concepts. These
problems have a completely observed state process and a partially observed
modulation process, where the state process is affected by the modulation
process only through an observation process, the observation process only
observes the modulation process, and the modulation process is exogenous to
control. We model this broad class of problems as a partially observed Markov
decision process (POMDP). The belief function for the modulation process is
control invariant, thus separating the estimation of the modulation process
from the control of the state process. We call this specially structured POMDP
the separable POMDP, or SEP-POMDP, and show it (i) can serve as a model for a
broad class of application areas, e.g., inventory control, finance, healthcare
systems, (ii) inherits value function and optimal policy structure from a set
of completely observed MDPs, (iii) can serve as a bridge between classical
models of sequential decision making under uncertainty having fully specified
model artifacts and such models that are not fully specified and require the
use of predictive methods from statistics and machine learning, and (iv) allows
for specialized approximate solution procedures.
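The separation the abstract describes is concrete enough to sketch in a few lines: because the modulation process is exogenous and is observed only through its own observation channel, the Bayes update of its belief never references the action, while the controller evaluates actions against a belief-weighted set of completely observed MDPs. Below is a minimal numerical sketch of that structure; all matrices, dimensions, and function names are illustrative assumptions, not artifacts from the paper.

```python
import numpy as np

# Exogenous modulation process: two hidden modes with Markov dynamics.
P_mod = np.array([[0.9, 0.1],     # P_mod[m, m'] = P(m' | m)
                  [0.2, 0.8]])

# Observation channel: observations depend only on the modulation process,
# never on the controlled state or the action.
P_obs = np.array([[0.8, 0.2],     # P_obs[m, y] = P(y | m)
                  [0.3, 0.7]])

def belief_update(b, y):
    """Bayes update of the belief over the modulation process.

    Neither P_mod nor P_obs depends on the action, so this recursion is
    control invariant: estimation separates from control.
    """
    predicted = P_mod.T @ b               # one-step prediction of the mode
    posterior = P_obs[:, y] * predicted   # condition on the observation y
    return posterior / posterior.sum()

def q_value(b, s, a, reward, P_state, V):
    """One-step lookahead in observed state s under belief b: a mixture,
    weighted by b, of the mode-m completely observed MDPs.

    reward[m][s, a] : mode-dependent reward
    P_state[m][a]   : mode-dependent state-transition matrix
    V[m]            : value function of the mode-m completely observed MDP
    """
    return sum(b[m] * (reward[m][s, a] + P_state[m][a][s] @ V[m])
               for m in range(len(b)))

# Tiny demo: two observed states, two actions, mode-dependent dynamics.
reward = [np.array([[1.0, 0.0], [0.0, 1.0]]),    # mode 0
          np.array([[0.0, 1.0], [1.0, 0.0]])]    # mode 1
P_state = [[np.eye(2), np.full((2, 2), 0.5)],    # mode 0, per action
           [np.full((2, 2), 0.5), np.eye(2)]]    # mode 1, per action
V = [np.zeros(2), np.zeros(2)]                   # placeholder values

b = np.array([0.5, 0.5])
for y in [0, 0, 1, 0]:
    b = belief_update(b, y)                      # no action appears here
print("belief over modulation modes:", b)
print("Q(b, s=0, a=0):", q_value(b, 0, 0, reward, P_state, V))
```

Note how the observation loop updates the belief without any action argument; this is the control invariance that lets the SEP-POMDP inherit structure from the underlying set of completely observed MDPs.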
Related papers
- Learning non-Markovian Decision-Making from State-only Sequences [57.20193609153983]
We develop model-based imitation of state-only sequences using a non-Markov Decision Process (nMDP).
We demonstrate the efficacy of the proposed method in a path planning task with non-Markovian constraints.
arXiv Detail & Related papers (2023-06-27T02:26:01Z)
- Bridging POMDPs and Bayesian decision making for robust maintenance planning under model uncertainty: An application to railway systems [0.7046417074932257]
We present a framework to estimate POMDP transition and observation model parameters directly from available data.
We then form and solve the POMDP problem by exploiting the inferred distributions.
We successfully apply our approach on maintenance planning for railway track assets.
arXiv Detail & Related papers (2022-12-15T16:09:47Z)
- Optimistic MLE -- A Generic Model-based Algorithm for Partially Observable Sequential Decision Making [48.87943416098096]
This paper introduces a simple, efficient learning algorithm for general sequential decision making, named Optimistic MLE (OMLE).
We prove that OMLE learns near-optimal policies of an enormously rich class of sequential decision making problems.
arXiv Detail & Related papers (2022-09-29T17:56:25Z)
- CoCoMoT: Conformance Checking of Multi-Perspective Processes via SMT (Extended Version) [62.96267257163426]
We introduce the CoCoMoT (Computing Conformance Modulo Theories) framework.
First, we show how SAT-based encodings studied in the pure control-flow setting can be lifted to our data-aware case.
Second, we introduce a novel preprocessing technique based on a notion of property-preserving clustering.
arXiv Detail & Related papers (2021-03-18T20:22:50Z)
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Monitoring multimode processes: a modified PCA algorithm with continual learning ability [2.5004754622137515]
Making the local monitoring model remember the features of previous modes can be an effective strategy.
A modified PCA algorithm with continual learning ability is built for monitoring multimode processes.
It is called PCA-EWC: the significant features of previous modes are preserved when a PCA model is established for the current mode (a toy sketch of this idea appears after this list).
arXiv Detail & Related papers (2020-12-13T12:09:38Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Structural Estimation of Partially Observable Markov Decision Processes [3.1614382994158956]
We consider the structural estimation of the primitives of a POMDP model based upon the observable history of the process.
We illustrate the estimation methodology with an application to optimal equipment replacement.
arXiv Detail & Related papers (2020-08-02T15:04:27Z)
- Adversarial System Variant Approximation to Quantify Process Model Generalization [2.538209532048867]
In process mining, process models are extracted from event logs and are commonly assessed using multiple quality dimensions; however, generalization to unseen system behavior is difficult to quantify.
A novel deep learning-based methodology called Adversarial System Variant Approximation (AVATAR) is proposed to overcome this issue.
arXiv Detail & Related papers (2020-03-26T22:06:18Z)
- Invariant Causal Prediction for Block MDPs [106.63346115341862]
Generalization across environments is critical to the successful application of reinforcement learning algorithms to real-world challenges.
We propose a method of invariant prediction to learn model-irrelevance state abstractions (MISA) that generalize to novel observations in the multi-environment setting.
arXiv Detail & Related papers (2020-03-12T21:03:01Z)
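To make the continual-learning idea in the PCA-EWC entry above concrete, here is a toy sketch: ordinary PCA fits the first mode, then the components for a second mode are fitted under a quadratic, EWC-style penalty that preserves the first mode's significant directions. The penalty weight, the projected-gradient scheme, and the synthetic data are assumptions made for illustration; the paper's actual elastic-weight-consolidation formulation differs in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def pca(X, k):
    """Ordinary PCA: top-k eigenvectors of the sample covariance."""
    C = np.cov(X, rowvar=False)
    vals, vecs = np.linalg.eigh(C)             # ascending eigenvalues
    return vecs[:, np.argsort(vals)[::-1][:k]]

def pca_ewc(X_new, W_prev, lam=5.0, lr=0.01, steps=500):
    """Fit components for a new mode while a quadratic penalty keeps them
    close to the previous mode's components (continual-learning idea)."""
    C = np.cov(X_new, rowvar=False)
    W = W_prev.copy()
    for _ in range(steps):
        # Ascend tr(W'CW) - lam * ||W - W_prev||_F^2.
        grad = 2.0 * C @ W - 2.0 * lam * (W - W_prev)
        W += lr * grad
        W, _ = np.linalg.qr(W)                 # re-orthonormalize loadings
    return W

# Two synthetic operating modes with different dominant directions.
X1 = rng.normal(size=(500, 4)) * np.array([3.0, 1.0, 0.5, 0.2])
X2 = rng.normal(size=(500, 4)) * np.array([0.5, 3.0, 1.0, 0.2])

W1 = pca(X1, k=2)          # monitoring model for mode 1
W2 = pca_ewc(X2, W1)       # mode-2 model that remembers mode 1
# Frobenius norm of the cross-projection: sqrt(2) would mean identical
# 2-dimensional subspaces, 0 would mean mode 1 was entirely forgotten.
print("subspace overlap with mode-1 model:", np.linalg.norm(W1.T @ W2))
```

The QR step keeps the loading matrix orthonormal, so the penalty trades variance captured in the new mode against retention of the previous mode's subspace, which is what lets one monitoring model cover multiple modes.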
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences of its use.