Simplifying Complex Observation Models in Continuous POMDP Planning with
Probabilistic Guarantees and Practice
- URL: http://arxiv.org/abs/2311.07745v4
- Date: Sat, 27 Jan 2024 12:43:46 GMT
- Title: Simplifying Complex Observation Models in Continuous POMDP Planning with
Probabilistic Guarantees and Practice
- Authors: Idan Lev-Yehudi, Moran Barenboim, Vadim Indelman
- Abstract summary: We address the implications of using simplified observation models for planning.
Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model.
Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning.
- Score: 9.444784653236157
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Solving partially observable Markov decision processes (POMDPs)
with high-dimensional and continuous observations, such as camera images, is
required for many real-life robotics and planning problems. Recent research
has suggested machine-learned probabilistic models as observation models, but
their use is currently too computationally expensive for online deployment. We
address the question of what the implications of using simplified observation
models for planning are, while retaining formal guarantees on the quality of
the solution. Our main contribution is a novel probabilistic bound based on a
statistical total variation distance of the simplified model. We show that it
bounds the theoretical POMDP value under the original model in terms of the
empirically planned value under the simplified model, by generalizing recent
particle-belief MDP concentration bounds. Our calculations can be separated
into offline and online parts, and we arrive at formal guarantees without
having to access the costly model at all during planning, which is also a
novel result. Finally, we demonstrate in simulation how to integrate the bound
into the routine of an existing continuous online POMDP solver.
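As a rough illustration of the quantities involved, the sketch below computes an empirical total variation distance between a costly and a simplified discrete observation model and plugs it into a crude discounted worst-case value gap. The model interfaces and the bound form are illustrative assumptions only; the paper's actual guarantee is statistical and tighter.

```python
import numpy as np

def empirical_tv_distance(p_probs, q_probs):
    """Total variation distance between two discrete distributions:
    d_TV(P, Q) = 0.5 * sum_i |P(i) - Q(i)|."""
    return 0.5 * float(np.abs(np.asarray(p_probs) - np.asarray(q_probs)).sum())

def value_gap_bound(d_tv, r_max, gamma, horizon):
    """Crude discounted worst-case gap driven by a per-step TV distance.
    NOTE: this union-bound form is a placeholder, not the paper's bound,
    which is derived from particle-belief MDP concentration results."""
    return 2.0 * r_max * sum(gamma**t * t * d_tv for t in range(horizon))

# Hypothetical likelihood vectors of both models over a discretized
# observation space, e.g. estimated offline from samples.
p = np.array([0.50, 0.30, 0.20])   # costly/original observation model
q = np.array([0.45, 0.35, 0.20])   # simplified observation model
d = empirical_tv_distance(p, q)
print(d, value_gap_bound(d, r_max=1.0, gamma=0.95, horizon=30))
```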
Related papers
- On Discriminative Probabilistic Modeling for Self-Supervised Representation Learning [85.75164588939185]
We study the discriminative probabilistic modeling problem on a continuous domain for (multimodal) self-supervised representation learning.
We conduct a generalization error analysis to reveal the limitations of the current InfoNCE-based contrastive loss for self-supervised representation learning.
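The InfoNCE objective analyzed in this work is the standard contrastive cross-entropy over cosine-similarity logits; a minimal self-contained sketch:

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Standard InfoNCE: each anchor's positive is the same-index row of
    `positives`; all other rows in the batch act as negatives."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))          # targets are the diagonal

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
print(info_nce_loss(z1, z2))
```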
arXiv Detail & Related papers (2024-10-11T18:02:46Z)
- Towards Stable Machine Learning Model Retraining via Slowly Varying Sequences [6.067007470552307]
We propose a methodology for finding sequences of machine learning models that are stable across retraining iterations.
We develop a mixed-integer optimization formulation that is guaranteed to recover optimal models.
Our method shows stronger stability than greedily trained models with a small, controllable sacrifice in predictive power.
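The paper's formulation is a mixed-integer program; as a simplified stand-in that exposes the same loss-versus-stability trade-off, a brute-force selector over candidate models (all names and the validation loss here are hypothetical):

```python
import numpy as np

def select_stable_model(candidates, prev_weights, val_loss_fn, lam=0.5):
    """Pick the candidate minimizing validation loss + lam * instability,
    where instability is parameter distance to the previously deployed
    model. A brute-force stand-in for the paper's mixed-integer program."""
    def objective(w):
        return val_loss_fn(w) + lam * np.linalg.norm(w - prev_weights)
    return min(candidates, key=objective)

rng = np.random.default_rng(1)
prev = rng.normal(size=5)
cands = [prev + rng.normal(scale=s, size=5) for s in (0.1, 0.5, 1.0)]
loss = lambda w: float(np.sum(w**2))  # hypothetical validation loss
print(select_stable_model(cands, prev, loss, lam=0.5))
```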
arXiv Detail & Related papers (2024-03-28T22:45:38Z)
- Online Variational Sequential Monte Carlo [49.97673761305336]
We build upon the variational sequential Monte Carlo (VSMC) method, which provides computationally efficient and accurate model parameter estimation and Bayesian latent-state inference.
Online VSMC performs both parameter estimation and particle proposal adaptation efficiently and entirely on the fly.
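The sequential Monte Carlo core that VSMC builds on can be sketched as a single bootstrap particle-filter step (the variational proposal adaptation itself is omitted; the toy transition and likelihood are assumptions):

```python
import numpy as np

def bootstrap_pf_step(particles, observation, transition, likelihood, rng):
    """One sequential Monte Carlo step: propagate particles through the
    transition model, weight by observation likelihood, then resample."""
    proposed = np.array([transition(x, rng) for x in particles])
    weights = np.array([likelihood(observation, x) for x in proposed])
    weights /= weights.sum()
    idx = rng.choice(len(proposed), size=len(proposed), p=weights)
    return proposed[idx]

rng = np.random.default_rng(2)
trans = lambda x, r: 0.9 * x + r.normal(scale=0.1)   # toy AR(1) dynamics
lik = lambda y, x: np.exp(-0.5 * (y - x) ** 2)       # Gaussian obs. noise
parts = rng.normal(size=100)
print(bootstrap_pf_step(parts, 0.3, trans, lik, rng).mean())
```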
arXiv Detail & Related papers (2023-12-19T21:45:38Z)
- Exact and general decoupled solutions of the LMC Multitask Gaussian Process model [28.32223907511862]
The Linear Model of Co-regionalization (LMC) is a very general multitask Gaussian process model for regression or classification.
Recent work has shown that under some conditions the latent processes of the model can be decoupled, leading to a complexity that is only linear in the number of said processes.
We extend these results, showing from the most general assumptions that the only condition necessary for efficient exact computation of the LMC is a mild hypothesis on the noise model.
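For reference, the LMC prior has covariance K((x,i),(x',j)) = Σ_q B_q[i,j] k_q(x,x'); a minimal NumPy sketch of assembling this kernel, assuming rank-1 coregionalization matrices for simplicity:

```python
import numpy as np

def rbf(X, Y, lengthscale=1.0):
    """Squared-exponential base kernel k_q(x, x')."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def lmc_kernel(X, B_list, lengthscales):
    """LMC covariance over all (task, input) pairs:
    K = sum_q kron(B_q, k_q(X, X)), with B_q a coregionalization matrix."""
    return sum(np.kron(B, rbf(X, X, ls)) for B, ls in zip(B_list, lengthscales))

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 2))                                # 10 inputs, 2 dims
a1, a2 = rng.normal(size=(3, 1)), rng.normal(size=(3, 1))   # 3 tasks, rank 1
K = lmc_kernel(X, [a1 @ a1.T, a2 @ a2.T], [1.0, 2.0])
print(K.shape)  # (30, 30): 3 tasks x 10 inputs
```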
arXiv Detail & Related papers (2023-10-18T15:16:24Z)
- Predictable MDP Abstraction for Unsupervised Model-Based RL [93.91375268580806]
We propose predictable MDP abstraction (PMA).
Instead of training a predictive model on the original MDP, we train a model on a transformed MDP with a learned action space.
We theoretically analyze PMA and empirically demonstrate that PMA leads to significant improvements over prior unsupervised model-based RL approaches.
arXiv Detail & Related papers (2023-02-08T07:37:51Z)
- Continuous Mixtures of Tractable Probabilistic Models [10.667104977730304]
Probabilistic models based on continuous latent spaces, such as variational autoencoders, can be understood as uncountable mixture models.
Probabilistic circuits (PCs) can be understood as hierarchical discrete mixture models.
In this paper, we investigate a hybrid approach, namely continuous mixtures of tractable models with a small latent dimension.
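As a toy illustration of the hybrid idea, the sketch below approximates a continuous 1-D latent mixture p(x) = ∫ p(x|z) p(z) dz by a finite grid of latent points; the decoder mean mu(z) = tanh(2z) is a made-up stand-in:

```python
import numpy as np

def continuous_mixture_density(x, n_points=64):
    """Approximate p(x) = integral N(x | mu(z), 1) N(z | 0, 1) dz by
    discretizing the latent z on a grid; the normalized prior masses
    become the weights of a finite mixture."""
    z = np.linspace(-4, 4, n_points)
    prior = np.exp(-0.5 * z**2)
    prior /= prior.sum()                       # discrete mixture weights
    mu = np.tanh(2.0 * z)                      # hypothetical decoder mean
    comp = np.exp(-0.5 * (x[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)
    return comp @ prior                        # finite mixture approximation

xs = np.linspace(-3, 3, 5)
print(continuous_mixture_density(xs))
```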
arXiv Detail & Related papers (2022-09-21T18:18:32Z)
- PAC Reinforcement Learning for Predictive State Representations [60.00237613646686]
We study online Reinforcement Learning (RL) in partially observable dynamical systems.
We focus on the Predictive State Representations (PSRs) model, which is an expressive model that captures other well-known models.
We develop a novel model-based algorithm for PSRs that can learn a near-optimal policy with sample complexity scaling polynomially in the relevant problem parameters.
arXiv Detail & Related papers (2022-07-12T17:57:17Z)
- Closed-form Continuous-Depth Models [99.40335716948101]
Continuous-depth neural models rely on advanced numerical differential equation solvers.
We present a new family of models, termed Closed-form Continuous-depth (CfC) networks, that are simple to describe and at least one order of magnitude faster.
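The closed-form update is commonly presented as a time-dependent gated interpolation between two network heads; the sketch below follows that form, with the exact parameterization treated as an assumption rather than the paper's definitive architecture:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def cfc_output(t, f, g, h):
    """Gated closed-form update in the spirit of CfC networks:
    x(t) = sigmoid(-f * t) * g + (1 - sigmoid(-f * t)) * h,
    where f, g, h are outputs of small neural heads on the current
    state/input. Parameterization here is an assumption."""
    gate = sigmoid(-f * t)
    return gate * g + (1.0 - gate) * h

# Toy heads (hypothetical): linear maps of a state vector.
rng = np.random.default_rng(4)
x = rng.normal(size=8)
Wf, Wg, Wh = (rng.normal(size=(8, 8)) * 0.1 for _ in range(3))
print(cfc_output(t=0.5, f=Wf @ x, g=Wg @ x, h=Wh @ x))
```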
arXiv Detail & Related papers (2021-06-25T22:08:51Z)
- Model-based Policy Optimization with Unsupervised Model Adaptation [37.09948645461043]
We investigate how to bridge the gap between real and simulated data, which arises from inaccurate model estimation, for better policy optimization.
We propose a novel model-based reinforcement learning framework AMPO, which introduces unsupervised model adaptation.
Our approach achieves state-of-the-art performance in terms of sample efficiency on a range of continuous control benchmark tasks.
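As one hedged illustration of an unsupervised adaptation signal (a generic choice, not necessarily AMPO's exact discrepancy measure), an RBF-kernel MMD penalty between real and simulated features:

```python
import numpy as np

def mmd_rbf(X, Y, sigma=1.0):
    """Squared maximum mean discrepancy with an RBF kernel, usable as an
    adaptation penalty between real features X and simulated features Y."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(5)
real = rng.normal(size=(64, 4))
sim = rng.normal(loc=0.5, size=(64, 4))   # shifted => positive penalty
print(mmd_rbf(real, sim))
```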
arXiv Detail & Related papers (2020-10-19T14:19:42Z)
- Control as Hybrid Inference [62.997667081978825]
We present an implementation of CHI which naturally mediates the balance between iterative and amortised inference.
We verify the scalability of our algorithm on a continuous control benchmark, demonstrating that it outperforms strong model-free and model-based baselines.
arXiv Detail & Related papers (2020-07-11T19:44:09Z)