An On-Line POMDP Solver for Continuous Observation Spaces
- URL: http://arxiv.org/abs/2011.02076v1
- Date: Wed, 4 Nov 2020 00:16:08 GMT
- Title: An On-Line POMDP Solver for Continuous Observation Spaces
- Authors: Marcus Hoerger, Hanna Kurniawati
- Abstract summary: We propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP).
It combines methods from Monte-Carlo-Tree-Search and particle filtering to construct a policy representation that does not require a discretised observation space.
Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs similarly to or better than state-of-the-art POMDP solvers.
- Score: 5.482532589225552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning under partial observability is essential for autonomous robots. A
principled way to address such planning problems is the Partially Observable
Markov Decision Process (POMDP). Although solving POMDPs is computationally
intractable, substantial advancements have been achieved in developing
approximate POMDP solvers in the past two decades. However, computing robust
solutions for problems with continuous observation spaces remains challenging.
Most on-line solvers rely on discretising the observation space or artificially
limiting the number of observations that are considered during planning to
compute tractable policies. In this paper we propose a new on-line POMDP
solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP), that
combines methods from Monte-Carlo-Tree-Search and particle filtering to
construct a policy representation that does not require discretised observation
spaces and avoids limiting the number of observations considered during
planning. Experiments on three different problems involving continuous
observation spaces indicate that LABECOP performs similarly to or better than
state-of-the-art POMDP solvers.
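To make the abstract's central idea concrete, the sketch below shows Monte-Carlo planning over a particle belief in which each sampled continuous observation is handled by likelihood re-weighting of the propagated particles, rather than by discretising the observation space. This is a minimal illustration of that general idea, not the LABECOP algorithm itself: the 1-D toy model, the constants, and helper names such as obs_likelihood and value_of_action are assumptions made for this example.

```python
# Minimal sketch of Monte-Carlo planning with a particle belief and
# continuous observations handled by likelihood re-weighting.
# NOT the LABECOP algorithm; all models and constants below are hypothetical.
import math
import random

GAMMA = 0.95        # discount factor (assumed)
ACTIONS = [-1.0, 0.0, 1.0]
OBS_NOISE = 0.5     # std. dev. of the hypothetical observation model


def step(state, action):
    """Hypothetical stochastic transition model."""
    return state + action + random.gauss(0.0, 0.1)


def obs_likelihood(obs, state):
    """Density of a continuous observation given a state (Gaussian)."""
    z = (obs - state) / OBS_NOISE
    return math.exp(-0.5 * z * z) / (OBS_NOISE * math.sqrt(2.0 * math.pi))


def reward(state, action):
    """Hypothetical reward: keep the state near zero, penalise effort."""
    return -abs(state) - 0.01 * abs(action)


def value_of_action(particles, action, n_obs_samples=20):
    """One-step lookahead value of `action` under a particle belief.

    For each sampled continuous observation, the successor belief is obtained
    by re-weighting the propagated particles with the observation likelihood;
    no observation discretisation is involved.
    """
    next_particles = [step(s, action) for s in particles]
    immediate = sum(reward(s, action) for s in particles) / len(particles)

    future = 0.0
    for _ in range(n_obs_samples):
        obs = random.gauss(random.choice(next_particles), OBS_NOISE)
        weights = [obs_likelihood(obs, s) for s in next_particles]
        total_w = sum(weights) or 1e-12
        # crude value estimate of the re-weighted belief: expected immediate
        # reward of the best single action taken from that belief
        future += max(
            sum(w * reward(s, a) for s, w in zip(next_particles, weights)) / total_w
            for a in ACTIONS
        )
    return immediate + GAMMA * future / n_obs_samples


def plan(particles):
    """Pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: value_of_action(particles, a))


if __name__ == "__main__":
    belief = [random.gauss(1.0, 0.3) for _ in range(200)]   # particle belief
    print("chosen action:", plan(belief))
```

The re-weighting step plays the role that observation discretisation plays in other on-line solvers: every sampled observation induces its own successor belief over the same propagated particle set, so no observation bins ever need to be defined.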
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation [73.2390735383842]
We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions.
We show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm.
These results can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.
arXiv Detail & Related papers (2024-06-03T14:51:27Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy
Search Over Policy Trees [5.250288418639076]
We propose an online POMDP solver called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT).
At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees.
Our method is surprisingly simple compared to existing state-of-the-art methods, yet empirically outperforms them on several continuous-action POMDP problems (a generic cross-entropy sketch is given after this list).
arXiv Detail & Related papers (2023-05-14T03:12:53Z) - End-to-End Policy Gradient Method for POMDPs and Explainable Agents [2.1700203922407493]
We propose an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph.
Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv Detail & Related papers (2023-04-19T15:45:52Z) - Online Planning for Constrained POMDPs with Continuous Spaces through
Dual Ascent [37.61747231296097]
We propose algorithms for online CPOMDP planning for continuous state, action, and observation spaces.
We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems.
arXiv Detail & Related papers (2022-12-23T05:22:39Z) - Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
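The LCEOPT entry above refers to a lazy cross-entropy search over policy trees; the sketch below shows only the generic cross-entropy method such a search builds on, applied to a two-parameter policy for a toy 1-D control problem. The objective, the Gaussian sampling distribution, and all names are illustrative assumptions, not the authors' implementation.

```python
# Generic cross-entropy method (CEM) sketch, NOT the LCEOPT solver: it tunes
# the two parameters of a hypothetical linear policy on a toy 1-D control
# problem by iteratively refitting a Gaussian to the elite samples.
import random
import statistics

HORIZON = 20
N_SAMPLES = 100      # candidate parameter vectors per iteration (assumed)
N_ELITE = 10         # elites kept for refitting (assumed)
N_ITERS = 30


def episode_return(params):
    """Return of the hypothetical policy a = k * x + b on a noisy integrator."""
    k, b = params
    x, total = 1.0, 0.0
    for _ in range(HORIZON):
        a = k * x + b
        total += -abs(x) - 0.01 * abs(a)          # toy reward
        x = x + a + random.gauss(0.0, 0.1)        # toy dynamics
    return total


def cross_entropy_search():
    """Sample candidates, keep the elites, refit the sampling distribution."""
    mean = [0.0, 0.0]
    std = [1.0, 1.0]
    for _ in range(N_ITERS):
        samples = [[random.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(N_SAMPLES)]
        samples.sort(key=episode_return, reverse=True)
        elites = samples[:N_ELITE]
        mean = [statistics.mean(e[i] for e in elites) for i in range(2)]
        std = [statistics.pstdev([e[i] for e in elites]) + 1e-3 for i in range(2)]
    return mean


if __name__ == "__main__":
    print("optimised policy parameters:", cross_entropy_search())
```

The pattern is always the same: sample candidates from a parametric distribution, keep the best few, and refit the distribution to those elites; LCEOPT's stated contribution is applying this idea lazily to the much larger space of policy trees.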