An On-Line POMDP Solver for Continuous Observation Spaces
- URL: http://arxiv.org/abs/2011.02076v1
- Date: Wed, 4 Nov 2020 00:16:08 GMT
- Title: An On-Line POMDP Solver for Continuous Observation Spaces
- Authors: Marcus Hoerger, Hanna Kurniawati
- Abstract summary: We propose a new on-line POMDP solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP).
It combines methods from Monte-Carlo-Tree-Search and particle filtering to construct a policy representation that does not require a discretised observation space.
Experiments on three different problems involving continuous observation spaces indicate that LABECOP performs similarly to or better than state-of-the-art POMDP solvers.
- Score: 5.482532589225552
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Planning under partial observability is essential for autonomous robots. A
principled way to address such planning problems is the Partially Observable
Markov Decision Process (POMDP). Although solving POMDPs is computationally
intractable, substantial advancements have been achieved in developing
approximate POMDP solvers in the past two decades. However, computing robust
solutions for problems with continuous observation spaces remains challenging.
Most on-line solvers rely on discretising the observation space or artificially
limiting the number of observations that are considered during planning to
compute tractable policies. In this paper we propose a new on-line POMDP
solver, called Lazy Belief Extraction for Continuous POMDPs (LABECOP), that
combines methods from Monte-Carlo-Tree-Search and particle filtering to
construct a policy representation that does not require discretised observation
spaces and avoids limiting the number of observations considered during
planning. Experiments on three different problems involving continuous
observation spaces indicate that LABECOP performs similarly to or better than
state-of-the-art POMDP solvers.
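To make the abstract's central idea concrete, the sketch below shows Monte-Carlo planning over a particle belief in which each sampled continuous observation is handled by likelihood re-weighting of the propagated particles, rather than by discretising the observation space. This is a minimal illustration of that general idea, not the LABECOP algorithm itself: the 1-D toy model, the constants, and helper names such as obs_likelihood and value_of_action are assumptions made for this example.

```python
# Minimal sketch of Monte-Carlo planning with a particle belief and
# continuous observations handled by likelihood re-weighting.
# NOT the LABECOP algorithm; all models and constants below are hypothetical.
import math
import random

GAMMA = 0.95        # discount factor (assumed)
ACTIONS = [-1.0, 0.0, 1.0]
OBS_NOISE = 0.5     # std. dev. of the hypothetical observation model


def step(state, action):
    """Hypothetical stochastic transition model."""
    return state + action + random.gauss(0.0, 0.1)


def obs_likelihood(obs, state):
    """Density of a continuous observation given a state (Gaussian)."""
    z = (obs - state) / OBS_NOISE
    return math.exp(-0.5 * z * z) / (OBS_NOISE * math.sqrt(2.0 * math.pi))


def reward(state, action):
    """Hypothetical reward: keep the state near zero, penalise effort."""
    return -abs(state) - 0.01 * abs(action)


def value_of_action(particles, action, n_obs_samples=20):
    """One-step lookahead value of `action` under a particle belief.

    For each sampled continuous observation, the successor belief is obtained
    by re-weighting the propagated particles with the observation likelihood;
    no observation discretisation is involved.
    """
    next_particles = [step(s, action) for s in particles]
    immediate = sum(reward(s, action) for s in particles) / len(particles)

    future = 0.0
    for _ in range(n_obs_samples):
        obs = random.gauss(random.choice(next_particles), OBS_NOISE)
        weights = [obs_likelihood(obs, s) for s in next_particles]
        total_w = sum(weights) or 1e-12
        # crude value estimate of the re-weighted belief: expected immediate
        # reward of the best single action taken from that belief
        future += max(
            sum(w * reward(s, a) for s, w in zip(next_particles, weights)) / total_w
            for a in ACTIONS
        )
    return immediate + GAMMA * future / n_obs_samples


def plan(particles):
    """Pick the action with the highest estimated value."""
    return max(ACTIONS, key=lambda a: value_of_action(particles, a))


if __name__ == "__main__":
    belief = [random.gauss(1.0, 0.3) for _ in range(200)]   # particle belief
    print("chosen action:", plan(belief))
```

The re-weighting step plays the role that observation discretisation plays in other on-line solvers: every sampled observation induces its own successor belief over the same propagated particle set, so no observation bins ever need to be defined.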
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z) - RL in Latent MDPs is Tractable: Online Guarantees via Off-Policy Evaluation [73.2390735383842]
We introduce the first sample-efficient algorithm for LMDPs without any additional structural assumptions.
We show how these can be used to derive near-optimal guarantees of an optimistic exploration algorithm.
These results can be valuable for a wide range of interactive learning problems beyond LMDPs, and especially, for partially observed environments.
arXiv Detail & Related papers (2024-06-03T14:51:27Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that the learned specifications, expressed in Answer Set Programming (ASP), yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - A Surprisingly Simple Continuous-Action POMDP Solver: Lazy Cross-Entropy
Search Over Policy Trees [5.250288418639076]
We propose an online POMDP solver called Lazy Cross-Entropy Search Over Policy Trees (LCEOPT).
At each planning step, our method uses a novel lazy Cross-Entropy method to search the space of policy trees.
Our method is surprisingly simple compared to existing state-of-the-art methods, yet empirically outperforms them on several continuous-action POMDP problems (a generic cross-entropy sketch is given after this list).
arXiv Detail & Related papers (2023-05-14T03:12:53Z) - End-to-End Policy Gradient Method for POMDPs and Explainable Agents [2.1700203922407493]
We propose an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph.
Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv Detail & Related papers (2023-04-19T15:45:52Z) - Online Planning for Constrained POMDPs with Continuous Spaces through
Dual Ascent [37.61747231296097]
We propose algorithms for online CPOMDP planning for continuous state, action, and observation spaces.
We empirically compare the effectiveness of our proposed algorithms on continuous CPOMDPs that model both toy and real-world safety-critical problems.
arXiv Detail & Related papers (2022-12-23T05:22:39Z) - Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
arXiv Detail & Related papers (2022-10-10T21:11:55Z) - Linear programming-based solution methods for constrained POMDPs [0.5156484100374059]
Constrained partially observable Markov decision processes (CPOMDPs) have been used to model various real-world phenomena.
We use grid-based approximations in combination with linear programming (LP) models to generate approximate policies for CPOMDPs.
arXiv Detail & Related papers (2022-06-28T15:22:24Z) - Computationally Efficient PAC RL in POMDPs with Latent Determinism and
Conditional Embeddings [97.12538243736705]
We study reinforcement learning with function approximation for large-scale Partially Observable Markov Decision Processes (POMDPs).
Our algorithm provably scales to large-scale POMDPs.
arXiv Detail & Related papers (2022-06-24T05:13:35Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
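The LCEOPT entry above refers to a lazy cross-entropy search over policy trees; the sketch below shows only the generic cross-entropy method such a search builds on, applied to a two-parameter policy for a toy 1-D control problem. The objective, the Gaussian sampling distribution, and all names are illustrative assumptions, not the authors' implementation.

```python
# Generic cross-entropy method (CEM) sketch, NOT the LCEOPT solver: it tunes
# the two parameters of a hypothetical linear policy on a toy 1-D control
# problem by iteratively refitting a Gaussian to the elite samples.
import random
import statistics

HORIZON = 20
N_SAMPLES = 100      # candidate parameter vectors per iteration (assumed)
N_ELITE = 10         # elites kept for refitting (assumed)
N_ITERS = 30


def episode_return(params):
    """Return of the hypothetical policy a = k * x + b on a noisy integrator."""
    k, b = params
    x, total = 1.0, 0.0
    for _ in range(HORIZON):
        a = k * x + b
        total += -abs(x) - 0.01 * abs(a)          # toy reward
        x = x + a + random.gauss(0.0, 0.1)        # toy dynamics
    return total


def cross_entropy_search():
    """Sample candidates, keep the elites, refit the sampling distribution."""
    mean = [0.0, 0.0]
    std = [1.0, 1.0]
    for _ in range(N_ITERS):
        samples = [[random.gauss(m, s) for m, s in zip(mean, std)]
                   for _ in range(N_SAMPLES)]
        samples.sort(key=episode_return, reverse=True)
        elites = samples[:N_ELITE]
        mean = [statistics.mean(e[i] for e in elites) for i in range(2)]
        std = [statistics.pstdev([e[i] for e in elites]) + 1e-3 for i in range(2)]
    return mean


if __name__ == "__main__":
    print("optimised policy parameters:", cross_entropy_search())
```

The pattern is always the same: sample candidates from a parametric distribution, keep the best few, and refit the distribution to those elites; LCEOPT's stated contribution is applying this idea lazily to the much larger space of policy trees.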