Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces
- URL: http://arxiv.org/abs/2106.04206v1
- Date: Tue, 8 Jun 2021 09:31:48 GMT
- Title: Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning in Continuous Spaces
- Authors: \"Omer \c{S}ahin Ta\c{s}, Felix Hauser, Martin Lauer
- Abstract summary: Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP).
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
- Score: 5.732271870257913
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Decision making under uncertainty can be framed as a partially observable
Markov decision process (POMDP). Finding exact solutions of POMDPs is generally
computationally intractable, but the solution can be approximated by
sampling-based approaches. These sampling-based POMDP solvers rely on
multi-armed bandit (MAB) heuristics, which assume the outcomes of different
actions to be uncorrelated. In some applications, like motion planning in
continuous spaces, similar actions yield similar outcomes. In this paper, we
utilize variants of MAB heuristics that make Lipschitz continuity assumptions
on the outcomes of actions to improve the efficiency of sampling-based planning
approaches. We demonstrate the effectiveness of this approach in the context of
motion planning for automated driving.
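As a hedged illustration of the core idea, the sketch below implements a UCB-style bandit over a discretized one-dimensional action space (e.g., candidate accelerations) in which a Lipschitz continuity assumption lets each arm's confidence bound be tightened by its neighbors. The class name, the Lipschitz constant `L`, and the exploration coefficient `c` are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Sketch of a Lipschitz-UCB bandit over discretized continuous actions.
# Assumption: |f(a_i) - f(a_j)| <= L * |a_i - a_j| for the mean outcome f.
class LipschitzUCB:
    def __init__(self, actions, lipschitz_const, c=2.0):
        self.actions = np.asarray(actions, dtype=float)
        self.L = lipschitz_const          # assumed Lipschitz constant
        self.c = c                        # exploration coefficient
        self.counts = np.zeros(len(self.actions))
        self.means = np.zeros(len(self.actions))
        self.t = 0

    def select(self):
        self.t += 1
        untried = np.where(self.counts == 0)[0]
        if untried.size > 0:              # play every arm once first
            return int(untried[0])
        bonus = np.sqrt(self.c * np.log(self.t) / self.counts)
        vanilla = self.means + bonus
        # Lipschitz continuity gives f(a_i) <= f(a_j) + L * |a_i - a_j| for
        # every j, so take the tightest upper bound any arm provides.
        dist = np.abs(self.actions[:, None] - self.actions[None, :])
        ucb = np.min(vanilla[None, :] + self.L * dist, axis=1)
        return int(np.argmax(ucb))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]
```

Inside a sampling-based POMDP solver, `select` would choose the next action to simulate from a belief node and `update` would record the rollout return; the Lipschitz term lets the outcome of one action immediately tighten the bounds of nearby actions, which is what reduces the number of samples needed.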
Related papers
- Amortizing intractable inference in large language models [56.92471123778389] (2023-10-06)
We use amortized Bayesian inference to sample from intractable posterior distributions.
We empirically demonstrate that this distribution-matching paradigm of LLM fine-tuning can serve as an effective alternative to maximum-likelihood training.
As an important application, we interpret chain-of-thought reasoning as a latent variable modeling problem.
arXiv Detail & Related papers (2023-10-06T16:36:08Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - End-to-End Policy Gradient Method for POMDPs and Explainable Agents [2.1700203922407493]
We propose an RL algorithm that estimates the hidden states by end-to-end training, and visualize the estimation as a state-transition graph.
Experimental results demonstrated that the proposed algorithm can solve simple POMDP problems and that the visualization makes the agent's behavior interpretable to humans.
arXiv Detail & Related papers (2023-04-19T15:45:52Z) - Evaluating Guiding Spaces for Motion Planning [2.384084215091134]
We define the *motion planning guiding space*, which encapsulates many seemingly distinct prior works under the same framework.
We also suggest an information theoretic method to evaluate guided planning which places the focus on the quality of the resulting biased sampling.
arXiv Detail & Related papers (2022-10-16T21:17:51Z) - Optimality Guarantees for Particle Belief Approximation of POMDPs [55.83001584645448]
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems.
POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid.
We propose a theory characterizing the approximation error of the particle filtering techniques that these algorithms use.
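To make the particle-belief approximation above concrete, here is a minimal bootstrap-filter belief update under assumed interfaces: `transition(s, a, rng)` samples a successor state and `obs_likelihood(o, s_next)` returns the observation density. Both are hypothetical stand-ins for a POMDP generative model, not the paper's code.

```python
import numpy as np

# Bootstrap-filter update of a particle belief after taking `action` and
# receiving `observation`; `rng` is a numpy.random.Generator.
def particle_belief_update(particles, action, observation,
                           transition, obs_likelihood, rng):
    # Propagate every particle through the transition model.
    propagated = [transition(s, action, rng) for s in particles]
    # Weight particles by how well they explain the observation.
    weights = np.array([obs_likelihood(observation, s) for s in propagated])
    if weights.sum() == 0.0:
        return propagated                 # particle depletion fallback
    weights = weights / weights.sum()
    # Resample to obtain an unweighted approximation of the posterior belief.
    idx = rng.choice(len(propagated), size=len(propagated), p=weights)
    return [propagated[i] for i in idx]
```

The approximation error such papers analyze comes from representing the belief with finitely many particles and from the resampling step above.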
arXiv Detail & Related papers (2022-10-10T21:11:55Z) - Expert-Guided Symmetry Detection in Markov Decision Processes [0.0]
We propose a paradigm that aims to detect the presence of some transformations of the state-action space for which the MDP dynamics is invariant.
The results show that the model distributional shift is reduced when the dataset is augmented with the data obtained by using the detected symmetries.
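A minimal sketch of the augmentation step described above, assuming a detected symmetry given by hypothetical maps `f` on states and `g` on actions under which the dynamics are invariant:

```python
# Augment an MDP dataset with a detected symmetry. `f` (states) and `g`
# (actions) are assumed maps under which the transition dynamics are
# invariant, so each transformed transition is as likely as the original.
def augment_with_symmetry(transitions, f, g):
    augmented = list(transitions)
    for s, a, r, s_next in transitions:
        augmented.append((f(s), g(a), r, f(s_next)))
    return augmented
```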
arXiv Detail & Related papers (2021-11-19T16:12:30Z) - On Solving a Stochastic Shortest-Path Markov Decision Process as
Probabilistic Inference [5.517104116168873]
We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference.
We discuss online and offline methods for planning under uncertainty.
arXiv Detail & Related papers (2021-09-13T11:07:52Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
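For intuition, the sketch below shows one Stein Variational Gradient Descent step over a population of flattened control sequences, in the spirit of Stein Variational MPC; `grad_log_posterior` is a hypothetical callback (e.g., the gradient of a negative rollout cost), and the RBF bandwidth is a free parameter.

```python
import numpy as np

# One SVGD step over a set of flattened control sequences (n particles, d dims).
def svgd_step(particles, grad_log_posterior, step_size=1e-2, bandwidth=1.0):
    X = np.stack(particles)                       # (n, d)
    n = X.shape[0]
    grads = np.stack([grad_log_posterior(x) for x in X])
    diffs = X[:, None, :] - X[None, :, :]         # diffs[i, j] = x_i - x_j
    K = np.exp(-np.sum(diffs ** 2, axis=-1) / (2.0 * bandwidth ** 2))
    # Gradient of k(x_j, x_i) w.r.t. x_j is (x_i - x_j) / h^2 * k: a
    # repulsive term that keeps the particle population spread out.
    repulsion = (diffs / bandwidth ** 2 * K[:, :, None]).sum(axis=1)
    phi = (K @ grads + repulsion) / n             # Stein variational direction
    return X + step_size * phi
```

Iterating this step moves the whole population toward low-cost control sequences while the kernel repulsion preserves multiple modes, which is what lets the method handle complex distributions better than a single-trajectory MPC update.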
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - A maximum-entropy approach to off-policy evaluation in average-reward
MDPs [54.967872716145656]
This work focuses on off-policy evaluation (OPE) with function approximation in infinite-horizon undiscounted Markov decision processes (MDPs).
We provide the first finite-sample OPE error bound, extending existing results beyond the episodic and discounted cases.
We show that this results in an exponential-family distribution whose sufficient statistics are the features, paralleling maximum-entropy approaches in supervised learning.
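As a hedged illustration of the exponential-family form mentioned above (the notation is assumed, not taken from the paper), the estimated stationary state-action distribution has the shape

$$ d_\theta(s, a) \propto \exp\big(\theta^{\top} \phi(s, a)\big), $$

where $\phi(s,a)$ are the features used for function approximation and $\theta$ is chosen so that feature expectations match, mirroring maximum-entropy estimation in supervised learning.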
arXiv Detail & Related papers (2020-06-17T18:13:37Z) - Planning in Markov Decision Processes with Gap-Dependent Sample
Complexity [48.98199700043158]
We propose MDP-GapE, a new trajectory-based Monte-Carlo Tree Search algorithm for planning in a Markov Decision Process.
We prove an upper bound on the number of calls to the generative model needed for MDP-GapE to identify a near-optimal action with high probability.
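As background for the trajectory-based search above, here is a minimal UCB-style action-selection and backup step for a Monte-Carlo Tree Search node; MDP-GapE replaces this kind of heuristic with gap-dependent confidence bounds, and the names here are illustrative only.

```python
import math

# UCB1-style selection and incremental backup for one search-tree node.
class Node:
    def __init__(self, actions):
        self.actions = list(actions)
        self.counts = {a: 0 for a in self.actions}
        self.values = {a: 0.0 for a in self.actions}
        self.total = 0

def ucb_select(node, c=1.4):
    for a in node.actions:                # try every action once first
        if node.counts[a] == 0:
            return a
    return max(node.actions, key=lambda a: node.values[a]
               + c * math.sqrt(math.log(node.total) / node.counts[a]))

def backup(node, action, ret):
    # Incremental mean update of the action-value estimate.
    node.total += 1
    node.counts[action] += 1
    node.values[action] += (ret - node.values[action]) / node.counts[action]
```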
arXiv Detail & Related papers (2020-06-10T15:05:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of this information and is not responsible for any consequences arising from its use.