On Solving a Stochastic Shortest-Path Markov Decision Process as
Probabilistic Inference
- URL: http://arxiv.org/abs/2109.05866v1
- Date: Mon, 13 Sep 2021 11:07:52 GMT
- Title: On Solving a Stochastic Shortest-Path Markov Decision Process as
Probabilistic Inference
- Authors: Mohamed Baioumy, Bruno Lacerda, Paul Duckworth, Nick Hawes
- Abstract summary: We propose solving the general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as probabilistic inference.
We discuss online and offline methods for planning under uncertainty.
- Score: 5.517104116168873
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Previous work on planning as active inference addresses finite horizon
problems and solutions valid for online planning. We propose solving the
general Stochastic Shortest-Path Markov Decision Process (SSP MDP) as
probabilistic inference. Furthermore, we discuss online and offline methods for
planning under uncertainty. In an SSP MDP, the horizon is indefinite and
unknown a priori. SSP MDPs generalize finite and infinite horizon MDPs and are
widely used in the artificial intelligence community. Additionally, we
highlight some of the differences between solving an MDP using dynamic
programming approaches widely used in the artificial intelligence community and
approaches used in the active inference community.
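The abstract's contrast between dynamic-programming backups and the smoothed backups used in inference-style formulations can be made concrete with a small sketch. The toy SSP MDP below, the temperature, and the uniform action prior are all assumptions introduced for illustration; this is not the paper's algorithm, only standard value iteration next to a soft (log-sum-exp) variant of the kind that appears in control-as-inference treatments.

```python
import numpy as np

# Hypothetical 3-state SSP MDP: state 2 is the absorbing, zero-cost goal.
# P[a, s, s'] is the transition probability; C[s, a] is the immediate cost.
P = np.array([
    [[0.8, 0.2, 0.0],   # action 0
     [0.0, 0.5, 0.5],
     [0.0, 0.0, 1.0]],
    [[0.1, 0.9, 0.0],   # action 1
     [0.0, 0.1, 0.9],
     [0.0, 0.0, 1.0]],
])
C = np.array([[2.0, 1.0],
              [1.0, 4.0],
              [0.0, 0.0]])  # the goal state incurs no cost

def value_iteration(P, C, tol=1e-8):
    """Dynamic-programming backup: V(s) = min_a [C(s,a) + sum_s' P(s'|s,a) V(s')]."""
    V = np.zeros(P.shape[1])
    while True:
        Q = C + np.einsum('ast,t->sa', P, V)   # expected cost-to-go per (state, action)
        V_new = Q.min(axis=1)                  # hard minimisation over actions
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmin(axis=1)
        V = V_new

def soft_value_iteration(P, C, temperature=0.1, iters=500):
    """Soft backup with a uniform action prior:
    V(s) = -T * log(mean_a exp(-Q(s,a) / T)), a smoothed Bellman operator of the
    kind used in planning-as-inference formulations."""
    V = np.zeros(P.shape[1])
    for _ in range(iters):
        Q = C + np.einsum('ast,t->sa', P, V)
        V = -temperature * np.log(np.mean(np.exp(-Q / temperature), axis=1))
    return V

V_hard, greedy_policy = value_iteration(P, C)
V_soft = soft_value_iteration(P, C)
print("hard values :", V_hard, "greedy policy:", greedy_policy)
print("soft values :", V_soft)
```

As the temperature approaches zero, the soft backup approaches the hard minimisation, which is one way the dynamic-programming and inference-style views can be related.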
Related papers
- Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet [12.056104044376372]
Markov decision processes (MDPs) are a standard model for sequential decision-making problems.
They are widely used across many scientific areas, including formal methods and artificial intelligence (AI)
arXiv Detail & Related papers (2024-11-18T10:34:14Z) - Learning Logic Specifications for Policy Guidance in POMDPs: an
Inductive Logic Programming Approach [57.788675205519986]
We learn high-quality traces from POMDP executions generated by any solver.
We exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications.
We show that learned specifications expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics, within lower computational time.
arXiv Detail & Related papers (2024-02-29T15:36:01Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State
Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs)
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a last-iterate guarantee of a near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - BetaZero: Belief-State Planning for Long-Horizon POMDPs using Learned Approximations [37.29355942795658]
Real-world planning problems have been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods.
To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons.
We propose BetaZero, a belief-state planning algorithm for high-dimensional POMDPs.
arXiv Detail & Related papers (2023-05-31T23:47:31Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new policy gradient algorithm for TMDPs (MDPs with topological constraints among objectives) as a simple extension of the proximal policy optimization (PPO) algorithm; a sketch of the standard PPO objective it builds on appears after this list.
We demonstrate this on a real-world multi-objective navigation problem with an arbitrary ordering of objectives, both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP (termination MDP) and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Model-Free Reinforcement Learning for Optimal Control of Markov Decision
Processes Under Signal Temporal Logic Specifications [7.842869080999489]
We present a model-free reinforcement learning algorithm to find an optimal policy for a finite-horizon Markov decision process.
We illustrate the effectiveness of our approach in the context of robotic motion planning for complex missions under uncertainty and performance objectives.
arXiv Detail & Related papers (2021-09-27T22:44:55Z) - Efficient Sampling in POMDPs with Lipschitz Bandits for Motion Planning
in Continuous Spaces [5.732271870257913]
Decision making under uncertainty can be framed as a partially observable Markov decision process (POMDP)
Finding exact solutions of POMDPs is generally intractable, but the solution can be approximated by sampling-based approaches.
We demonstrate the effectiveness of this approach in the context of motion planning for automated driving.
arXiv Detail & Related papers (2021-06-08T09:31:48Z) - Learning MDPs from Features: Predict-Then-Optimize for Sequential
Decision Problems by Reinforcement Learning [52.74071439183113]
We study the predict-then-optimize framework in the context of sequential decision problems (formulated as MDPs) solved via reinforcement learning.
Two significant computational challenges arise in applying decision-focused learning to MDPs.
arXiv Detail & Related papers (2021-06-06T23:53:31Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
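For context on the PPO-based entry above (Multi-Objective Policy Gradients with Topological Constraints), the following is a minimal sketch of the standard PPO clipped surrogate objective that such work extends. It is generic PPO, not that paper's topological-constraint extension, and the batch values are made up for illustration.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Clipped surrogate: L = -E[min(r * A, clip(r, 1 - eps, 1 + eps) * A)],
    where r = pi_new(a|s) / pi_old(a|s) is the probability ratio."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -np.mean(np.minimum(unclipped, clipped))

# Hypothetical batch of 4 transitions.
logp_old = np.log(np.array([0.25, 0.10, 0.40, 0.05]))
logp_new = np.log(np.array([0.30, 0.08, 0.35, 0.07]))
adv      = np.array([1.0, -0.5, 0.2, 2.0])
print("PPO clipped loss:", ppo_clip_loss(logp_new, logp_old, adv))
```

Clipping the probability ratio keeps each policy update close to the behaviour policy; extensions that add further constraints on the objectives typically start from this surrogate.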
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.