Approximating Euclidean by Imprecise Markov Decision Processes
- URL: http://arxiv.org/abs/2006.14923v1
- Date: Fri, 26 Jun 2020 11:58:04 GMT
- Title: Approximating Euclidean by Imprecise Markov Decision Processes
- Authors: Manfred Jaeger, Giorgio Bacci, Giovanni Bacci, Kim Guldstrand Larsen,
and Peter Gjøl Jensen
- Abstract summary: We investigate what kind of approximation guarantees are obtained when the Euclidean process is approximated by finite state approximations.
We show that for cost functions over finite time horizons the approximations become arbitrarily precise.
- Score: 3.0017241250121383
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Euclidean Markov decision processes are a powerful tool for modeling control
problems under uncertainty over continuous domains. Finite state imprecise
Markov decision processes can be used to approximate the behavior of these
infinite models. In this paper we address two questions: first, we investigate
what kind of approximation guarantees are obtained when the Euclidean process
is approximated by finite state approximations induced by increasingly fine
partitions of the continuous state space. We show that for cost functions over
finite time horizons the approximations become arbitrarily precise. Second, we
use imprecise Markov decision process approximations as a tool to analyse and
validate cost functions and strategies obtained by reinforcement learning. We
find that, on the one hand, our new theoretical results validate basic design
choices of a previously proposed reinforcement learning approach. On the other
hand, the imprecise Markov decision process approximations reveal some
inaccuracies in the learned cost functions.
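As a rough illustration of the construction in the abstract, the sketch below partitions a toy one-dimensional Euclidean MDP into grid cells, derives interval (lower/upper) transition probabilities for each cell and action from several representative points, and runs finite-horizon interval value iteration to obtain optimistic and pessimistic cost estimates. The dynamics, cost function, grid size, and sampling scheme are assumptions made for this example, not the paper's construction; a full treatment would also bound the stage cost within each cell.
```python
# Minimal sketch: finite-state imprecise (interval) MDP induced by a grid
# partition of a toy continuous-state MDP, with finite-horizon interval value
# iteration. All concrete choices below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
N_CELLS, HORIZON, SAMPLES = 20, 10, 200
ACTIONS = [-0.1, 0.0, 0.1]
edges = np.linspace(0.0, 1.0, N_CELLS + 1)

def step(x, a):
    # Toy stochastic dynamics: drift by the chosen action plus Gaussian noise.
    return float(np.clip(x + a + rng.normal(0.0, 0.05), 0.0, 1.0))

def cost(x):
    # Toy stage cost: distance to the target state 0.8.
    return abs(x - 0.8)

def cell_of(x):
    return min(int(x * N_CELLS), N_CELLS - 1)

# Interval transition model: for each (cell, action), estimate transition
# probabilities from several representative points inside the cell; their
# pointwise min/max give lower/upper probability bounds (the "imprecision").
P_low = np.ones((N_CELLS, len(ACTIONS), N_CELLS))
P_up = np.zeros((N_CELLS, len(ACTIONS), N_CELLS))
for c in range(N_CELLS):
    for ai, a in enumerate(ACTIONS):
        for x0 in np.linspace(edges[c], edges[c + 1], 5):
            counts = np.zeros(N_CELLS)
            for _ in range(SAMPLES):
                counts[cell_of(step(x0, a))] += 1
            P_low[c, ai] = np.minimum(P_low[c, ai], counts / SAMPLES)
            P_up[c, ai] = np.maximum(P_up[c, ai], counts / SAMPLES)

def extreme_expectation(v, lo, up, maximize):
    # Best/worst expectation of v over distributions q with lo <= q <= up and
    # sum(q) = 1: start from lo and greedily assign the remaining mass.
    order = np.argsort(v)[::-1] if maximize else np.argsort(v)
    q, budget = lo.copy(), 1.0 - lo.sum()
    for i in order:
        add = min(up[i] - lo[i], budget)
        q[i] += add
        budget -= add
    return float(v @ q)

# Finite-horizon interval value iteration: optimistic / pessimistic iterates.
stage = np.array([cost((edges[c] + edges[c + 1]) / 2) for c in range(N_CELLS)])
V_low = np.zeros(N_CELLS)   # optimistic iterate (best-case transition probabilities)
V_up = np.zeros(N_CELLS)    # pessimistic iterate (worst-case transition probabilities)
for _ in range(HORIZON):
    V_low = stage + np.array([
        min(extreme_expectation(V_low, P_low[c, ai], P_up[c, ai], False)
            for ai in range(len(ACTIONS))) for c in range(N_CELLS)])
    V_up = stage + np.array([
        min(extreme_expectation(V_up, P_low[c, ai], P_up[c, ai], True)
            for ai in range(len(ACTIONS))) for c in range(N_CELLS)])

print("per-cell cost estimates (optimistic, pessimistic):")
print(np.round(np.c_[V_low, V_up], 3))
```
Refining the grid (larger N_CELLS) narrows the gap between the two iterates, which is the intuition behind the paper's result that finite-horizon approximations become arbitrarily precise.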
Related papers
- Trust-Region Sequential Quadratic Programming for Stochastic Optimization with Random Models [57.52124921268249]
We propose a Trust-Region Sequential Quadratic Programming method to find both first- and second-order stationary points.
To converge to first-order stationary points, our method computes a gradient step in each iteration, defined by minimizing an approximation of the objective within a trust region.
To converge to second-order stationary points, our method additionally computes an eigen step to explore the negative curvature of the reduced Hessian matrix.
arXiv Detail & Related papers (2024-09-24T04:39:47Z)
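As a generic illustration of the two step types named in the entry above (not the paper's stochastic, constrained SQP method), the sketch below computes a trust-region gradient step via the Cauchy point of an unconstrained quadratic model, and an eigen step along a direction of negative curvature of the Hessian. The toy objective, radius, and tolerance are assumptions.
```python
import numpy as np

def cauchy_step(g, H, radius):
    # Gradient step: minimize the model m(d) = g@d + 0.5*d@H@d along -g,
    # subject to the trust-region constraint ||d|| <= radius.
    gHg = g @ H @ g
    gnorm = np.linalg.norm(g)
    tau = 1.0 if gHg <= 0 else min(gnorm**3 / (radius * gHg), 1.0)
    return -tau * (radius / gnorm) * g

def eigen_step(g, H, radius, tol=1e-8):
    # Eigen step: move along the direction of most negative curvature of H,
    # scaled to the trust-region radius, with the sign chosen for descent.
    eigvals, eigvecs = np.linalg.eigh(H)
    if eigvals[0] >= -tol:
        return None  # no significant negative curvature to exploit
    d = eigvecs[:, 0]
    return radius * (-d if g @ d > 0 else d)

# Toy indefinite problem: f(x) = x0**2 - x1**2 evaluated at x = (1.0, 0.5).
g = np.array([2.0, -1.0])                  # gradient of f at x
H = np.array([[2.0, 0.0], [0.0, -2.0]])    # (indefinite) Hessian of f
print("gradient step:", cauchy_step(g, H, radius=0.5))
print("eigen step   :", eigen_step(g, H, radius=0.5))
```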
- Sound Heuristic Search Value Iteration for Undiscounted POMDPs with Reachability Objectives [16.101435842520473]
This paper studies the challenging yet important problem in POMDPs known as the (indefinite-horizon) Maximal Reachability Probability Problem (MRPP).
Inspired by the success of point-based methods developed for discounted problems, we study their extensions to MRPP.
We present a novel algorithm that leverages the strengths of these techniques for efficient exploration of the belief space.
arXiv Detail & Related papers (2024-06-05T02:33:50Z)
- Markov Decision Processes with Noisy State Observation [0.0]
This paper addresses the challenge of a particular class of noisy state observations in Markov Decision Processes (MDPs).
We focus on modeling this uncertainty through a confusion matrix that captures the probabilities of misidentifying the true state.
We propose two novel algorithmic approaches to estimate the inherent measurement noise.
arXiv Detail & Related papers (2023-12-13T21:50:38Z)
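As a minimal illustration of the confusion-matrix model mentioned in the entry above, the sketch below shows how a matrix of misidentification probabilities turns a noisy state observation into a posterior over the true state via a Bayes update. The matrix entries, state names, and prior are assumed values, not taken from the paper.
```python
# Illustrative sketch (assumed values): a confusion matrix C[i, j] giving
# P(observe state j | true state i), and the belief update it induces.
import numpy as np

states = ["s0", "s1", "s2"]
C = np.array([            # rows: true state, columns: observed state
    [0.90, 0.08, 0.02],
    [0.10, 0.80, 0.10],
    [0.05, 0.15, 0.80],
])

def belief_update(prior, observed_idx):
    # posterior(i) is proportional to prior(i) * P(observation | true state i)
    post = prior * C[:, observed_idx]
    return post / post.sum()

prior = np.full(len(states), 1.0 / len(states))   # uniform prior over the true state
posterior = belief_update(prior, observed_idx=1)  # the noisy sensor reported "s1"
print({s: round(p, 3) for s, p in zip(states, posterior)})
```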
- Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Robust $Q$-learning Algorithm for Markov Decision Processes under Wasserstein Uncertainty [5.639904484784127]
We present a novel $Q$-learning algorithm tailored to solve distributionally robust Markov decision problems.
We prove convergence of the presented algorithm as well as the benefits of considering distributional robustness when solving optimal control problems.
arXiv Detail & Related papers (2022-09-30T10:01:04Z)
- Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP (Termination Markov Decision Process) and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
- Markov Abstractions for PAC Reinforcement Learning in Non-Markov Decision Processes [90.53326983143644]
We show that Markov abstractions can be learned during reinforcement learning.
We show that our approach has PAC guarantees when the employed algorithms have PAC guarantees.
arXiv Detail & Related papers (2022-04-29T16:53:00Z)
- Counterfactual Explanations in Sequential Decision Making Under Uncertainty [27.763369810430653]
We develop methods to find counterfactual explanations for sequential decision making processes.
In our problem formulation, the counterfactual explanation specifies an alternative sequence of actions differing in at most k actions.
We show that the counterfactual explanations found by our algorithm can provide valuable insights to enhance decision making under uncertainty.
arXiv Detail & Related papers (2021-07-06T17:38:19Z)
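As an illustration of the "differing in at most k actions" formulation in the entry above, the sketch below brute-forces all action sequences within k changes of an observed sequence in a toy deterministic model and returns the best one. This is not the paper's algorithm, only the constraint it formalizes; the environment, horizon, and k are assumptions.
```python
# Brute-force illustration of a counterfactual explanation: an alternative
# action sequence that differs from the observed one in at most k positions
# and achieves a better total outcome under a known toy model.
from itertools import combinations, product

ACTIONS = [0, 1]
observed_actions = [0, 0, 1, 0]

def rollout(actions, s0=0):
    # Deterministic toy model: the outcome simply counts "good" actions.
    s = s0
    for a in actions:
        s += 1 if a == 1 else 0
    return s

def counterfactual(observed, k):
    best_seq, best_val = list(observed), rollout(observed)
    for positions in combinations(range(len(observed)), k):
        for replacement in product(ACTIONS, repeat=k):
            candidate = list(observed)
            for pos, act in zip(positions, replacement):
                candidate[pos] = act
            val = rollout(candidate)
            if val > best_val:
                best_seq, best_val = candidate, val
    return best_seq, best_val

print(counterfactual(observed_actions, k=2))
```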
- Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z)
- Robust, Accurate Stochastic Optimization for Variational Inference [68.83746081733464]
We show that common optimization methods lead to poor variational approximations if the problem is moderately large.
Motivated by these findings, we develop a more robust and accurate optimization framework by viewing the underlying algorithm as producing a Markov chain.
arXiv Detail & Related papers (2020-09-01T19:12:11Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodic constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
arXiv Detail & Related papers (2020-03-02T05:02:23Z)
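As a generic illustration of the primal-dual idea in the entry above (not the paper's upper-confidence algorithm: no optimism bonuses, adversarial losses, or regret analysis), the sketch below runs a Lagrangian primal-dual iteration on a one-state constrained problem: the primal step acts greedily with respect to the Lagrangian, and the dual step ascends on the constraint violation. The rewards, costs, budget, and step size are assumptions.
```python
# Generic Lagrangian primal-dual sketch for a constrained decision problem
# (a one-state CMDP, i.e. a bandit with a cost budget). All values assumed.
import numpy as np

rewards = np.array([1.0, 0.6, 0.2])   # expected reward of each action
costs = np.array([0.9, 0.4, 0.1])     # expected constraint cost of each action
budget = 0.5                          # require average cost <= budget

lam, eta, T = 0.0, 0.05, 5000
picks = np.zeros(len(rewards))
for _ in range(T):
    a = int(np.argmax(rewards - lam * costs))        # primal: greedy w.r.t. Lagrangian
    lam = max(0.0, lam + eta * (costs[a] - budget))  # dual: ascend on violation
    picks[a] += 1

mix = picks / T   # empirical (mixed) policy produced by the iterates
print("policy:", np.round(mix, 3),
      "avg reward:", round(float(mix @ rewards), 3),
      "avg cost:", round(float(mix @ costs), 3))
```
The empirical mix of actions approaches the optimal constrained policy (here roughly 20% of the high-reward, high-cost action and 80% of the moderate one), which is the basic mechanism that upper-confidence primal-dual methods equip with exploration bonuses.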
This list is automatically generated from the titles and abstracts of the papers on this site.