Robust Anytime Learning of Markov Decision Processes
- URL: http://arxiv.org/abs/2205.15827v4
- Date: Mon, 19 Jun 2023 09:07:48 GMT
- Title: Robust Anytime Learning of Markov Decision Processes
- Authors: Marnix Suilen, Thiago D. Simão, David Parker, Nils Jansen
- Abstract summary: In data-driven applications, deriving precise probabilities from limited data introduces statistical errors.
Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions.
We propose a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies.
- Score: 8.799182983019557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov decision processes (MDPs) are formal models commonly used in
sequential decision-making. MDPs capture the stochasticity that may arise, for
instance, from imprecise actuators via probabilities in the transition
function. However, in data-driven applications, deriving precise probabilities
from (limited) data introduces statistical errors that may lead to unexpected
or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise
probabilities but instead use so-called uncertainty sets in the transitions,
accounting for such limited data. Tools from the formal verification community
efficiently compute robust policies that provably adhere to formal
specifications, like safety constraints, under the worst-case instance in the
uncertainty set. We continuously learn the transition probabilities of an MDP
in a robust anytime-learning approach that combines a dedicated Bayesian
inference scheme with the computation of robust policies. In particular, our
method (1) approximates probabilities as intervals, (2) adapts to new data that
may be inconsistent with an intermediate model, and (3) may be stopped at any
time to compute a robust policy on the uMDP that faithfully captures the data
so far. Furthermore, our method is capable of adapting to changes in the
environment. We show the effectiveness of our approach and compare it to robust
policies computed on uMDPs learned by the UCRL2 reinforcement learning
algorithm in an experimental evaluation on several benchmarks.
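To make the interval-learning idea concrete, below is a minimal Python sketch that maintains Dirichlet counts per state-action pair and turns them into credible probability intervals for the transition function. It is not the paper's dedicated Bayesian inference scheme (which also handles data that is inconsistent with an intermediate model); the class name, states, counts, and parameters are all illustrative.

```python
import numpy as np
from scipy.stats import beta

# Minimal sketch (not the paper's dedicated inference scheme): maintain
# Dirichlet counts per (state, action) and derive probability intervals
# from per-successor Beta posterior credible bounds. Stopping at any time
# yields interval bounds over the data seen so far, which can fill the
# uncertainty sets of a uMDP.

class IntervalEstimator:
    def __init__(self, n_states, n_actions, prior=1.0, credibility=0.95):
        # prior: symmetric Dirichlet pseudo-count for every successor state
        self.counts = np.full((n_states, n_actions, n_states), prior)
        self.credibility = credibility

    def update(self, s, a, s_next):
        # One observed transition; can be called on a stream of data.
        self.counts[s, a, s_next] += 1.0

    def intervals(self, s, a):
        # Marginal Beta(alpha_k, sum(alpha) - alpha_k) credible interval
        # for each successor probability p(s' | s, a).
        alpha = self.counts[s, a]
        total = alpha.sum()
        miss = 1.0 - self.credibility
        lower = beta.ppf(miss / 2, alpha, total - alpha)
        upper = beta.ppf(1 - miss / 2, alpha, total - alpha)
        return np.clip(lower, 0, 1), np.clip(upper, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = np.array([0.7, 0.2, 0.1])          # hypothetical ground truth
    est = IntervalEstimator(n_states=3, n_actions=1)
    for _ in range(500):
        est.update(0, 0, rng.choice(3, p=true_p))
    print(est.intervals(0, 0))                   # intervals shrink around true_p
```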
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
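For context, here is a minimal sketch of the basic (non-stratified) prediction-powered mean estimator that stratified variants build on; the data, model predictions, and function name `ppi_mean` below are synthetic placeholders, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the basic prediction-powered mean estimate: use model
# predictions on a large unlabeled set, corrected by the prediction error
# measured on a small human-labeled set.

def ppi_mean(preds_unlabeled, preds_labeled, labels):
    rectifier = np.mean(labels - preds_labeled)      # bias-correction term
    return np.mean(preds_unlabeled) + rectifier

rng = np.random.default_rng(1)
truth = rng.normal(0.6, 0.1, size=10_000)            # hypothetical quantities
model = truth + rng.normal(0.05, 0.1, size=10_000)   # biased model predictions
idx = rng.permutation(10_000)
lab, unlab = idx[:200], idx[200:]                     # small labeled split

print("naive model mean:", model[unlab].mean())
print("PPI mean:        ", ppi_mean(model[unlab], model[lab], truth[lab]))
```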
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - What Are the Odds? Improving the foundations of Statistical Model Checking [3.789219860006095]
Markov decision processes (MDPs) are a fundamental model for decision making under uncertainty.
Traditionally verification algorithms assume exact knowledge of the probabilities that govern the behaviour of an MDP.
We propose specialised approaches that exploit our knowledge of the MDP.
arXiv Detail & Related papers (2024-04-08T11:47:46Z) - Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP).
We use transition observations from the original MDP, whether they are generated under the same or a different policy.
Our estimator also supports statistical inference using Wald confidence intervals.
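As a reminder of what a Wald interval looks like in this setting, here is a tiny illustrative sketch around a Monte-Carlo value estimate; the paper's robust off-policy estimator and its variance expression are not reproduced here.

```python
import numpy as np

# Generic Wald confidence interval around a Monte-Carlo estimate of a
# policy's value; illustrative only, not the paper's robust OPE estimator.

def wald_ci(samples, z=1.96):                 # z = 1.96 for ~95% coverage
    est = samples.mean()
    se = samples.std(ddof=1) / np.sqrt(len(samples))
    return est, (est - z * se, est + z * se)

returns = np.random.default_rng(2).normal(0.4, 0.1, size=400)  # synthetic returns
print(wald_ci(returns))
```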
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
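For reference, the following is a minimal split conformal prediction sketch under the standard exchangeability assumption; the paper's contribution is precisely to go beyond this setting, so the example only illustrates the baseline construction, with synthetic data and an illustrative predictor.

```python
import numpy as np

# Minimal split conformal prediction sketch for exchangeable data.

def split_conformal(residuals_cal, alpha=0.1):
    # Quantile of calibration residuals with the usual finite-sample correction.
    n = len(residuals_cal)
    q = np.quantile(residuals_cal, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return q

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 1_000)
y = 2 * x + rng.normal(0, 0.1, 1_000)
pred = 2 * x                                   # a fixed (pre-trained) predictor
q = split_conformal(np.abs(y[:500] - pred[:500]))
# Prediction set for a new point x_new: [pred(x_new) - q, pred(x_new) + q]
print("half-width of 90% prediction interval:", q)
```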
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
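A small sketch of the rectangular robust setting this line of work targets: worst-case policy evaluation on an interval MDP, where the adversary picks a distribution independently per state-action pair. The paper's RPG additionally provides an efficient gradient, which is not reproduced here; the intervals, rewards, and policy below are illustrative.

```python
import numpy as np

def worst_case_expectation(V, lo, hi):
    # Distribute probability mass within [lo, hi] so as to minimise E[V]:
    # start from the lower bounds and give the leftover mass to the
    # successors with the smallest values first.
    p = lo.copy()
    remaining = 1.0 - p.sum()
    for j in np.argsort(V):
        add = min(hi[j] - lo[j], remaining)
        p[j] += add
        remaining -= add
    return float(p @ V)

def robust_policy_value(policy, lo, hi, reward, gamma=0.95, iters=300):
    # (s,a)-rectangular robust policy evaluation on an interval MDP.
    n_s, n_a = reward.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        V_new = np.zeros(n_s)
        for s in range(n_s):
            for a in range(n_a):
                q_sa = reward[s, a] + gamma * worst_case_expectation(V, lo[s, a], hi[s, a])
                V_new[s] += policy[s, a] * q_sa
        V = V_new
    return V

# Illustrative two-state, two-action interval MDP and a fixed stochastic policy.
lo = np.array([[[0.6, 0.1], [0.2, 0.5]],
               [[0.3, 0.4], [0.1, 0.7]]])
hi = np.array([[[0.8, 0.4], [0.5, 0.8]],
               [[0.6, 0.7], [0.3, 0.9]]])
reward = np.array([[1.0, 0.5], [0.0, 0.2]])
policy = np.array([[0.5, 0.5], [0.9, 0.1]])
print(robust_policy_value(policy, lo, hi, reward))
```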
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
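The following is a minimal sketch of the abstraction step described above, assuming simple 1-D linear dynamics and synthetic non-Gaussian noise samples; it only estimates transition frequencies between discrete states and does not reproduce the paper's formal guarantees or controller synthesis.

```python
import numpy as np

# Partition a 1-D continuous state space into intervals (discrete states)
# and estimate transition frequencies by propagating noise samples through
# the dynamics. Dynamics, noise, and partition are illustrative.

def abstract_transitions(dynamics, noise_samples, edges, n_points=20):
    # edges: boundaries of the partition; cell i is [edges[i], edges[i+1]).
    n_cells = len(edges) - 1
    T = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        # Representative points inside cell i (here: an evenly spaced grid).
        xs = np.linspace(edges[i], edges[i + 1], n_points, endpoint=False)
        for x in xs:
            nxt = dynamics(x) + noise_samples          # one successor per noise sample
            cells = np.clip(np.digitize(nxt, edges) - 1, 0, n_cells - 1)
            T[i] += np.bincount(cells, minlength=n_cells)
        T[i] /= T[i].sum()
    return T

rng = np.random.default_rng(5)
noise = rng.standard_t(df=3, size=2_000) * 0.05        # non-Gaussian noise samples
edges = np.linspace(-1.0, 1.0, 11)                      # 10 discrete states
T = abstract_transitions(lambda x: 0.9 * x, noise, edges)
print(T.round(2))
```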
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value or (modified) policy iteration.
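One illustrative way to combine entropy regularization with robustness is a soft Bellman backup that takes the worst case over a small finite set of candidate transition models; this is a simplification of the uncertainty sets and algorithms treated in the paper, and all data below are synthetic.

```python
import numpy as np

# Robust entropy-regularized ("soft") value iteration with a finite
# ambiguity set of transition kernels; (s,a)-rectangular worst case.

def robust_soft_value_iteration(kernels, reward, tau=0.1, gamma=0.9, iters=200):
    # kernels: array of shape (K, S, A, S) - K candidate transition models
    K, S, A, _ = kernels.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q-values under each candidate model, then the adversarial (minimal) one.
        Q = reward[None, :, :] + gamma * np.einsum("ksat,t->ksa", kernels, V)
        Q_robust = Q.min(axis=0)
        # Soft (log-sum-exp) maximisation over actions with temperature tau,
        # computed in a numerically stable way.
        m = Q_robust.max(axis=1, keepdims=True)
        V = tau * np.log(np.exp((Q_robust - m) / tau).sum(axis=1)) + m[:, 0]
    return V

rng = np.random.default_rng(4)
kernels = rng.dirichlet(np.ones(2), size=(2, 2, 2))   # 2 models, 2 states, 2 actions
reward = np.array([[1.0, 0.0], [0.2, 0.5]])
print(robust_soft_value_iteration(kernels, reward))
```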
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We capture the resulting probability bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP).
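As a stand-in for the scenario-based bounds used in this line of work, a simple Hoeffding-style interval around empirical transition frequencies shows how such intervals can be filled in; the counts and confidence level below are illustrative.

```python
import numpy as np

# Hoeffding-style probability intervals from sampled transitions, as one
# way to fill the transition intervals of an iMDP; illustrative only.

def probability_intervals(counts, delta=0.05):
    n = counts.sum()
    p_hat = counts / n
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # Hoeffding half-width
    return np.clip(p_hat - eps, 0.0, 1.0), np.clip(p_hat + eps, 0.0, 1.0)

counts = np.array([712, 205, 83])   # hypothetical successor-state counts
print(probability_intervals(counts))
```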
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
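For intuition, a minimal online CUSUM-style detector over a reward stream illustrates change detection on streaming data; the paper's statistics and high-confidence guarantees differ, so this is only a toy stand-in with synthetic data.

```python
import numpy as np

# Online CUSUM-style sketch for detecting a shift in the mean of a reward
# (or transition-statistic) stream; illustrative, not the paper's method.

def cusum_change_point(stream, target_mean, drift=0.05, threshold=2.0):
    g = 0.0
    for t, x in enumerate(stream):
        g = max(0.0, g + (x - target_mean - drift))   # accumulate positive deviations
        if g > threshold:
            return t                                   # first alarm
    return None

rng = np.random.default_rng(6)
stream = np.concatenate([rng.normal(0.0, 0.1, 300),    # nominal regime
                         rng.normal(0.3, 0.1, 200)])   # environment changes
print("change detected at step:", cusum_change_point(stream, target_mean=0.0))
```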
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.