Robust Anytime Learning of Markov Decision Processes
- URL: http://arxiv.org/abs/2205.15827v4
- Date: Mon, 19 Jun 2023 09:07:48 GMT
- Title: Robust Anytime Learning of Markov Decision Processes
- Authors: Marnix Suilen, Thiago D. Simão, David Parker, Nils Jansen
- Abstract summary: In data-driven applications, deriving precise probabilities from limited data introduces statistical errors.
Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions.
We propose a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies.
- Score: 8.799182983019557
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov decision processes (MDPs) are formal models commonly used in
sequential decision-making. MDPs capture the stochasticity that may arise, for
instance, from imprecise actuators via probabilities in the transition
function. However, in data-driven applications, deriving precise probabilities
from (limited) data introduces statistical errors that may lead to unexpected
or undesirable outcomes. Uncertain MDPs (uMDPs) do not require precise
probabilities but instead use so-called uncertainty sets in the transitions,
accounting for such limited data. Tools from the formal verification community
efficiently compute robust policies that provably adhere to formal
specifications, like safety constraints, under the worst-case instance in the
uncertainty set. We continuously learn the transition probabilities of an MDP
in a robust anytime-learning approach that combines a dedicated Bayesian
inference scheme with the computation of robust policies. In particular, our
method (1) approximates probabilities as intervals, (2) adapts to new data that
may be inconsistent with an intermediate model, and (3) may be stopped at any
time to compute a robust policy on the uMDP that faithfully captures the data
so far. Furthermore, our method is capable of adapting to changes in the
environment. We show the effectiveness of our approach and compare it to robust
policies computed on uMDPs learned by the UCRL2 reinforcement learning
algorithm in an experimental evaluation on several benchmarks.
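To make the interval-learning idea concrete, below is a minimal Python sketch that maintains Dirichlet counts per state-action pair and turns them into credible probability intervals for the transition function. It is not the paper's dedicated Bayesian inference scheme (which also handles data that is inconsistent with an intermediate model); the class name, states, counts, and parameters are all illustrative.

```python
import numpy as np
from scipy.stats import beta

# Minimal sketch (not the paper's dedicated inference scheme): maintain
# Dirichlet counts per (state, action) and derive probability intervals
# from per-successor Beta posterior credible bounds. Stopping at any time
# yields interval bounds over the data seen so far, which can fill the
# uncertainty sets of a uMDP.

class IntervalEstimator:
    def __init__(self, n_states, n_actions, prior=1.0, credibility=0.95):
        # prior: symmetric Dirichlet pseudo-count for every successor state
        self.counts = np.full((n_states, n_actions, n_states), prior)
        self.credibility = credibility

    def update(self, s, a, s_next):
        # One observed transition; can be called on a stream of data.
        self.counts[s, a, s_next] += 1.0

    def intervals(self, s, a):
        # Marginal Beta(alpha_k, sum(alpha) - alpha_k) credible interval
        # for each successor probability p(s' | s, a).
        alpha = self.counts[s, a]
        total = alpha.sum()
        miss = 1.0 - self.credibility
        lower = beta.ppf(miss / 2, alpha, total - alpha)
        upper = beta.ppf(1 - miss / 2, alpha, total - alpha)
        return np.clip(lower, 0, 1), np.clip(upper, 0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_p = np.array([0.7, 0.2, 0.1])          # hypothetical ground truth
    est = IntervalEstimator(n_states=3, n_actions=1)
    for _ in range(500):
        est.update(0, 0, rng.choice(3, p=true_p))
    print(est.intervals(0, 0))                   # intervals shrink around true_p
```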
Related papers
- Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation [62.2436697657307]
Prediction-powered inference (PPI) is a method that improves statistical estimates based on limited human-labeled data.
We propose a method called Stratified Prediction-Powered Inference (StratPPI).
We show that the basic PPI estimates can be considerably improved by employing simple data stratification strategies.
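For context, here is a minimal sketch of the basic (non-stratified) prediction-powered mean estimator that stratified variants build on; the data, model predictions, and function name `ppi_mean` below are synthetic placeholders, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the basic prediction-powered mean estimate: use model
# predictions on a large unlabeled set, corrected by the prediction error
# measured on a small human-labeled set.

def ppi_mean(preds_unlabeled, preds_labeled, labels):
    rectifier = np.mean(labels - preds_labeled)      # bias-correction term
    return np.mean(preds_unlabeled) + rectifier

rng = np.random.default_rng(1)
truth = rng.normal(0.6, 0.1, size=10_000)            # hypothetical quantities
model = truth + rng.normal(0.05, 0.1, size=10_000)   # biased model predictions
idx = rng.permutation(10_000)
lab, unlab = idx[:200], idx[200:]                     # small labeled split

print("naive model mean:", model[unlab].mean())
print("PPI mean:        ", ppi_mean(model[unlab], model[lab], truth[lab]))
```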
arXiv Detail & Related papers (2024-06-06T17:37:39Z) - What Are the Odds? Improving the foundations of Statistical Model Checking [3.789219860006095]
Markov decision processes (MDPs) are a fundamental model for decision making under uncertainty.
Traditionally verification algorithms assume exact knowledge of the probabilities that govern the behaviour of an MDP.
We propose specialised approaches that exploit our knowledge of the MDP.
arXiv Detail & Related papers (2024-04-08T11:47:46Z) - Efficient and Sharp Off-Policy Evaluation in Robust Markov Decision Processes [44.974100402600165]
We study the evaluation of a policy under best- and worst-case perturbations to a Markov decision process (MDP).
We use transition observations from the original MDP, whether they are generated under the same or a different policy.
Our estimator also supports statistical inference using Wald confidence intervals.
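As a reminder of what a Wald interval looks like in this setting, here is a tiny illustrative sketch around a Monte-Carlo value estimate; the paper's robust off-policy estimator and its variance expression are not reproduced here.

```python
import numpy as np

# Generic Wald confidence interval around a Monte-Carlo estimate of a
# policy's value; illustrative only, not the paper's robust OPE estimator.

def wald_ci(samples, z=1.96):                 # z = 1.96 for ~95% coverage
    est = samples.mean()
    se = samples.std(ddof=1) / np.sqrt(len(samples))
    return est, (est - z * se, est + z * se)

returns = np.random.default_rng(2).normal(0.4, 0.1, size=400)  # synthetic returns
print(wald_ci(returns))
```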
arXiv Detail & Related papers (2024-03-29T18:11:49Z) - Efficient Conformal Prediction under Data Heterogeneity [79.35418041861327]
Conformal Prediction (CP) stands out as a robust framework for uncertainty quantification.
Existing approaches for tackling non-exchangeability lead to methods that are not computable beyond the simplest examples.
This work introduces a new efficient approach to CP that produces provably valid confidence sets for fairly general non-exchangeable data distributions.
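For reference, the following is a minimal split conformal prediction sketch under the standard exchangeability assumption; the paper's contribution is precisely to go beyond this setting, so the example only illustrates the baseline construction, with synthetic data and an illustrative predictor.

```python
import numpy as np

# Minimal split conformal prediction sketch for exchangeable data.

def split_conformal(residuals_cal, alpha=0.1):
    # Quantile of calibration residuals with the usual finite-sample correction.
    n = len(residuals_cal)
    q = np.quantile(residuals_cal, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")
    return q

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 1_000)
y = 2 * x + rng.normal(0, 0.1, 1_000)
pred = 2 * x                                   # a fixed (pre-trained) predictor
q = split_conformal(np.abs(y[:500] - pred[:500]))
# Prediction set for a new point x_new: [pred(x_new) - q, pred(x_new) + q]
print("half-width of 90% prediction interval:", q)
```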
arXiv Detail & Related papers (2023-12-25T20:02:51Z) - Policy Gradient for Rectangular Robust Markov Decision Processes [62.397882389472564]
We introduce robust policy gradient (RPG), a policy-based method that efficiently solves rectangular robust Markov decision processes (MDPs).
Our resulting RPG can be estimated from data with the same time complexity as its non-robust equivalent.
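A small sketch of the rectangular robust setting this line of work targets: worst-case policy evaluation on an interval MDP, where the adversary picks a distribution independently per state-action pair. The paper's RPG additionally provides an efficient gradient, which is not reproduced here; the intervals, rewards, and policy below are illustrative.

```python
import numpy as np

def worst_case_expectation(V, lo, hi):
    # Distribute probability mass within [lo, hi] so as to minimise E[V]:
    # start from the lower bounds and give the leftover mass to the
    # successors with the smallest values first.
    p = lo.copy()
    remaining = 1.0 - p.sum()
    for j in np.argsort(V):
        add = min(hi[j] - lo[j], remaining)
        p[j] += add
        remaining -= add
    return float(p @ V)

def robust_policy_value(policy, lo, hi, reward, gamma=0.95, iters=300):
    # (s,a)-rectangular robust policy evaluation on an interval MDP.
    n_s, n_a = reward.shape
    V = np.zeros(n_s)
    for _ in range(iters):
        V_new = np.zeros(n_s)
        for s in range(n_s):
            for a in range(n_a):
                q_sa = reward[s, a] + gamma * worst_case_expectation(V, lo[s, a], hi[s, a])
                V_new[s] += policy[s, a] * q_sa
        V = V_new
    return V

# Illustrative two-state, two-action interval MDP and a fixed stochastic policy.
lo = np.array([[[0.6, 0.1], [0.2, 0.5]],
               [[0.3, 0.4], [0.1, 0.7]]])
hi = np.array([[[0.8, 0.4], [0.5, 0.8]],
               [[0.6, 0.7], [0.3, 0.9]]])
reward = np.array([[1.0, 0.5], [0.0, 0.2]])
policy = np.array([[0.5, 0.5], [0.9, 0.1]])
print(robust_policy_value(policy, lo, hi, reward))
```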
arXiv Detail & Related papers (2023-01-31T12:40:50Z) - Robust Control for Dynamical Systems With Non-Gaussian Noise via Formal Abstractions [59.605246463200736]
We present a novel controller synthesis method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous control system into a finite-state model that captures noise by probabilistic transitions between discrete states.
We use state-of-the-art verification techniques to provide guarantees on the interval Markov decision process and compute a controller for which these guarantees carry over to the original control system.
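The following is a minimal sketch of the abstraction step described above, assuming simple 1-D linear dynamics and synthetic non-Gaussian noise samples; it only estimates transition frequencies between discrete states and does not reproduce the paper's formal guarantees or controller synthesis.

```python
import numpy as np

# Partition a 1-D continuous state space into intervals (discrete states)
# and estimate transition frequencies by propagating noise samples through
# the dynamics. Dynamics, noise, and partition are illustrative.

def abstract_transitions(dynamics, noise_samples, edges, n_points=20):
    # edges: boundaries of the partition; cell i is [edges[i], edges[i+1]).
    n_cells = len(edges) - 1
    T = np.zeros((n_cells, n_cells))
    for i in range(n_cells):
        # Representative points inside cell i (here: an evenly spaced grid).
        xs = np.linspace(edges[i], edges[i + 1], n_points, endpoint=False)
        for x in xs:
            nxt = dynamics(x) + noise_samples          # one successor per noise sample
            cells = np.clip(np.digitize(nxt, edges) - 1, 0, n_cells - 1)
            T[i] += np.bincount(cells, minlength=n_cells)
        T[i] /= T[i].sum()
    return T

rng = np.random.default_rng(5)
noise = rng.standard_t(df=3, size=2_000) * 0.05        # non-Gaussian noise samples
edges = np.linspace(-1.0, 1.0, 11)                      # 10 discrete states
T = abstract_transitions(lambda x: 0.9 * x, noise, edges)
print(T.round(2))
```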
arXiv Detail & Related papers (2023-01-04T10:40:30Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value or (modified) policy iteration.
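One illustrative way to combine entropy regularization with robustness is a soft Bellman backup that takes the worst case over a small finite set of candidate transition models; this is a simplification of the uncertainty sets and algorithms treated in the paper, and all data below are synthetic.

```python
import numpy as np

# Robust entropy-regularized ("soft") value iteration with a finite
# ambiguity set of transition kernels; (s,a)-rectangular worst case.

def robust_soft_value_iteration(kernels, reward, tau=0.1, gamma=0.9, iters=200):
    # kernels: array of shape (K, S, A, S) - K candidate transition models
    K, S, A, _ = kernels.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q-values under each candidate model, then the adversarial (minimal) one.
        Q = reward[None, :, :] + gamma * np.einsum("ksat,t->ksa", kernels, V)
        Q_robust = Q.min(axis=0)
        # Soft (log-sum-exp) maximisation over actions with temperature tau,
        # computed in a numerically stable way.
        m = Q_robust.max(axis=1, keepdims=True)
        V = tau * np.log(np.exp((Q_robust - m) / tau).sum(axis=1)) + m[:, 0]
    return V

rng = np.random.default_rng(4)
kernels = rng.dirichlet(np.ones(2), size=(2, 2, 2))   # 2 models, 2 states, 2 actions
reward = np.array([[1.0, 0.0], [0.2, 0.5]])
print(robust_soft_value_iteration(kernels, reward))
```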
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions.
First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states.
We capture the resulting probability bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP).
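As a stand-in for the scenario-based bounds used in this line of work, a simple Hoeffding-style interval around empirical transition frequencies shows how such intervals can be filled in; the counts and confidence level below are illustrative.

```python
import numpy as np

# Hoeffding-style probability intervals from sampled transitions, as one
# way to fill the transition intervals of an iMDP; illustrative only.

def probability_intervals(counts, delta=0.05):
    n = counts.sum()
    p_hat = counts / n
    eps = np.sqrt(np.log(2.0 / delta) / (2.0 * n))   # Hoeffding half-width
    return np.clip(p_hat - eps, 0.0, 1.0), np.clip(p_hat + eps, 0.0, 1.0)

counts = np.array([712, 205, 83])   # hypothetical successor-state counts
print(probability_intervals(counts))
```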
arXiv Detail & Related papers (2021-10-25T06:18:55Z) - Minimum-Delay Adaptation in Non-Stationary Reinforcement Learning via Online High-Confidence Change-Point Detection [7.685002911021767]
We introduce an algorithm that efficiently learns policies in non-stationary environments.
It analyzes a possibly infinite stream of data and computes, in real-time, high-confidence change-point detection statistics.
We show that this algorithm minimizes the delay until unforeseen changes to a context are detected, thereby allowing for rapid responses.
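For intuition, a minimal online CUSUM-style detector over a reward stream illustrates change detection on streaming data; the paper's statistics and high-confidence guarantees differ, so this is only a toy stand-in with synthetic data.

```python
import numpy as np

# Online CUSUM-style sketch for detecting a shift in the mean of a reward
# (or transition-statistic) stream; illustrative, not the paper's method.

def cusum_change_point(stream, target_mean, drift=0.05, threshold=2.0):
    g = 0.0
    for t, x in enumerate(stream):
        g = max(0.0, g + (x - target_mean - drift))   # accumulate positive deviations
        if g > threshold:
            return t                                   # first alarm
    return None

rng = np.random.default_rng(6)
stream = np.concatenate([rng.normal(0.0, 0.1, 300),    # nominal regime
                         rng.normal(0.3, 0.1, 200)])   # environment changes
print("change detected at step:", cusum_change_point(stream, target_mean=0.0))
```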
arXiv Detail & Related papers (2021-05-20T01:57:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.