PAC Statistical Model Checking of Mean Payoff in Discrete- and
Continuous-Time MDP
- URL: http://arxiv.org/abs/2206.01465v1
- Date: Fri, 3 Jun 2022 09:13:27 GMT
- Title: PAC Statistical Model Checking of Mean Payoff in Discrete- and
Continuous-Time MDP
- Authors: Chaitanya Agarwal, Shibashis Guha, Jan Křetínský, M. Pazhamalai
- Abstract summary: We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDP.
We do not require any knowledge of the state space, only a lower bound on the minimum transition probability.
In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
- Score: 0.34410212782758043
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Markov decision processes (MDP) and continuous-time MDP (CTMDP) are the
fundamental models for non-deterministic systems with probabilistic
uncertainty. Mean payoff (a.k.a. long-run average reward) is one of the most
classic objectives considered in their context. We provide the first algorithm
to compute mean payoff probably approximately correctly in unknown MDP;
further, we extend it to unknown CTMDP. We do not require any knowledge of the
state space, only a lower bound on the minimum transition probability, which
has been advocated in literature. In addition to providing probably
approximately correct (PAC) bounds for our algorithm, we also demonstrate its
practical nature by running experiments on standard benchmarks.
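To make the setting concrete, the following is a minimal illustrative sketch in Python of the problem the paper tackles: an MDP accessible only through a simulator (no knowledge of the state space or transition probabilities) and a naive Monte-Carlo estimate of the mean payoff under a fixed policy, with a Hoeffding-style half-width. This is not the paper's PAC algorithm; the names BlackBoxMDP and estimate_mean_payoff, the toy two-state chain, and the parameter choices are all hypothetical and used only for illustration.

import math
import random


class BlackBoxMDP:
    """Toy two-state MDP exposed only through sampling, mimicking the
    unknown-MDP setting: the learner never sees the transition matrix."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        # Hidden model: two states, a single action, rewards in [0, 1].
        self._transitions = {0: [(0, 0.7), (1, 0.3)], 1: [(0, 0.4), (1, 0.6)]}
        self._rewards = {0: 0.0, 1: 1.0}
        self.state = 0

    def step(self):
        """Sample one transition and return the reward of the current state."""
        reward = self._rewards[self.state]
        successors, weights = zip(*self._transitions[self.state])
        self.state = self._rng.choices(successors, weights=weights)[0]
        return reward


def estimate_mean_payoff(mdp, horizon, num_runs, confidence=0.95):
    """Naive finite-horizon estimate of the long-run average reward, with a
    Hoeffding-style half-width over the per-run averages (rewards in [0, 1]).
    The half-width treats the run averages as independent, unbiased samples,
    which is only approximate here (runs are not reset, the horizon is finite);
    controlling such errors rigorously is what a PAC analysis has to do."""
    run_averages = []
    for _ in range(num_runs):
        total = sum(mdp.step() for _ in range(horizon))
        run_averages.append(total / horizon)
    estimate = sum(run_averages) / num_runs
    delta = 1.0 - confidence
    half_width = math.sqrt(math.log(2.0 / delta) / (2.0 * num_runs))
    return estimate, half_width


if __name__ == "__main__":
    est, hw = estimate_mean_payoff(BlackBoxMDP(), horizon=10_000, num_runs=50)
    print(f"estimated mean payoff: {est:.3f} +/- {hw:.3f}")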
Related papers
- Near-Optimal Learning and Planning in Separated Latent MDPs [70.88315649628251]
We study computational and statistical aspects of learning Latent Markov Decision Processes (LMDPs).
In this model, the learner interacts with an MDP drawn at the beginning of each epoch from an unknown mixture of MDPs.
arXiv Detail & Related papers (2024-06-12T06:41:47Z)
- What Are the Odds? Improving the foundations of Statistical Model Checking [3.789219860006095]
Markov decision processes (MDPs) are a fundamental model for decision making under uncertainty.
Traditionally, verification algorithms assume exact knowledge of the probabilities that govern the behaviour of an MDP.
We propose specialised approaches that exploit our knowledge of the MDP.
arXiv Detail & Related papers (2024-04-08T11:47:46Z)
- Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs).
The presented framework focuses on probabilistic reachability, which is a core problem in verification.
arXiv Detail & Related papers (2024-03-14T08:54:19Z)
- On the Global Convergence of Policy Gradient in Average Reward Markov Decision Processes [50.68789924454235]
We present the first finite-time global convergence analysis of policy gradient in the context of average reward Markov decision processes (MDPs).
Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ represents the number of iterations (a short sketch of this rate-to-regret translation appears after this related-papers list).
arXiv Detail & Related papers (2024-03-11T15:25:03Z)
- Online POMDP Planning with Anytime Deterministic Guarantees [11.157761902108692]
Planning under uncertainty can be mathematically formalized using partially observable Markov decision processes (POMDPs).
Finding an optimal plan for POMDPs can be computationally expensive and is feasible only for small tasks.
We derive a deterministic relationship between a simplified solution that is easier to obtain and the theoretically optimal one.
arXiv Detail & Related papers (2023-10-03T04:40:38Z)
- Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z)
- A Fully Polynomial Time Approximation Scheme for Constrained MDPs and Stochastic Shortest Path under Local Transitions [2.512827436728378]
We study the structure of (C)C-MDP, particularly an important variant that involves local transitions.
In this work, we propose a fully polynomial-time approximation scheme for (C)C-MDP that computes (near-)optimal deterministic policies.
arXiv Detail & Related papers (2022-04-10T22:08:33Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to address safe RL problems under the framework of the Early Terminated MDP (ET-MDP).
We first define the ET-MDP as an unconstrained MDP with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better performance and improved learning efficiency.
arXiv Detail & Related papers (2021-07-09T04:24:40Z)
- RL for Latent MDPs: Regret Guarantees and a Lower Bound [74.41782017817808]
We consider the regret problem for reinforcement learning in latent Markov Decision Processes (LMDP).
In an LMDP, an MDP is randomly drawn from a set of $M$ possible MDPs at the beginning of the interaction, but the identity of the chosen MDP is not revealed to the agent.
We show that the key link is a notion of separation between the MDP system dynamics.
arXiv Detail & Related papers (2021-02-09T16:49:58Z)
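As referenced in the policy-gradient entry above, the step from an $O\left(\frac{1}{T}\right)$ optimality gap to $O\left(\log(T)\right)$ regret is the standard harmonic-sum argument. A sketch, under the illustrative assumption that the gap at iteration $t$ is bounded by $C/t$ for some constant $C > 0$, where $J^{*}$ is the optimal average reward and $J(\pi_t)$ that of the $t$-th iterate:

% Sketch only: harmonic-sum bound, assuming J^* - J(\pi_t) <= C/t.
\[
  \mathrm{Regret}(T)
  = \sum_{t=1}^{T} \bigl( J^{*} - J(\pi_{t}) \bigr)
  \le \sum_{t=1}^{T} \frac{C}{t}
  \le C \, (1 + \log T)
  = O(\log T).
\]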