Stopping Criteria for Value Iteration on Stochastic Games with
Quantitative Objectives
- URL: http://arxiv.org/abs/2304.09930v1
- Date: Wed, 19 Apr 2023 19:09:55 GMT
- Title: Stopping Criteria for Value Iteration on Stochastic Games with
Quantitative Objectives
- Authors: Jan Křetínský, Tobias Meggendorfer, Maximilian Weininger
- Abstract summary: A classic solution technique for Markov decision processes (MDP) and stochastic games (SG) is value iteration (VI).
In this paper, we provide the first stopping criteria for VI on SG with total reward and mean payoff, yielding the first anytime algorithms in these settings.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A classic solution technique for Markov decision processes (MDP) and
stochastic games (SG) is value iteration (VI). Due to its good practical
performance, this approximative approach is typically preferred over exact
techniques, even though no practical bounds on the imprecision of the result
could be given until recently. As a consequence, even the most used model
checkers could return arbitrarily wrong results. Over the past decade,
different works derived stopping criteria, indicating when the precision
reaches the desired level, for various settings, in particular MDP with
reachability, total reward, and mean payoff, and SG with reachability.
In this paper, we provide the first stopping criteria for VI on SG with total
reward and mean payoff, yielding the first anytime algorithms in these
settings. To this end, we provide the solution in two flavours: First through a
reduction to the MDP case and second directly on SG. The former is simpler and
automatically utilizes any advances on MDP. The latter allows for more local
computations, heading towards better practical efficiency.
Our solution unifies the previously mentioned approaches for MDP and SG and
their underlying ideas. To achieve this, we isolate objective-specific
subroutines as well as identify objective-independent concepts. These
structural concepts, while surprisingly simple, form the very essence of the
unified solution.
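As a concrete illustration of what such a stopping criterion looks like in practice, below is a minimal bounded value iteration sketch on a toy MDP with a reachability objective: a lower and an upper bound on the value are iterated with the same Bellman operator, and the loop stops once their gap at the initial state drops below a chosen precision, so the returned interval is correct whenever the loop terminates. This is only a sketch of the general anytime scheme the abstract refers to, not the paper's construction for stochastic games with total reward or mean payoff; the model, state names, and precision are invented for the example, and the naive upper bound converges here only because the toy model has no end components besides the goal and the sink.

```python
# Bounded value iteration with a stopping criterion on a toy MDP (illustration only).
EPSILON = 1e-6

# states: 0 (initial), 1 (intermediate), 2 (goal), 3 (sink)
# transitions[state][action] = list of (probability, successor)
transitions = {
    0: {"a": [(0.5, 1), (0.5, 3)], "b": [(0.9, 1), (0.1, 3)]},
    1: {"a": [(0.7, 2), (0.3, 0)]},
}
GOAL, SINK = 2, 3
states = [0, 1, 2, 3]

# Lower bounds start at 0 except at the goal; upper bounds start at 1 except at
# the sink. Both are iterated with the same Bellman operator, so the true
# reachability value of every state s always lies in [L[s], U[s]].
L = {s: (1.0 if s == GOAL else 0.0) for s in states}
U = {s: (0.0 if s == SINK else 1.0) for s in states}

def bellman(values, state):
    """One Bellman backup: best action w.r.t. expected successor value."""
    if state in (GOAL, SINK):
        return values[state]
    return max(
        sum(p * values[succ] for p, succ in succs)
        for succs in transitions[state].values()
    )

iterations = 0
while U[0] - L[0] > EPSILON:  # stopping criterion: bound gap at the initial state
    L = {s: bellman(L, s) for s in states}
    U = {s: bellman(U, s) for s in states}
    iterations += 1

print(f"value of state 0 in [{L[0]:.7f}, {U[0]:.7f}] after {iterations} iterations")
```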
Related papers
- Learning Algorithms for Verification of Markov Decision Processes [20.5951492453299]
We present a general framework for applying learning algorithms to the verification of Markov decision processes (MDPs).
The presented framework focuses on probabilistic reachability, which is a core problem in verification.
arXiv Detail & Related papers (2024-03-14T08:54:19Z)
- Rethinking Model Selection and Decoding for Keyphrase Generation with Pre-trained Sequence-to-Sequence Models [76.52997424694767]
Keyphrase Generation (KPG) is a longstanding task in NLP with widespread applications.
Seq2seq pre-trained language models (PLMs) have ushered in a transformative era for KPG, yielding promising performance improvements.
This paper undertakes a systematic analysis of the influence of model selection and decoding strategies on PLM-based KPG.
arXiv Detail & Related papers (2023-10-10T07:34:45Z)
- Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments [48.96971760679639]
We study variance-dependent regret bounds for Markov decision processes (MDPs).
We propose two new environment norms to characterize the fine-grained variance properties of the environment.
For model-based methods, we design a variant of the MVP algorithm.
In particular, this bound is simultaneously minimax optimal for both stochastic and deterministic MDPs.
arXiv Detail & Related papers (2023-01-31T06:54:06Z)
- PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP [0.34410212782758043]
We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDPs.
We do not require any knowledge of the state space, only a lower bound on the minimum transition probability.
In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
arXiv Detail & Related papers (2022-06-03T09:13:27Z)
- Under-Approximating Expected Total Rewards in POMDPs [68.8204255655161]
We consider the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP).
We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well.
arXiv Detail & Related papers (2022-01-21T16:43:03Z)
- Finite-Time Complexity of Online Primal-Dual Natural Actor-Critic Algorithm for Constrained Markov Decision Processes [13.908826484332282]
We study an online primal-dual actor-critic method to solve a discounted cost constrained Markov decision process problem.
This paper is the first to study the finite-time complexity of an online primal-dual actor-critic method for solving a CMDP problem.
arXiv Detail & Related papers (2021-10-21T18:05:40Z)
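For orientation on the entry above, the following is the textbook Lagrangian relaxation that primal-dual methods for constrained MDPs (CMDPs) typically build on, with a projected ascent step on the multiplier. It is stated here only as general background, not as the exact objective or update rule of that paper; the threshold d and step size \eta are illustrative symbols.

```latex
% Constrained MDP: maximize the reward value subject to a bound d on the cost value.
\max_{\pi} \; V_r^{\pi}(s_0) \quad \text{s.t.} \quad V_c^{\pi}(s_0) \le d
% Lagrangian relaxation with multiplier \lambda \ge 0:
L(\pi, \lambda) = V_r^{\pi}(s_0) - \lambda \bigl( V_c^{\pi}(s_0) - d \bigr)
% Primal step: improve \pi on the combined reward r - \lambda c (e.g. via an actor-critic update).
% Dual step: projected gradient ascent on the constraint violation.
\lambda \leftarrow \max\bigl( 0, \; \lambda + \eta \, ( V_c^{\pi}(s_0) - d ) \bigr)
```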
- Rethinking Counting and Localization in Crowds: A Purely Point-Based Framework [59.578339075658995]
We propose a purely point-based framework for joint crowd counting and individual localization.
We design an intuitive solution under this framework, which is called Point to Point Network (P2PNet)
arXiv Detail & Related papers (2021-07-27T11:41:50Z)
- A Retrospective Approximation Approach for Smooth Stochastic Optimization [0.2867517731896504]
Stochastic Gradient (SG) is the de facto iterative technique to solve stochastic optimization (SO) problems with a smooth (non-convex) objective.
arXiv Detail & Related papers (2021-03-07T16:29:36Z)
- Parallel Stochastic Mirror Descent for MDPs [72.75921150912556]
We consider the problem of learning the optimal policy for infinite-horizon Markov decision processes (MDPs).
A variant of Mirror Descent is proposed for convex programming problems with Lipschitz-continuous functionals.
We analyze this algorithm in a general case and obtain an estimate of the convergence rate that does not accumulate errors during the operation of the method.
arXiv Detail & Related papers (2021-02-27T19:28:39Z)
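As background for the mirror descent entry above, here is a minimal, generic entropic mirror descent (exponentiated gradient) step on the probability simplex, the standard building block such MDP methods are based on; it is not the parallel algorithm of that paper, and the cost vector, step size, and iteration count are made up for the example.

```python
import numpy as np

def mirror_descent_step(x, grad, step_size):
    """One entropic mirror descent update: the new point is proportional to x * exp(-step_size * grad)."""
    y = x * np.exp(-step_size * grad)
    return y / y.sum()  # normalization keeps the iterate on the probability simplex

# Toy example: minimize a linear loss <c, x> over the simplex.
c = np.array([0.3, 0.1, 0.6])      # illustrative cost vector
x = np.full(3, 1.0 / 3.0)          # start at the uniform distribution
for _ in range(200):
    x = mirror_descent_step(x, c, step_size=0.5)
print(x)  # mass concentrates on the coordinate with the smallest cost
```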
- Adaptive Sampling for Best Policy Identification in Markov Decision Processes [79.4957965474334]
We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model.
The advantages of state-of-the-art algorithms are discussed and illustrated.
arXiv Detail & Related papers (2020-09-28T15:22:24Z)
This list is automatically generated from the titles and abstracts of the papers in this site.