Related papers: Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

Solving Robust Markov Decision Processes: Generic, Reliable, Efficient

URL: http://arxiv.org/abs/2412.10185v1
Date: Fri, 13 Dec 2024 14:55:48 GMT
Title: Solving Robust Markov Decision Processes: Generic, Reliable, Efficient
Authors: Tobias Meggendorfer, Maximilian Weininger, Patrick Wienhöft,
Abstract summary: Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities.<n>We provide a framework for solving RMDPs that is generic, reliable, and efficient.<n>Our prototype implementation outperforms existing tools by several orders of magnitude.
Score: 3.789219860006095
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Markov decision processes (MDP) are a well-established model for sequential decision-making in the presence of probabilities. In robust MDP (RMDP), every action is associated with an uncertainty set of probability distributions, modelling that transition probabilities are not known precisely. Based on the known theoretical connection to stochastic games, we provide a framework for solving RMDPs that is generic, reliable, and efficient. It is *generic* both with respect to the model, allowing for a wide range of uncertainty sets, including but not limited to intervals, $L^1$- or $L^2$-balls, and polytopes; and with respect to the objective, including long-run average reward, undiscounted total reward, and stochastic shortest path. It is *reliable*, as our approach not only converges in the limit, but provides precision guarantees at any time during the computation. It is *efficient* because -- in contrast to state-of-the-art approaches -- it avoids explicitly constructing the underlying stochastic game. Consequently, our prototype implementation outperforms existing tools by several orders of magnitude and can solve RMDPs with a million states in under a minute.

Related papers

Robust Counterfactual Inference in Markov Decision Processes [1.5197843979051473]
Current approaches assume a specific causal model to make counterfactuals identifiable. We propose a novel non-parametric approach that computes tight bounds on counterfactual transition probabilities.
arXiv Detail & Related papers (2025-02-19T13:56:20Z)
Sample-efficient Learning of Infinite-horizon Average-reward MDPs with General Function Approximation [53.17668583030862]
We study infinite-horizon average-reward Markov decision processes (AMDPs) in the context of general function approximation. We propose a novel algorithmic framework named Local-fitted Optimization with OPtimism (LOOP) We show that LOOP achieves a sublinear $tildemathcalO(mathrmpoly(d, mathrmsp(V*)) sqrtTbeta )$ regret, where $d$ and $beta$ correspond to AGEC and log-covering number of the hypothesis class respectively
arXiv Detail & Related papers (2024-04-19T06:24:22Z)
The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice. We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimize the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z)
Variance-Dependent Regret Bounds for Linear Bandits and Reinforcement Learning: Adaptivity and Computational Efficiency [90.40062452292091]
We present the first computationally efficient algorithm for linear bandits with heteroscedastic noise. Our algorithm is adaptive to the unknown variance of noise and achieves an $tildeO(d sqrtsum_k = 1K sigma_k2 + d)$ regret. We also propose a variance-adaptive algorithm for linear mixture Markov decision processes (MDPs) in reinforcement learning.
arXiv Detail & Related papers (2023-02-21T00:17:24Z)
Robust Markov Decision Processes without Model Estimation [32.16801929347098]
There are two major barriers to applying robust MDPs in practice. First, most works study robust MDPs in a model-based regime. Second, prior work typically assumes a strong oracle to obtain the optimal solution.
arXiv Detail & Related papers (2023-02-02T17:29:10Z)
B$^3$RTDP: A Belief Branch and Bound Real-Time Dynamic Programming Approach to Solving POMDPs [17.956744635160568]
We propose an extension to the RTDP-Bel algorithm which we call Belief Branch and Bound RTDP (B$3$RTDP) Our algorithm uses a bounded value function representation and takes advantage of this in two novel ways. We empirically demonstrate that B$3$RTDP can achieve greater returns in less time than the state-of-the-art SARSOP solver on known POMDP problems.
arXiv Detail & Related papers (2022-10-22T21:42:59Z)
PAC Statistical Model Checking of Mean Payoff in Discrete- and Continuous-Time MDP [0.34410212782758043]
We provide the first algorithm to compute mean payoff probably approximately correctly in unknown MDP. We do not require any knowledge of the state space, only a lower bound on the minimum transition probability. In addition to providing probably approximately correct (PAC) bounds for our algorithm, we also demonstrate its practical nature by running experiments on standard benchmarks.
arXiv Detail & Related papers (2022-06-03T09:13:27Z)
Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds. We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z)
Sampling-Based Robust Control of Autonomous Systems with Non-Gaussian Noise [59.47042225257565]
We present a novel planning method that does not rely on any explicit representation of the noise distributions. First, we abstract the continuous system into a discrete-state model that captures noise by probabilistic transitions between states. We capture these bounds in the transition probability intervals of a so-called interval Markov decision process (iMDP)
arXiv Detail & Related papers (2021-10-25T06:18:55Z)
Reward is enough for convex MDPs [30.478950691312715]
We study convex MDPs in which goals are expressed as convex functions of the stationary distribution. We propose a meta-algorithm for solving this problem and show that it unifies many existing algorithms in the literature.
arXiv Detail & Related papers (2021-06-01T17:46:25Z)
Efficient semidefinite-programming-based inference for binary and multi-class MRFs [83.09715052229782]
We propose an efficient method for computing the partition function or MAP estimate in a pairwise MRF. We extend semidefinite relaxations from the typical binary MRF to the full multi-class setting, and develop a compact semidefinite relaxation that can again be solved efficiently using the solver.
arXiv Detail & Related papers (2020-12-04T15:36:29Z)
Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model [50.38446482252857]
This paper is concerned with the sample efficiency of reinforcement learning, assuming access to a generative model (or simulator) We first consider $gamma$-discounted infinite-horizon Markov decision processes (MDPs) with state space $mathcalS$ and action space $mathcalA$. We prove that a plain model-based planning algorithm suffices to achieve minimax-optimal sample complexity given any target accuracy level.
arXiv Detail & Related papers (2020-05-26T17:53:18Z)
Scalable First-Order Methods for Robust MDPs [31.29341288414507]
This paper proposes the first first-order framework for solving robust MDPs. By carefully controlling the tradeoff between the accuracy and cost of Value Iteration updates, we achieve an ergodic convergence rate of $O left( A2 S3log(S)log(epsilon-1) epsilon-1 right)$
arXiv Detail & Related papers (2020-05-11T21:06:22Z)

This list is automatically generated from the titles and abstracts of the papers in this site.