Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet
- URL: http://arxiv.org/abs/2411.11451v1
- Date: Mon, 18 Nov 2024 10:34:14 GMT
- Title: Robust Markov Decision Processes: A Place Where AI and Formal Methods Meet
- Authors: Marnix Suilen, Thom Badings, Eline M. Bovy, David Parker, Nils Jansen,
- Abstract summary: Markov decision processes (MDPs) are a standard model for sequential decision-making problems.
They are widely used across many scientific areas, including formal methods and artificial intelligence (AI)
- Score: 12.056104044376372
- License:
- Abstract: Markov decision processes (MDPs) are a standard model for sequential decision-making problems and are widely used across many scientific areas, including formal methods and artificial intelligence (AI). MDPs do, however, come with the restrictive assumption that the transition probabilities need to be precisely known. Robust MDPs (RMDPs) overcome this assumption by instead defining the transition probabilities to belong to some uncertainty set. We present a gentle survey on RMDPs, providing a tutorial covering their fundamentals. In particular, we discuss RMDP semantics and how to solve them by extending standard MDP methods such as value iteration and policy iteration. We also discuss how RMDPs relate to other models and how they are used in several contexts, including reinforcement learning and abstraction techniques. We conclude with some challenges for future work on RMDPs.
Related papers
- Enhancing Multi-Step Reasoning Abilities of Language Models through Direct Q-Function Optimization [50.485788083202124]
Reinforcement Learning (RL) plays a crucial role in aligning large language models with human preferences and improving their ability to perform complex tasks.
We introduce Direct Q-function Optimization (DQO), which formulates the response generation process as a Markov Decision Process (MDP) and utilizes the soft actor-critic (SAC) framework to optimize a Q-function directly parameterized by the language model.
Experimental results on two math problem-solving datasets, GSM8K and MATH, demonstrate that DQO outperforms previous methods, establishing it as a promising offline reinforcement learning approach for aligning language models.
arXiv Detail & Related papers (2024-10-11T23:29:20Z) - What Are the Odds? Improving the foundations of Statistical Model Checking [3.789219860006095]
Markov decision processes (MDPs) are a fundamental model for decision making under uncertainty.
Traditionally verification algorithms assume exact knowledge of the probabilities that govern the behaviour of an MDP.
We propose specialised approaches that exploit our knowledge of the MDP.
arXiv Detail & Related papers (2024-04-08T11:47:46Z) - Multi-Objective Policy Gradients with Topological Constraints [108.10241442630289]
We present a new algorithm for a policy gradient in TMDPs by a simple extension of the proximal policy optimization (PPO) algorithm.
We demonstrate this on a real-world multiple-objective navigation problem with an arbitrary ordering of objectives both in simulation and on a real robot.
arXiv Detail & Related papers (2022-09-15T07:22:58Z) - Robust Anytime Learning of Markov Decision Processes [8.799182983019557]
In data-driven applications, deriving precise probabilities from limited data introduces statistical errors.
Uncertain MDPs (uMDPs) do not require precise probabilities but instead use so-called uncertainty sets in the transitions.
We propose a robust anytime-learning approach that combines a dedicated Bayesian inference scheme with the computation of robust policies.
arXiv Detail & Related papers (2022-05-31T14:29:55Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Semi-Markov Offline Reinforcement Learning for Healthcare [57.15307499843254]
We introduce three offline RL algorithms, namely, SDQN, SDDQN, and SBCQ.
We experimentally demonstrate that only these algorithms learn the optimal policy in variable-time environments.
We apply our new algorithms to a real-world offline dataset pertaining to warfarin dosing for stroke prevention.
arXiv Detail & Related papers (2022-03-17T14:51:21Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes including value or (modified) policy.
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - On Solving a Stochastic Shortest-Path Markov Decision Process as
Probabilistic Inference [5.517104116168873]
We propose solving the general Decision Shortest-Path Markov Process (SSP MDP) as probabilistic inference.
We discuss online and offline methods for planning under uncertainty.
arXiv Detail & Related papers (2021-09-13T11:07:52Z) - Safe Exploration by Solving Early Terminated MDP [77.10563395197045]
We introduce a new approach to address safe RL problems under the framework of Early TerminatedP (ET-MDP)
We first define the ET-MDP as an unconstrained algorithm with the same optimal value function as its corresponding CMDP.
An off-policy algorithm based on context models is then proposed to solve the ET-MDP, which thereby solves the corresponding CMDP with better performance and improved learning efficiency.
arXiv Detail & Related papers (2021-07-09T04:24:40Z) - Identification of Unexpected Decisions in Partially Observable
Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.