Omega-Regular Decision Processes
- URL: http://arxiv.org/abs/2312.08602v1
- Date: Thu, 14 Dec 2023 01:58:51 GMT
- Title: Omega-Regular Decision Processes
- Authors: Ernst Moritz Hahn, Mateo Perez, Sven Schewe, Fabio Somenzi, Ashutosh Trivedi, Dominik Wojtczak
- Abstract summary: We introduce omega-regular decision processes (ODPs) where the non-Markovian aspect of the transition and reward functions is extended to an omega-regular lookahead.
We present experimental results demonstrating the effectiveness of the proposed reduction.
- Score: 11.917126383341593
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Regular decision processes (RDPs) are a subclass of non-Markovian decision
processes where the transition and reward functions are guarded by some regular
property of the past (a lookback). While RDPs enable intuitive and succinct
representation of non-Markovian decision processes, their expressive power
coincides with finite-state Markov decision processes (MDPs). We introduce
omega-regular decision processes (ODPs) where the non-Markovian aspect of the
transition and reward functions is extended to an omega-regular lookahead over
the system evolution. Semantically, these lookaheads can be considered as
promises made by the decision maker or the learning agent about her future
behavior. In particular, we assume that, if the promised lookaheads are not
met, then the payoff to the decision maker is $\bot$ (least desirable payoff),
overriding any rewards collected by the decision maker. We enable optimization
and learning for ODPs under the discounted-reward objective by reducing them to
lexicographic optimization and learning over finite MDPs. We present
experimental results demonstrating the effectiveness of the proposed reduction.
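To make the reduction concrete, here is a minimal sketch of lexicographic value iteration on a toy product model; it is an illustration only, not the construction from the paper. The promise is collapsed to a reachability condition, a violated promise becomes an absorbing zero-payoff state standing in for $\bot$, and the model, state names, and numbers are invented.

```python
import numpy as np

# Toy product: 0 = promise still open, 1 = promise satisfied (absorbing),
# 2 = promise violated (absorbing, stands in for the bottom payoff).
# Actions: 0 = "risky" (high reward, often breaks the promise), 1 = "safe".
P = {
    0: {0: [(1, 0.6), (2, 0.4)], 1: [(1, 0.95), (2, 0.05)]},
    1: {0: [(1, 1.0)], 1: [(1, 1.0)]},
    2: {0: [(2, 1.0)], 1: [(2, 1.0)]},
}
R = {0: {0: 10.0, 1: 1.0}, 1: {0: 1.0, 1: 1.0}, 2: {0: 0.0, 1: 0.0}}
GAMMA = 0.95

# First objective: maximal probability of keeping the promise (reaching state 1).
V_sat = np.array([0.0, 1.0, 0.0])
for _ in range(1000):
    Q_sat = np.array([[sum(p * V_sat[t] for t, p in P[s][a]) for a in (0, 1)]
                      for s in (0, 1, 2)])
    V_new = Q_sat.max(axis=1)
    if np.abs(V_new - V_sat).max() < 1e-12:
        break
    V_sat = V_new

# Second objective: discounted reward, restricted to promise-optimal actions.
allowed = {s: [a for a in (0, 1) if Q_sat[s, a] >= V_sat[s] - 1e-9] for s in (0, 1, 2)}
V = np.zeros(3)
for _ in range(1000):
    V_new = np.array([max(R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a])
                          for a in allowed[s]) for s in (0, 1, 2)])
    if np.abs(V_new - V).max() < 1e-9:
        break
    V = V_new

policy = {s: max(allowed[s],
                 key=lambda a: R[s][a] + GAMMA * sum(p * V[t] for t, p in P[s][a]))
          for s in (0, 1, 2)}
print("promise probability:", V_sat.round(3))
print("discounted value   :", V.round(3))
print("policy             :", policy)   # the safe action is chosen in state 0
```

In state 0 the risky action pays more immediately, but the lexicographic order forces the safe action because it keeps the promise with higher probability.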
Related papers
- Anytime Incremental $\rho$POMDP Planning in Continuous Spaces [5.767643556541711]
We present an anytime solver that dynamically refines belief representations, with formal guarantees of improvement over time.
We demonstrate its effectiveness for common entropy estimators, reducing computational cost by orders of magnitude.
Experimental results show that $\rho$POMCPOW outperforms state-of-the-art solvers in both efficiency and solution quality.
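For context, $\rho$POMDP rewards depend on the belief itself, for example through its entropy. The sketch below shows one common estimator, the Shannon entropy of a weighted-particle belief over a discrete state space; the particle format and the reward sign are assumptions for illustration, not the estimator or solver from that paper.

```python
from collections import defaultdict
import math

def belief_entropy(particles):
    """particles: iterable of (state, weight); weights need not be normalized."""
    mass, total = defaultdict(float), 0.0
    for state, w in particles:
        mass[state] += w
        total += w
    return -sum((w / total) * math.log(w / total) for w in mass.values() if w > 0)

def information_reward(particles):
    # Higher reward for more concentrated (lower-entropy) beliefs.
    return -belief_entropy(particles)

print(information_reward([("s0", 0.5), ("s1", 0.5)]))    # -log(2), maximal uncertainty
print(information_reward([("s0", 0.99), ("s1", 0.01)]))  # close to 0, nearly certain
```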
arXiv Detail & Related papers (2025-02-04T18:19:40Z)
- Fair Resource Allocation in Weakly Coupled Markov Decision Processes [3.824858358548714]
We consider fair resource allocation in sequential decision-making environments modeled as weakly coupled Markov decision processes.
We adopt a fairness definition using the generalized Gini function instead of the traditional utilitarian (total-sum) objective.
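As a pointer to what a generalized Gini objective looks like, here is a minimal sketch: a weighted sum of utilities sorted from worst-off to best-off with non-increasing weights, so the worst-off agents count the most. The halving weights are an assumption for illustration.

```python
import numpy as np

def generalized_gini(utilities, weights=None):
    u = np.sort(np.asarray(utilities, dtype=float))      # worst-off first
    if weights is None:
        weights = np.array([0.5 ** i for i in range(len(u))])
    return float(np.dot(np.sort(weights)[::-1], u))      # largest weight on worst-off

print(generalized_gini([3.0, 3.0, 3.0]))   # balanced allocation
print(generalized_gini([9.0, 0.0, 0.0]))   # same total, less fair, lower score
```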
arXiv Detail & Related papers (2024-11-14T20:40:55Z)
- Beyond Average Return in Markov Decision Processes [49.157108194438635]
We prove that only generalized means can be optimized exactly, even in the more general framework of Distributional Reinforcement Learning (DistRL).
We provide error bounds on the resulting estimators, and discuss the potential of this approach as well as its limitations.
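For intuition about the kind of functional such results concern, here is a sketch of a quasi-arithmetic (generalized) mean of sampled returns, using the exponential utility f(x) = exp(beta * x); the sample values and the choice of f are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def generalized_mean(returns, beta=0.5):
    # Quasi-arithmetic mean f^{-1}(E[f(G)]) with f(x) = exp(beta * x).
    returns = np.asarray(returns, dtype=float)
    if beta == 0.0:
        return float(returns.mean())                     # recovers the usual average
    return float(np.log(np.mean(np.exp(beta * returns))) / beta)

samples = [1.0, 2.0, 10.0]
print(generalized_mean(samples, beta=0.0))    # plain mean
print(generalized_mean(samples, beta=-1.0))   # risk-averse: pulled toward low returns
print(generalized_mean(samples, beta=1.0))    # risk-seeking: pulled toward high returns
```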
arXiv Detail & Related papers (2023-10-31T08:36:41Z)
- Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, last-iterate guaranteed near-optimal policy, and guaranteed model accuracy.
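As a generic illustration of the UCB idea behind such bonus terms (not the PSR-specific bonus from the paper), the sketch below inflates a value estimate by a quantity that shrinks with the visit count, so poorly explored models are treated optimistically.

```python
import math

def optimistic_value(value_estimate, visit_count, confidence=1.0, horizon=10):
    # Count-based optimism: the bonus fades as the model is estimated from more data.
    bonus = confidence * horizon * math.sqrt(1.0 / max(visit_count, 1))
    return value_estimate + bonus

print(optimistic_value(5.0, visit_count=1))    # large bonus early on
print(optimistic_value(5.0, visit_count=400))  # bonus fades with data
```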
arXiv Detail & Related papers (2023-07-01T18:35:21Z)
- Explainable Predictive Decision Mining for Operational Support [0.3232625980782302]
Decision mining aims to describe/predict the routing of a process instance at a decision point of the process.
Existing techniques for decision mining have focused largely on describing decisions but not on predicting them.
Our proposed approach provides explanations of the predicted decisions using SHAP values to support the elicitation of proactive actions.
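To show what SHAP-based explanations of a routing prediction can look like, here is a hedged sketch using the shap package on a synthetic decision point; the features, labels, and model choice are invented and are not the pipeline from the paper.

```python
import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Made-up event-log features for cases arriving at one decision point.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "amount": rng.uniform(10, 5000, 500),
    "waiting_time_h": rng.uniform(0, 72, 500),
    "is_priority": rng.integers(0, 2, 500),
})
y = (X["amount"] > 2500) | (X["is_priority"] == 1)   # synthetic routing label

model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X.iloc[:5])      # per-feature contributions
print(shap_values)
```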
arXiv Detail & Related papers (2022-10-30T09:27:41Z)
- Efficient PAC Reinforcement Learning in Regular Decision Processes [99.02383154255833]
We study reinforcement learning in regular decision processes.
Our main contribution is to show that a near-optimal policy can be PAC-learned in time polynomial in a set of parameters.
arXiv Detail & Related papers (2021-05-14T12:08:46Z)
- Modular Deep Reinforcement Learning for Continuous Motion Planning with Temporal Logic [59.94347858883343]
This paper investigates the motion planning of autonomous dynamical systems modeled by Markov decision processes (MDPs).
The novelty is to design an embedded product MDP (EP-MDP) between the limit-deterministic generalized Büchi automaton (LDGBA) and the MDP.
The proposed LDGBA-based reward shaping and discounting schemes for the model-free reinforcement learning (RL) only depend on the EP-MDP states.
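As a simplified stand-in for automaton-based reward shaping (an assumption, not the EP-MDP construction from the paper), the sketch below pairs the environment state with an automaton state and grants reward only when an accepting automaton state is visited, so the shaped reward depends on the product state alone.

```python
ACCEPTING = {"q_acc"}

def automaton_step(q, label):
    # Toy deterministic two-state automaton over observation labels.
    return "q_acc" if label == "goal" else "q0"

def product_step(env_state, q, action, env_step, labeling):
    next_env_state = env_step(env_state, action)
    next_q = automaton_step(q, labeling(next_env_state))
    reward = 1.0 if next_q in ACCEPTING else 0.0   # reward depends only on the product state
    return (next_env_state, next_q), reward

# Tiny usage example on a 1-D chain where state 3 is labeled "goal".
env_step = lambda s, a: min(s + 1, 3) if a == "right" else max(s - 1, 0)
labeling = lambda s: "goal" if s == 3 else "other"
state, total = (0, "q0"), 0.0
for _ in range(5):
    state, r = product_step(state[0], state[1], "right", env_step, labeling)
    total += r
print(state, total)   # ends at (3, 'q_acc') after collecting reward on each accepting visit
```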
arXiv Detail & Related papers (2021-02-24T01:11:25Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
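To illustrate the Stein variational idea in a planning setting, here is a minimal SVGD sketch over candidate control sequences for a 1-D point mass; the cost, kernel bandwidth, and dynamics are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

H, N, TARGET, STEP, BANDWIDTH = 5, 8, 1.0, 0.05, 0.5
rng = np.random.default_rng(0)
U = rng.normal(0.0, 0.5, size=(N, H))      # N candidate control sequences of length H

def grad_log_p(u):
    # log p(u) = -cost(u), cost = (final position - target)^2 + 0.1 * ||u||^2,
    # where the final position of the point mass is just the sum of the controls.
    return -(2.0 * (u.sum() - TARGET) * np.ones(H) + 0.2 * u)

for _ in range(200):
    grads = np.array([grad_log_p(u) for u in U])
    phi = np.zeros_like(U)
    for i in range(N):
        for j in range(N):
            diff = U[i] - U[j]
            k = np.exp(-np.dot(diff, diff) / (2.0 * BANDWIDTH ** 2))
            # kernel-smoothed gradient (attraction) + kernel gradient (repulsion)
            phi[i] += k * grads[j] + (diff / BANDWIDTH ** 2) * k
    U += STEP * phi / N

print("final positions:", U.sum(axis=1).round(2))   # each sequence steers the mass close to TARGET
```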
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
- Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss [145.54544979467872]
We consider online learning for episodically constrained Markov decision processes (CMDPs).
We propose a new upper confidence primal-dual algorithm, which only requires the trajectories sampled from the transition model.
Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning.
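As background on the primal-dual mechanism (a generic sketch, not the paper's upper-confidence algorithm or its analysis), the example below alternates a best response to the Lagrangian reward r(a) - lambda * c(a) with a dual update of lambda driven by constraint violation; the numbers are invented.

```python
# Two actions, one budget constraint: maximize reward subject to expected cost <= BUDGET.
rewards = {"fast": 1.0, "slow": 0.4}
costs   = {"fast": 1.0, "slow": 0.2}
BUDGET, LR, ROUNDS = 0.6, 0.05, 2000

lam, play_counts = 0.0, {"fast": 0, "slow": 0}
for _ in range(ROUNDS):
    # Primal step: best response to the current Lagrangian reward r(a) - lam * c(a).
    action = max(rewards, key=lambda a: rewards[a] - lam * costs[a])
    play_counts[action] += 1
    # Dual step: raise lam when the chosen action overspends the budget, lower it otherwise.
    lam = max(0.0, lam + LR * (costs[action] - BUDGET))

frac_fast = play_counts["fast"] / ROUNDS
print(f"lambda ~ {lam:.2f}, fraction of 'fast' plays ~ {frac_fast:.2f}")
# The time-averaged policy mixes the two actions so the average cost approaches BUDGET.
```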
arXiv Detail & Related papers (2020-03-02T05:02:23Z)