Interval Markov Decision Processes with Continuous Action-Spaces
- URL: http://arxiv.org/abs/2211.01231v2
- Date: Fri, 7 Apr 2023 09:02:53 GMT
- Title: Interval Markov Decision Processes with Continuous Action-Spaces
- Authors: Giannis Delimpaltadakis, Morteza Lahijanian, Manuel Mazo Jr., Luca
Laurenti
- Abstract summary: We introduce continuous-action IMDPs (caIMDPs), where the bounds on transition probabilities are functions of the action variables.
We exploit the simple form of these max problems to identify cases where value iteration over caIMDPs can be solved efficiently.
We demonstrate our results on a numerical example.
- Score: 6.088695984060244
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Interval Markov Decision Processes (IMDPs) are finite-state uncertain Markov
models, where the transition probabilities belong to intervals. Recently, there
has been a surge of research on employing IMDPs as abstractions of stochastic
systems for control synthesis. However, due to the absence of algorithms for
synthesis over IMDPs with continuous action-spaces, the action-space is assumed
discrete a priori, which is a restrictive assumption for many applications.
Motivated by this, we introduce continuous-action IMDPs (caIMDPs), where the
bounds on transition probabilities are functions of the action variables, and
study value iteration for maximizing expected cumulative rewards. Specifically,
we decompose the max-min problem associated with value iteration into
$|\mathcal{Q}|$ max problems, where $|\mathcal{Q}|$ is the number of states of
the caIMDP. Then, exploiting the simple form of these max problems, we identify
cases where value iteration over caIMDPs can be solved efficiently (e.g., with
linear or convex programming). We also gain other interesting insights: e.g.,
in certain cases where the action set $\mathcal{A}$ is a polytope, synthesis
over a discrete-action IMDP, where the actions are the vertices of
$\mathcal{A}$, is sufficient for optimality. We demonstrate our results on a
numerical example. Finally, we include a short discussion on employing caIMDPs
as abstractions for control synthesis.
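To make the value-iteration step concrete, the following is a minimal Python sketch (not the paper's implementation): the pessimistic inner minimization over an interval ambiguity set is solved with the standard ordering-based greedy LP solution, and the continuous polytopic action set is replaced by its vertices, which the abstract notes is sufficient for optimality in certain cases. The names `lb_fn`/`ub_fn` and the two-state example are hypothetical.

```python
import numpy as np

def inner_min(values, lb, ub):
    """Pessimistic resolution of an interval ambiguity set:
        min_p  p @ values   s.t.  lb <= p <= ub,  sum(p) == 1.
    Solved greedily by pushing the free probability mass toward the
    lowest-value successor states (the classic O(n log n) LP solution).
    Assumes lb sums to <= 1 and ub sums to >= 1 (nonempty feasible set)."""
    p = lb.copy()
    budget = 1.0 - p.sum()            # probability mass left to place
    for i in np.argsort(values):      # lowest-value states first
        add = min(ub[i] - p[i], budget)
        p[i] += add
        budget -= add
        if budget <= 1e-12:
            break
    return p @ values

def robust_value_iteration(n_states, vertex_actions, lb_fn, ub_fn,
                           reward, gamma=0.95, tol=1e-8):
    """Pessimistic value iteration over a caIMDP, with the continuous
    polytopic action set replaced by its vertices. lb_fn(q, a) and
    ub_fn(q, a) are the action-dependent transition-bound functions."""
    V = np.zeros(n_states)
    while True:
        V_new = np.array([
            reward[q] + gamma * max(inner_min(V, lb_fn(q, a), ub_fn(q, a))
                                    for a in vertex_actions)
            for q in range(n_states)])
        if np.abs(V_new - V).max() < tol:
            return V_new
        V = V_new

# Hypothetical 2-state example: action set A = [0, 1] (vertices {0, 1}),
# transition bounds affine in the action variable a.
lb_fn = lambda q, a: np.array([0.2 + 0.3 * a, 0.1])
ub_fn = lambda q, a: np.array([0.6 + 0.3 * a, 0.8 - 0.3 * a])
V = robust_value_iteration(2, [0.0, 1.0], lb_fn, ub_fn,
                           reward=np.array([1.0, 0.0]))
```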
Related papers
- Projection by Convolution: Optimal Sample Complexity for Reinforcement Learning in Continuous-Space MDPs [56.237917407785545]
We consider the problem of learning an $\varepsilon$-optimal policy in a general class of continuous-space Markov decision processes (MDPs) having smooth Bellman operators.
Key to our solution is a novel projection technique based on ideas from harmonic analysis.
Our result bridges the gap between two popular but conflicting perspectives on continuous-space MDPs.
arXiv Detail & Related papers (2024-05-10T09:58:47Z) - Prospective Side Information for Latent MDPs [80.00842638151558]
We study the class of LMDPs with prospective side information, where the agent receives additional, weakly revealing information at the beginning of each episode.
We show that, surprisingly, this problem is not captured by contemporary settings and algorithms designed for partially observed environments.
We then establish that any sample-efficient algorithm must suffer at least $\Omega(K^{2/3})$ regret, as opposed to the standard $\Omega(\sqrt{K})$ lower bounds, and design an algorithm with a matching upper bound.
arXiv Detail & Related papers (2023-10-11T15:37:31Z) - Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space [0.0]
We study the problem of optimal control of a family of discrete-time countable state-space Markov Decision Processes.
We propose an algorithm based on Thompson sampling with dynamically-sized episodes.
We show that our algorithm can be applied to develop approximately optimal control algorithms.
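As an illustration of the Thompson-sampling-with-doubling-episodes idea, here is a generic PSRL-style sketch on a finite tabular MDP (the paper itself addresses countably infinite state spaces, so this is only indicative; the `env` interface is hypothetical).

```python
import numpy as np

def psrl(env, n_states, n_actions, total_steps=10_000, gamma=0.95, seed=0):
    """Posterior-sampling (Thompson sampling) RL with dynamically sized,
    doubling episodes on a finite tabular MDP.  `env` is a hypothetical
    interface with reset() -> s and step(a) -> (s_next, r)."""
    rng = np.random.default_rng(seed)
    counts = np.ones((n_states, n_actions, n_states))  # Dirichlet prior
    r_mean = np.zeros((n_states, n_actions))           # running reward means
    visits = np.zeros((n_states, n_actions))
    s, t, ep_len = env.reset(), 0, 1
    while t < total_steps:
        # 1. Sample one MDP from the posterior over transition kernels.
        P = np.apply_along_axis(rng.dirichlet, 2, counts)
        # 2. Solve the sampled MDP (plain value iteration on Q).
        Q = np.zeros((n_states, n_actions))
        for _ in range(300):
            Q = r_mean + gamma * P @ Q.max(axis=1)
        # 3. Act greedily for one episode, then double its length.
        for _ in range(ep_len):
            a = int(Q[s].argmax())
            s_next, r = env.step(a)
            visits[s, a] += 1
            r_mean[s, a] += (r - r_mean[s, a]) / visits[s, a]
            counts[s, a, s_next] += 1
            s, t = s_next, t + 1
        ep_len *= 2
    return Q
```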
arXiv Detail & Related papers (2023-06-05T03:57:16Z) - Numerical Methods for Convex Multistage Stochastic Optimization [86.45244607927732]
We focus on Stochastic Programming (SP), Stochastic Optimal Control (SOC), and Markov Decision Processes (MDPs).
Recent progress in solving convex multistage stochastic problems is based on cutting-plane approximations of the cost-to-go functions of dynamic programming equations.
Cutting-plane-type methods can handle multistage problems with a large number of stages but a relatively small number of state (decision) variables.
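A minimal sketch of the cutting-plane object these methods build, assuming a convex cost-to-go function: the approximation is the pointwise maximum of affine minorants (cuts), each a supporting hyperplane collected during the backward pass of an SDDP-type method.

```python
import numpy as np

class CutModel:
    """Lower approximation of a convex cost-to-go function V by the
    pointwise maximum of affine cuts V(x) >= g @ x + b, the basic
    object in cutting-plane / SDDP-type multistage methods."""
    def __init__(self, dim):
        self.G = np.empty((0, dim))   # cut gradients (subgradients of V)
        self.b = np.empty(0)          # cut intercepts

    def add_cut(self, x0, v0, g):
        """Supporting hyperplane of V at x0: V(x) >= v0 + g @ (x - x0)."""
        g = np.asarray(g, float)
        self.G = np.vstack([self.G, g])
        self.b = np.append(self.b, v0 - g @ x0)

    def __call__(self, x):
        if self.b.size == 0:
            return -np.inf
        return float(np.max(self.G @ np.asarray(x, float) + self.b))

# Toy usage: build cuts for V(x) = ||x||^2 at a few trial points.
model = CutModel(dim=2)
for x0 in np.random.default_rng(0).normal(size=(5, 2)):
    model.add_cut(x0, x0 @ x0, 2 * x0)   # value and gradient of ||x||^2
print(model(np.array([0.5, -0.5])))      # a lower bound on 0.5
```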
arXiv Detail & Related papers (2023-03-28T01:30:40Z) - Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation
Constrained Optimization [88.0031283949404]
Many real-world problems have complicated nonconvex functional constraints and use a large number of data points.
The proposed method outperforms an existing method that achieved the previously best-known result.
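For orientation, a generic (not the paper's) stochastic augmented Lagrangian sketch for a single expectation constraint $g(x) \le 0$: the inner problem is minimized inexactly with a fixed budget of stochastic gradient steps, followed by a multiplier update.

```python
import numpy as np

def stochastic_alm(sgrad_f, g, grad_g, x0, rho=10.0,
                   outer=50, inner=200, lr=1e-2, seed=0):
    """Sketch of a stochastic inexact augmented Lagrangian loop for
        min E[f(x, xi)]  s.t.  g(x) <= 0.
    `sgrad_f(x, rng)` returns an unbiased stochastic gradient of f;
    the inner minimization is deliberately inexact (fixed step budget)."""
    rng = np.random.default_rng(seed)
    x, lam = np.asarray(x0, float).copy(), 0.0
    for _ in range(outer):
        for _ in range(inner):
            # Gradient of the AL penalty (rho/2) * max(0, lam/rho + g(x))^2
            mult = max(0.0, lam + rho * g(x))
            x = x - lr * (sgrad_f(x, rng) + mult * grad_g(x))
        lam = max(0.0, lam + rho * g(x))   # multiplier ascent step
    return x, lam
```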
arXiv Detail & Related papers (2022-12-19T14:48:54Z) - Convex Optimization for Parameter Synthesis in MDPs [19.808494349302784]
Probabilistic model checking aims to verify whether a Markov decision process satisfies a temporal logic specification.
We develop two approaches that iteratively obtain locally optimal solutions.
We demonstrate the approaches on a satellite parameter synthesis problem with hundreds of thousands of parameters and their scalability on a wide range of commonly used benchmarks.
arXiv Detail & Related papers (2021-06-30T21:23:56Z) - A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs [117.82903457289584]
We derive a novel problem-dependent lower bound for regret in finite-horizon Markov Decision Processes (MDPs).
We show that our lower bound is considerably smaller than in the general case and that it does not scale with the minimum action gap at all.
We show that this last result is attainable (up to $\mathrm{poly}(H)$ terms, where $H$ is the horizon) by providing a regret upper bound based on policy gaps for an optimistic algorithm.
arXiv Detail & Related papers (2021-06-24T13:46:09Z) - Navigating to the Best Policy in Markov Decision Processes [68.8204255655161]
We investigate the active pure exploration problem in Markov Decision Processes.
The agent sequentially selects actions and, from the resulting system trajectory, aims at identifying the best policy as fast as possible.
arXiv Detail & Related papers (2021-06-05T09:16:28Z) - Weak SINDy For Partial Differential Equations [0.0]
We extend our Weak SINDy (WSINDy) framework to the setting of partial differential equations (PDEs).
The elimination of pointwise derivative approximations via the weak form enables effective machine-precision recovery of model coefficients from noise-free data.
We demonstrate WSINDy's robustness, speed and accuracy on several challenging PDEs.
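The weak-form idea can be illustrated on the heat equation $u_t = \nu u_{xx}$: integrating against a compactly supported test function moves all derivatives off the (possibly noisy) data and onto the test function. The sketch below is a toy reconstruction of $\nu$ only, not the authors' WSINDy code.

```python
import numpy as np

def weak_recover_nu(u, x, t, n_test=200, half_width=15, p=4, seed=0):
    """Toy weak-form recovery of nu in the heat equation u_t = nu*u_xx.
    For phi(x, t) = psi(x)psi(t) compactly supported, integration by
    parts gives  -∬ u phi_t = nu ∬ u phi_xx,  so no derivatives of the
    data u are ever taken.  u has shape (len(x), len(t))."""
    rng = np.random.default_rng(seed)
    dx, dt = x[1] - x[0], t[1] - t[0]
    s = np.linspace(-1.0, 1.0, 2 * half_width + 1)
    psi = (1.0 - s**2) ** p                 # smooth bump, zero at the edges
    dpsi_dt = np.gradient(psi, dt)          # derivatives w.r.t. grid spacing
    d2psi_dx2 = np.gradient(np.gradient(psi, dx), dx)
    A, b = [], []
    for _ in range(n_test):                 # random test-function placements
        i = rng.integers(half_width, len(x) - half_width)
        j = rng.integers(half_width, len(t) - half_width)
        patch = u[i - half_width:i + half_width + 1,
                  j - half_width:j + half_width + 1]
        b.append(-(patch * np.outer(psi, dpsi_dt)).sum() * dx * dt)
        A.append((patch * np.outer(d2psi_dx2, psi)).sum() * dx * dt)
    nu, *_ = np.linalg.lstsq(np.array(A)[:, None], np.array(b), rcond=None)
    return float(nu[0])
```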
arXiv Detail & Related papers (2020-07-06T16:03:51Z) - Provably Efficient Model-Free Algorithm for MDPs with Peak Constraints [38.2783003051101]
This paper considers the Peak Constrained Markov Decision Process (PCMDP), where the agent chooses a policy to maximize the total reward over the finite horizon while satisfying constraints at each epoch with probability 1.
We propose a model-free algorithm that converts the PCMDP problem to an unconstrained one, to which a Q-learning-based approach is applied; see the sketch below.
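A generic sketch of the constrained-to-unconstrained conversion (the paper's actual algorithm and its probability-1 guarantees differ): per-step constraint violations are folded into the reward as a penalty, and tabular Q-learning is run on the shaped reward. The `env` interface is hypothetical.

```python
import numpy as np

def q_learning_with_penalty(env, n_states, n_actions, penalty=10.0,
                            episodes=5000, alpha=0.1, gamma=0.99, eps=0.1):
    """Constrained-to-unconstrained conversion by penalizing per-step
    constraint violations, followed by plain tabular Q-learning.
    env.step(a) -> (s_next, r, violated, done) is a hypothetical API."""
    rng = np.random.default_rng(0)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < eps \
                else int(Q[s].argmax())
            s_next, r, violated, done = env.step(a)
            shaped = r - penalty * float(violated)   # fold constraint in
            Q[s, a] += alpha * (shaped + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
    return Q
```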
arXiv Detail & Related papers (2020-03-11T23:23:29Z)