Welfare Maximization Algorithm for Solving Budget-Constrained
Multi-Component POMDPs
- URL: http://arxiv.org/abs/2303.10302v2
- Date: Sun, 14 May 2023 14:21:51 GMT
- Title: Welfare Maximization Algorithm for Solving Budget-Constrained
Multi-Component POMDPs
- Authors: Manav Vora, Pranay Thangeda, Michael N. Grussing, Melkior Ornik
- Abstract summary: This paper presents an algorithm to find the optimal policy for a multi-component budget-constrained POMDP.
We show that the proposed algorithm vastly outperforms the policy currently used in practice.
- Score: 2.007262412327553
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Partially Observable Markov Decision Processes (POMDPs) provide an efficient
way to model real-world sequential decision making processes. Motivated by the
problem of maintenance and inspection of a group of infrastructure components
with independent dynamics, this paper presents an algorithm to find the optimal
policy for a multi-component budget-constrained POMDP. We first introduce a
budgeted-POMDP model (b-POMDP) which enables us to find the optimal policy for
a POMDP while adhering to budget constraints. Next, we prove that the value
function or maximal collected reward for a b-POMDP is a concave function of the
budget for the finite horizon case. Our second contribution is an algorithm to
calculate the optimal policy for a multi-component budget-constrained POMDP by
finding the optimal budget split among the individual component POMDPs. The
optimal budget split is posed as a welfare maximization problem and the
solution is computed by exploiting the concave nature of the value function. We
illustrate the effectiveness of the proposed algorithm by proposing a
maintenance and inspection policy for a group of real-world infrastructure
components with different deterioration dynamics, inspection and maintenance
costs. We show that the proposed algorithm vastly outperforms the policy
currently used in practice.
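A minimal sketch of the budget-split idea described above (an illustration under assumptions, not the paper's implementation): because each component's value function is concave in its budget, welfare maximization over a discretized budget grid can be approached greedily by repeatedly giving the next budget increment to the component with the largest marginal value gain. The component value functions, the increment size, and the helper `split_budget` below are hypothetical stand-ins; in the paper's setting the per-component values would come from solving the individual b-POMDPs.

```python
import heapq
import math
from typing import Callable, List, Sequence


def split_budget(component_values: Sequence[Callable[[float], float]],
                 total_budget: float,
                 increment: float) -> List[float]:
    """Greedily allocate `total_budget` in steps of `increment`.

    For separable concave objectives, marginal gains are non-increasing,
    so repeatedly funding the component with the largest marginal gain
    maximizes the summed (welfare) value on the discretized budget grid.
    """
    alloc = [0.0] * len(component_values)
    # Max-heap via negated gains: (negative marginal gain, component index).
    heap = [(-(v(increment) - v(0.0)), i)
            for i, v in enumerate(component_values)]
    heapq.heapify(heap)

    for _ in range(int(total_budget // increment)):
        neg_gain, i = heapq.heappop(heap)
        if -neg_gain <= 0.0:
            break  # no component gains from additional budget
        alloc[i] += increment
        v = component_values[i]
        heapq.heappush(heap, (-(v(alloc[i] + increment) - v(alloc[i])), i))
    return alloc


if __name__ == "__main__":
    # Hypothetical concave stand-ins for solved single-component b-POMDP values.
    values = [
        lambda b: 10.0 * math.log1p(b),       # fast-deteriorating component
        lambda b: 6.0 * math.sqrt(b),         # moderately deteriorating component
        lambda b: 3.0 * math.log1p(0.5 * b),  # slow-deteriorating component
    ]
    print(split_budget(values, total_budget=100.0, increment=1.0))
```

Greedy incremental allocation is optimal for separable concave objectives on a discrete budget grid, which is precisely the structure that the paper's concavity result provides.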
Related papers
- Capacity-Aware Planning and Scheduling in Budget-Constrained Monotonic MDPs: A Meta-RL Approach [7.385321178884467]
Many real-world sequential repair problems can be effectively modeled using monotonic Markov Decision Processes (MDPs).
This work addresses the problem of solving multi-component monotonic MDPs with both budget and capacity constraints.
arXiv Detail & Related papers (2024-10-28T17:48:45Z) - Solving Truly Massive Budgeted Monotonic POMDPs with Oracle-Guided Meta-Reinforcement Learning [1.1470070927586018]
This paper considers the problem of solving budget-constrained multi-component monotonic POMDPs.
For a large number of components, solving such a POMDP using current methods is computationally intractable.
We introduce an oracle-guided meta-trained Proximal Policy Optimization (PPO) algorithm to solve each of the independent budget-constrained single-component monotonic POMDPs.
arXiv Detail & Related papers (2024-08-13T20:20:58Z) - Monte Carlo Planning for Stochastic Control on Constrained Markov Decision Processes [1.445706856497821]
This work defines an MDP framework, the SD-MDP, where we disentangle the causal structure of MDPs' transition and reward dynamics.
We derive theoretical guarantees on the estimation error of the value function under an optimal policy by allowing independent value estimation from Monte Carlo sampling.
arXiv Detail & Related papers (2024-06-23T16:22:40Z) - On the Global Convergence of Policy Gradient in Average Reward Markov
Decision Processes [50.68789924454235]
We present the first finite-time global convergence analysis of policy gradient in the context of average reward Markov decision processes (MDPs).
Our analysis shows that the policy gradient iterates converge to the optimal policy at a sublinear rate of $O\left(\frac{1}{T}\right)$, which translates to $O\left(\log(T)\right)$ regret, where $T$ represents the number of iterations.
arXiv Detail & Related papers (2024-03-11T15:25:03Z) - Scalable Online Exploration via Coverability [45.66375686120087]
Exploration is a major challenge in reinforcement learning, especially for high-dimensional domains that require function approximation.
We introduce a new objective, $L_1$-Coverage, which generalizes previous exploration schemes and supports three fundamental desiderata.
$L_1$-Coverage enables the first computationally efficient model-based and model-free algorithms for online (reward-free or reward-driven) reinforcement learning in MDPs with low coverability.
arXiv Detail & Related papers (2024-03-11T10:14:06Z) - Nearly Optimal Latent State Decoding in Block MDPs [74.51224067640717]
In episodic Block MDPs, the decision maker has access to rich observations or contexts generated from a small number of latent states.
We are first interested in estimating the latent state decoding function based on data generated under a fixed behavior policy.
We then study the problem of learning near-optimal policies in the reward-free framework.
arXiv Detail & Related papers (2022-08-17T18:49:53Z) - Efficient Policy Iteration for Robust Markov Decision Processes via
Regularization [49.05403412954533]
Robust Markov decision processes (robust MDPs) provide a framework to model decision problems where the system dynamics are changing or only partially known.
Recent work established the equivalence between $s$-rectangular $L_p$ robust MDPs and regularized MDPs, and derived a regularized policy iteration scheme that enjoys the same level of efficiency as standard MDPs.
In this work, we focus on the policy improvement step and derive concrete forms for the greedy policy and the optimal robust Bellman operators.
arXiv Detail & Related papers (2022-05-28T04:05:20Z) - Risk-Averse Decision Making Under Uncertainty [18.467950783426947]
A large class of decision-making problems under uncertainty can be described via Markov decision processes (MDPs) or partially observable MDPs (POMDPs).
In this paper, we consider the problem of designing policies for MDPs and POMDPs with objectives and constraints in terms of dynamic coherent risk measures.
arXiv Detail & Related papers (2021-09-09T07:52:35Z) - Logistic Q-Learning [87.00813469969167]
We propose a new reinforcement learning algorithm derived from a regularized linear-programming formulation of optimal control in MDPs.
The main feature of our algorithm is a convex loss function for policy evaluation that serves as a theoretically sound alternative to the widely used squared Bellman error.
arXiv Detail & Related papers (2020-10-21T17:14:31Z) - Exploiting Submodular Value Functions For Scaling Up Active Perception [60.81276437097671]
In active perception tasks, an agent aims to select sensory actions that reduce uncertainty about one or more hidden variables.
Partially observable Markov decision processes (POMDPs) provide a natural model for such problems.
As the number of sensors available to the agent grows, the computational cost of POMDP planning grows exponentially.
arXiv Detail & Related papers (2020-09-21T09:11:36Z) - Optimal Bayesian experimental design for subsurface flow problems [77.34726150561087]
We propose a novel approach for developing a polynomial chaos expansion (PCE) surrogate model for the design utility function.
This novel technique enables the derivation of a reasonable quality response surface for the targeted objective function with a computational budget comparable to several single-point evaluations.
arXiv Detail & Related papers (2020-08-10T09:42:59Z)