Deep reinforcement learning driven inspection and maintenance planning
under incomplete information and constraints
- URL: http://arxiv.org/abs/2007.01380v1
- Date: Thu, 2 Jul 2020 20:44:07 GMT
- Title: Deep reinforcement learning driven inspection and maintenance planning
under incomplete information and constraints
- Authors: C.P. Andriotis, K.G. Papakonstantinou
- Abstract summary: Determination of inspection and maintenance policies constitutes a complex optimization problem.
In this work, these challenges are addressed within a joint framework of constrained Partially Observable Markov Decision Processes (POMDPs) and multi-agent Deep Reinforcement Learning (DRL).
The proposed framework is found to outperform well-established policy baselines and facilitate adept prescription of inspection and intervention actions.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Determination of inspection and maintenance policies for minimizing long-term
risks and costs in deteriorating engineering environments constitutes a complex
optimization problem. Major computational challenges include the (i) curse of
dimensionality, due to exponential scaling of state/action set cardinalities
with the number of components; (ii) curse of history, related to exponentially
growing decision-trees with the number of decision-steps; (iii) presence of
state uncertainties, induced by inherent environment stochasticity and
variability of inspection/monitoring measurements; (iv) presence of
constraints, pertaining to stochastic long-term limitations, due to resource
scarcity and other infeasible/undesirable system responses. In this work, these
challenges are addressed within a joint framework of constrained Partially
Observable Markov Decision Processes (POMDP) and multi-agent Deep Reinforcement
Learning (DRL). POMDPs optimally tackle (ii)-(iii), combining stochastic
dynamic programming with Bayesian inference principles. Multi-agent DRL
addresses (i), through deep function parametrizations and decentralized control
assumptions. Challenge (iv) is herein handled through proper state augmentation
and Lagrangian relaxation, with emphasis on life-cycle risk-based constraints
and budget limitations. The underlying algorithmic steps are provided, and the
proposed framework is found to outperform well-established policy baselines and
facilitate adept prescription of inspection and intervention actions, in cases
where decisions must be made in the most resource- and risk-aware manner.
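To make the two core mechanisms of the abstract concrete, the sketch below shows (a) a Bayesian belief update for a discrete deterioration model, the POMDP ingredient addressing challenges (ii)-(iii), and (b) a Lagrangian-relaxed cost with dual ascent on the multiplier, the ingredient handling challenge (iv). This is a minimal Python illustration; the names (T, O, lam, budget) and shapes are assumptions for the sketch, not the paper's implementation.

```python
# Minimal sketch (not the paper's code) of belief updating and Lagrangian relaxation.
import numpy as np

def belief_update(b, a, o, T, O):
    """Bayes filter over component states: prior belief b, action a, observation o.
    T[a] is the |S| x |S| transition matrix, O[a] the |S| x |Obs| observation matrix."""
    predicted = b @ T[a]                 # propagate deterioration dynamics
    posterior = predicted * O[a][:, o]   # weight by observation likelihood
    return posterior / posterior.sum()   # normalize to a valid belief

def lagrangian_cost(step_cost, constraint_cost, lam):
    """Relaxed one-step objective: primary cost plus multiplier-weighted
    constraint cost (e.g. inspection/repair expenditure against a budget)."""
    return step_cost + lam * constraint_cost

def update_multiplier(lam, discounted_constraint_cost, budget, lr=0.01):
    """Dual ascent: grow lam when the accumulated constraint cost exceeds the
    budget, shrink it (down to zero) otherwise."""
    return max(0.0, lam + lr * (discounted_constraint_cost - budget))
```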
Related papers
- Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning [62.81324245896717]
We introduce an exploration-agnostic algorithm, called C-PG, which exhibits global last-iterate convergence guarantees under (weak) gradient domination assumptions.
We numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines.
arXiv Detail & Related papers (2024-07-15T14:54:57Z)
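The constrained policy-gradient setting in the entry above can be illustrated with a generic primal-dual loop: the policy ascends the Lagrangian while the multiplier ascends on the constraint violation. The toy two-action constrained bandit below, with made-up numbers, is a hedged sketch of that idea, not C-PG itself or its convergence-guaranteed update.

```python
# Generic primal-dual policy-gradient loop on a toy constrained bandit (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
reward = np.array([1.0, 2.0])   # arm 1 pays more...
cost   = np.array([0.1, 1.0])   # ...but is more expensive
budget = 0.5                    # constraint: expected cost <= budget

theta, lam = np.zeros(2), 0.0   # policy logits and Lagrange multiplier
for t in range(5000):
    pi = np.exp(theta) / np.exp(theta).sum()          # softmax policy
    a = rng.choice(2, p=pi)
    lagrangian_payoff = reward[a] - lam * cost[a]     # sampled relaxed objective
    grad_log = -pi.copy()
    grad_log[a] += 1.0                                # d log pi(a) / d theta
    theta += 0.01 * lagrangian_payoff * grad_log      # primal step: ascend Lagrangian
    lam = max(0.0, lam + 0.01 * (cost[a] - budget))   # dual step: ascend on violation

pi = np.exp(theta) / np.exp(theta).sum()
print("policy", pi, "expected cost", pi @ cost)       # expected cost should hover near budget
```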
- Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management [0.0]
We present a multi-agent Deep Reinforcement Learning (DRL) framework for managing large transportation infrastructure systems over their life-cycle.
Life-cycle management of such engineering systems is a computationally intensive task, requiring appropriate sequential inspection and maintenance decisions.
arXiv Detail & Related papers (2024-01-23T02:52:36Z)
- Model-Based Epistemic Variance of Values for Risk-Aware Policy Optimization [59.758009422067]
We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning.
We propose a new uncertainty Bellman equation (UBE) whose solution converges to the true posterior variance over values.
We introduce a general-purpose policy optimization algorithm, Q-Uncertainty Soft Actor-Critic (QU-SAC) that can be applied for either risk-seeking or risk-averse policy optimization.
arXiv Detail & Related papers (2023-12-07T15:55:58Z)
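For context on the uncertainty Bellman equation mentioned above: value uncertainty obeys a Bellman-like recursion driven by local one-step uncertainty. The sketch below solves a generic recursion of that form by fixed-point iteration; it is an assumption-laden illustration, not the specific UBE whose solution equals the posterior variance in this paper.

```python
# Fixed-point iteration for a generic uncertainty-Bellman recursion (illustrative).
import numpy as np

def solve_ube(P, pi, u, gamma=0.95, iters=500):
    """P[s, a, s'] transition probs, pi[s, a] policy, u[s, a] local uncertainty.
    Returns U[s, a] approximating the fixed point
    U(s,a) = u(s,a) + gamma^2 * sum_s' P(s'|s,a) * sum_a' pi(a'|s') U(s',a')."""
    U = np.zeros_like(u)
    for _ in range(iters):
        U_state = (pi * U).sum(axis=1)      # expected uncertainty per successor state
        U = u + gamma**2 * (P @ U_state)    # note the gamma^2 factor, not gamma
    return U
```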
- The Statistical Complexity of Interactive Decision Making [126.04974881555094]
We provide a complexity measure, the Decision-Estimation Coefficient, that is proven to be both necessary and sufficient for sample-efficient interactive learning.
A unified algorithm design principle, Estimation-to-Decisions (E2D), transforms any algorithm for supervised estimation into an online algorithm for decision making.
arXiv Detail & Related papers (2021-12-27T02:53:44Z)
- Identification of Unexpected Decisions in Partially Observable Monte-Carlo Planning: a Rule-Based Approach [78.05638156687343]
We propose a methodology for analyzing POMCP policies by inspecting their traces.
The proposed method explores local properties of policy behavior to identify unexpected decisions.
We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation.
arXiv Detail & Related papers (2020-12-23T15:09:28Z)
- Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z)
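The core update behind Stein variational MPC is Stein Variational Gradient Descent (SVGD) applied to a population of candidate control sequences. Below is a hedged, self-contained sketch of one SVGD step with an RBF kernel; the particle layout, the fixed bandwidth, and the assumption that gradients of the log posterior (e.g. negative trajectory cost) are supplied externally are illustrative choices, not the paper's exact algorithm.

```python
# One SVGD step over flattened control-sequence particles (illustrative sketch).
import numpy as np

def svgd_step(particles, grad_log_p, step=0.1, bandwidth=1.0):
    """particles: (n, d) flattened control sequences; grad_log_p: (n, d) array of
    gradients of the log posterior evaluated at each particle."""
    diffs = particles[:, None, :] - particles[None, :, :]            # x_j - x_i
    sqd = (diffs ** 2).sum(-1)
    k = np.exp(-sqd / bandwidth)                                      # RBF kernel k(x_j, x_i)
    grad_k = -2.0 / bandwidth * diffs * k[..., None]                  # d k(x_j, x_i) / d x_j
    phi = (k[..., None] * grad_log_p[:, None, :] + grad_k).mean(0)    # Stein direction per particle
    return particles + step * phi                                     # attract to high posterior, repel each other
```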
- Coordinated Online Learning for Multi-Agent Systems with Coupled Constraints and Perturbed Utility Observations [91.02019381927236]
We introduce a novel method to steer the agents toward a stable population state, fulfilling the given resource constraints.
The proposed method is a decentralized resource pricing method based on the resource loads resulting from the augmentation of the game's Lagrangian.
arXiv Detail & Related papers (2020-10-21T10:11:17Z)
- Optimal Inspection and Maintenance Planning for Deteriorating Structural Components through Dynamic Bayesian Networks and Markov Decision Processes [0.0]
Partially Observable Markov Decision Processes (POMDPs) provide a mathematical methodology for optimal control under uncertain action outcomes and observations.
We provide the formulation for developing both infinite and finite horizon POMDPs in a structural reliability context.
Results show that POMDPs achieve substantially lower costs as compared to their counterparts, even for traditional problem settings.
arXiv Detail & Related papers (2020-09-09T20:03:42Z)
- Parameterized MDPs and Reinforcement Learning Problems -- A Maximum Entropy Principle Based Framework [2.741266294612776]
We present a framework to address a class of sequential decision making problems.
Our framework features learning the optimal control policy with robustness to noisy data.
arXiv Detail & Related papers (2020-06-17T04:08:35Z)
- Cautious Reinforcement Learning via Distributional Risk in the Dual Domain [45.17200683056563]
We study the estimation of risk-sensitive policies in reinforcement learning problems defined by a Markov Decision Process (MDP) whose state and action spaces are countably finite.
We propose a new definition of risk, which we call caution, as a penalty function added to the dual objective of the linear programming (LP) formulation of reinforcement learning.
arXiv Detail & Related papers (2020-02-27T23:18:04Z)
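The linear-programming view referred to above optimizes over discounted state-action occupancy measures, and the caution enters as a penalty on that objective. The sketch below sets up this occupancy-measure LP for a tiny random MDP and adds a linear risk surrogate (occupancy of designated risky pairs) so the problem stays a plain LP; the paper's caution is a more general penalty, and all numbers here are illustrative.

```python
# Occupancy-measure LP for a small MDP with a linear caution-style penalty (illustrative).
import numpy as np
from scipy.optimize import linprog

S, A, gamma = 3, 2, 0.9
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
r = rng.random((S, A))                       # rewards
risk = np.zeros((S, A))
risk[2, :] = 1.0                             # "risky" state-action pairs to be cautious about
rho0 = np.full(S, 1.0 / S)                   # initial state distribution
lam = 0.5                                    # caution weight

# Flow constraints: sum_a mu(s,a) - gamma * sum_{s',a'} P(s|s',a') mu(s',a') = rho0(s)
A_eq = np.zeros((S, S * A))
for s in range(S):
    for sp in range(S):
        for a in range(A):
            A_eq[s, sp * A + a] = (1.0 if sp == s else 0.0) - gamma * P[sp, a, s]

c = -(r - lam * risk).reshape(-1)            # linprog minimizes, so negate the penalized reward
res = linprog(c, A_eq=A_eq, b_eq=rho0, bounds=(0, None))
mu = res.x.reshape(S, A)
policy = mu / mu.sum(axis=1, keepdims=True)  # recover a stationary policy from the occupancy
print(policy)
```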
- Value of structural health information in partially observable stochastic environments [0.0]
We introduce and study the theoretical and computational foundations of the Value of Information (VoI) and the Value of Structural Health Monitoring (VoSHM).
It is shown that a POMDP policy inherently leverages the notion of VoI to guide observational actions in an optimal way at every decision step.
arXiv Detail & Related papers (2019-12-28T22:18:48Z)
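A one-step illustration of the VoI notion discussed above: the value of an inspection is the drop in expected cost when the maintenance action is chosen after seeing the inspection outcome rather than on the prior belief alone. The numbers and the single-step, two-state setting below are assumptions for illustration; the paper develops VoI and VoSHM for full POMDP policies.

```python
# One-step Value of Information for an inspect-then-act decision (illustrative numbers).
import numpy as np

prior = np.array([0.7, 0.3])                  # P(intact), P(damaged)
cost = np.array([[0.0, 100.0],                # "do nothing": cost in each state
                 [20.0, 25.0]])               # "repair": cost in each state
likelihood = np.array([[0.9, 0.1],            # P(obs | intact): says-ok, says-damaged
                       [0.2, 0.8]])           # P(obs | damaged)

cost_no_obs = min(cost @ prior)               # best action chosen on the prior alone

evidence = prior @ likelihood                 # P(obs)
cost_with_obs = 0.0
for o in range(2):
    posterior = prior * likelihood[:, o] / evidence[o]    # Bayes update on outcome o
    cost_with_obs += evidence[o] * min(cost @ posterior)  # best action given the posterior

print("VoI =", cost_no_obs - cost_with_obs)   # nonnegative; worth inspecting if it exceeds the inspection cost
```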
This list is automatically generated from the titles and abstracts of the papers on this site.