Decision-Dependent Distributionally Robust Markov Decision Process
Method in Dynamic Epidemic Control
- URL: http://arxiv.org/abs/2306.14051v1
- Date: Sat, 24 Jun 2023 20:19:04 GMT
- Title: Decision-Dependent Distributionally Robust Markov Decision Process
Method in Dynamic Epidemic Control
- Authors: Jun Song, William Yang and Chaoyue Zhao
- Abstract summary: The Susceptible-Exposed-Infectious-Recovered (SEIR) model is widely used to represent the spread of infectious diseases.
We present a Distributionally Robust Markov Decision Process (DRMDP) approach for addressing the dynamic epidemic control problem.
- Score: 4.644416582073023
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we present a Distributionally Robust Markov Decision Process
(DRMDP) approach for addressing the dynamic epidemic control problem. The
Susceptible-Exposed-Infectious-Recovered (SEIR) model is widely used to
represent the stochastic spread of infectious diseases, such as COVID-19.
Markov Decision Processes (MDPs) offer a mathematical framework for identifying
optimal actions, such as vaccination and transmission-reducing interventions, to
combat disease spread according to the SEIR model. However, uncertainties in
these scenarios demand a more robust approach that is less reliant on
error-prone assumptions. The primary objective of our study is to introduce a
new DRMDP framework that allows for an ambiguous distribution of transition
dynamics. Specifically, we consider the worst-case distribution of these
transition probabilities within a decision-dependent ambiguity set. To overcome
the computational complexities associated with policy determination, we propose
an efficient Real-Time Dynamic Programming (RTDP) algorithm that is capable of
computing optimal policies based on the reformulated DRMDP model in an
accurate, timely, and scalable manner. Comparative analysis against the classic
MDP model demonstrates that the DRMDP achieves lower proportions of infected
and susceptible individuals at a reduced cost.
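As an illustration of the two ingredients the abstract describes, the following is a minimal sketch, assuming a discrete-time SEIR transition, a total-variation ambiguity set whose radius depends on the chosen action, and a cost-minimizing robust Bellman backup. The compartment names, cost terms, ambiguity-set shape, and the plain value-iteration loop (the paper uses RTDP) are illustrative assumptions, not the authors' exact reformulation.

```python
# Minimal sketch (not the authors' code): nominal SEIR dynamics plus a robust
# Bellman backup over a decision-dependent ambiguity set. The total-variation
# set, the cost terms, and all parameter names are illustrative assumptions.
import numpy as np

def seir_step(s, e, i, r, beta, sigma, gamma, n):
    """Expected one-step flows of a discrete-time SEIR model (nominal dynamics)."""
    s_to_e = beta * s * i / n   # new exposures, driven by transmission rate beta
    e_to_i = sigma * e          # exposed individuals becoming infectious
    i_to_r = gamma * i          # infectious individuals recovering
    return s - s_to_e, e + s_to_e - e_to_i, i + e_to_i - i_to_r, r + i_to_r

def worst_case_expectation(p_nominal, cost_to_go, budget):
    """Maximum expected cost-to-go over distributions within total-variation
    distance `budget` of `p_nominal`: the adversary shifts probability mass
    toward the most expensive next state, taking it from the cheapest first."""
    p = p_nominal.astype(float)               # work on a copy of the nominal distribution
    worst = int(np.argmax(cost_to_go))
    for s in np.argsort(cost_to_go):          # cheapest next states first
        if s == worst or budget <= 0:
            continue
        shift = min(p[s], budget)
        p[s] -= shift
        p[worst] += shift
        budget -= shift
    return float(p @ cost_to_go)

def robust_backup(v, P, cost, discount, radius):
    """One synchronous robust backup for a cost-minimizing DRMDP. radius[a]
    makes the ambiguity set decision-dependent: each action (e.g. intervention
    level) is assumed here to carry its own level of transition uncertainty."""
    n_states, n_actions = cost.shape
    q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            q[s, a] = cost[s, a] + discount * worst_case_expectation(P[s, a], v, radius[a])
    return q.min(axis=1), q.argmin(axis=1)

# Toy usage on a 3-state, 2-action problem (all numbers arbitrary). The paper
# uses RTDP to avoid full state-space sweeps; plain sweeps suffice at this size.
P = np.array([[[0.7, 0.2, 0.1], [0.5, 0.3, 0.2]],
              [[0.1, 0.8, 0.1], [0.2, 0.6, 0.2]],
              [[0.0, 0.1, 0.9], [0.1, 0.1, 0.8]]])
cost = np.array([[1.0, 2.0], [3.0, 2.5], [0.5, 0.7]])
radius = np.array([0.05, 0.15])
v = np.zeros(3)
for _ in range(200):
    v, policy = robust_backup(v, P, cost, 0.95, radius)
print(v, policy)
```

The greedy mass-shifting step is the standard closed form for a total-variation ball around a nominal distribution; the paper's decision-dependent set and its reformulation may take a different shape, and RTDP replaces the synchronous sweep with updates along simulated trajectories.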
Related papers
- Process Reward Model with Q-Value Rankings [18.907163177605607]
Process Reward Modeling (PRM) is critical for complex reasoning and decision-making tasks.
We introduce the Process Q-value Model (PQM), a novel framework that redefines PRM in the context of a Markov Decision Process.
PQM optimizes Q-value rankings based on a novel comparative loss function, enhancing the model's ability to capture the intricate dynamics among sequential decisions.
arXiv Detail & Related papers (2024-10-15T05:10:34Z) - On the Foundation of Distributionally Robust Reinforcement Learning [19.621038847810198]
We contribute to the theoretical foundation of distributionally robust reinforcement learning (DRRL).
This framework obliges the decision maker to choose an optimal policy under the worst-case distributional shift orchestrated by an adversary.
Within this DRMDP framework, we investigate conditions for the existence or absence of the dynamic programming principle (DPP).
arXiv Detail & Related papers (2023-11-15T15:02:23Z) - Provably Efficient UCB-type Algorithms For Learning Predictive State Representations [55.00359893021461]
The sequential decision-making problem is statistically learnable if it admits a low-rank structure modeled by predictive state representations (PSRs).
This paper proposes the first known UCB-type approach for PSRs, featuring a novel bonus term that upper bounds the total variation distance between the estimated and true models.
In contrast to existing approaches for PSRs, our UCB-type algorithms enjoy computational tractability, a last-iterate guarantee of a near-optimal policy, and guaranteed model accuracy.
arXiv Detail & Related papers (2023-07-01T18:35:21Z) - Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity [7.57543767554282]
This paper introduces a new formulation for risk-sensitive MDPs, which assesses risk in a slightly different manner compared to the classical Markov risk measure.
We derive the policy gradient theorem for both problems, proving gradient domination and global convergence of the exact policy gradient method.
We also propose a sample-based offline learning algorithm, namely the robust fitted-Z iteration (RFZI).
arXiv Detail & Related papers (2023-06-20T15:51:25Z) - The Curious Price of Distributional Robustness in Reinforcement Learning with a Generative Model [61.87673435273466]
This paper investigates model robustness in reinforcement learning (RL) to reduce the sim-to-real gap in practice.
We adopt the framework of distributionally robust Markov decision processes (RMDPs), aimed at learning a policy that optimizes the worst-case performance when the deployed environment falls within a prescribed uncertainty set around the nominal MDP.
arXiv Detail & Related papers (2023-05-26T02:32:03Z) - Risk-Averse MDPs under Reward Ambiguity [9.929659318167731]
We propose a distributionally robust return-risk model for Markov decision processes (MDPs) under risk and reward ambiguity.
A scalable first-order algorithm is designed to solve large-scale problems.
arXiv Detail & Related papers (2023-01-03T11:06:30Z) - Reinforcement Learning with a Terminator [80.34572413850186]
We learn the parameters of the TerMDP and leverage the structure of the estimation problem to provide state-wise confidence bounds.
We use these to construct a provably-efficient algorithm, which accounts for termination, and bound its regret.
arXiv Detail & Related papers (2022-05-30T18:40:28Z) - Robust Entropy-regularized Markov Decision Processes [23.719568076996662]
We study a robust version of the ER-MDP model, where the optimal policies are required to be robust.
We show that essential properties that hold for the non-robust ER-MDP and robust unregularized MDP models also hold in our settings.
We show how our framework and results can be integrated into different algorithmic schemes, including value or (modified) policy iteration.
arXiv Detail & Related papers (2021-12-31T09:50:46Z) - Modeling the Second Player in Distributionally Robust Optimization [90.25995710696425]
We argue for the use of neural generative models to characterize the worst-case distribution.
This approach poses a number of implementation and optimization challenges.
We find that the proposed approach yields models that are more robust than comparable baselines.
arXiv Detail & Related papers (2021-03-18T14:26:26Z) - Stein Variational Model Predictive Control [130.60527864489168]
Decision making under uncertainty is critical to real-world, autonomous systems.
Model Predictive Control (MPC) methods have demonstrated favorable performance in practice, but remain limited when dealing with complex distributions.
We show that this framework leads to successful planning in challenging, non-convex optimal control problems.
arXiv Detail & Related papers (2020-11-15T22:36:59Z) - An Optimal Control Approach to Learning in SIDARTHE Epidemic model [67.22168759751541]
We propose a general approach for learning time-variant parameters of dynamic compartmental models from epidemic data.
We forecast the epidemic evolution in Italy and France.
arXiv Detail & Related papers (2020-10-28T10:58:59Z)