Timing Process Interventions with Causal Inference and Reinforcement
Learning
- URL: http://arxiv.org/abs/2306.04299v1
- Date: Wed, 7 Jun 2023 10:02:16 GMT
- Title: Timing Process Interventions with Causal Inference and Reinforcement
Learning
- Authors: Hans Weytjens, Wouter Verbeke, Jochen De Weerdt
- Abstract summary: This paper presents experiments on timed process interventions with synthetic data that renders genuine online RL and the comparison to CI possible.
Our experiments reveal that RL's policies outperform those from CI and are more robust at the same time.
Unlike CI, the unaltered online RL approach can be applied to other, more generic PresPM problems such as next best activity recommendations.
- Score: 2.919859121836811
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The shift from the understanding and prediction of processes to their
optimization offers great benefits to businesses and other organizations.
Precisely timed process interventions are the cornerstones of effective
optimization. Prescriptive process monitoring (PresPM) is the sub-field of
process mining that concentrates on process optimization. The emerging PresPM
literature identifies state-of-the-art methods, causal inference (CI) and
reinforcement learning (RL), without presenting a quantitative comparison. Most
experiments are carried out using historical data, causing problems with the
accuracy of the methods' evaluations and preempting online RL. Our contribution
consists of experiments on timed process interventions with synthetic data that
renders genuine online RL and the comparison to CI possible, and allows for an
accurate evaluation of the results. Our experiments reveal that RL's policies
outperform those from CI and are more robust at the same time. Indeed, the RL
policies approach perfect policies. Unlike CI, the unaltered online RL approach
can be applied to other, more generic PresPM problems such as next best
activity recommendations. Nonetheless, CI has its merits in settings where
online learning is not an option.
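To make the framing concrete, below is a minimal, illustrative sketch of how a timed process intervention can be cast as an online RL problem in the spirit of the paper's setup: an agent observes a running case step by step and decides whether to intervene now or wait. The toy environment `TimedInterventionEnv`, its state encoding, reward shape, and the tabular Q-learning hyperparameters are assumptions made purely for illustration; they are not the authors' synthetic processes, agent architecture, or experimental results.

```python
# Illustrative sketch only: a toy synthetic process and a tabular Q-learning
# agent that learns *when* to intervene. Not the paper's actual setup.
import random
from collections import defaultdict

class TimedInterventionEnv:
    """Toy synthetic process: one episode is one case of `horizon` steps.
    The agent may intervene at most once; the final payoff depends on how
    close the intervention is to the case-specific best moment."""

    def __init__(self, horizon=10):
        self.horizon = horizon

    def reset(self):
        self.t = 0
        self.intervened = False
        self.intervention_time = None
        # Observable case feature; in this toy it equals the best moment to intervene.
        self.case_type = random.randrange(self.horizon)
        return (self.t, self.case_type, self.intervened)

    def step(self, action):  # action: 0 = wait, 1 = intervene
        if action == 1 and not self.intervened:
            self.intervened = True
            self.intervention_time = self.t
        self.t += 1
        done = self.t >= self.horizon
        if not done:
            reward = 0.0
        elif not self.intervened:
            reward = -1.0  # never intervening is the worst outcome in this toy setup
        else:
            reward = 1.0 - 0.2 * abs(self.intervention_time - self.case_type)
        return (self.t, self.case_type, self.intervened), reward, done


def q_learning(env, episodes=20000, alpha=0.1, gamma=1.0, eps=0.1):
    """Tabular Q-learning over (step, case feature, already-intervened) states."""
    Q = defaultdict(lambda: [0.0, 0.0])
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            if random.random() < eps:
                action = random.randrange(2)             # explore
            else:
                action = int(Q[state][1] > Q[state][0])  # exploit
            next_state, reward, done = env.step(action)
            target = reward if done else reward + gamma * max(Q[next_state])
            Q[state][action] += alpha * (target - Q[state][action])
            state = next_state
    return Q


if __name__ == "__main__":
    Q = q_learning(TimedInterventionEnv())
    # Greedy policy: for each visited state, intervene (1) or wait (0).
    policy = {s: int(q[1] > q[0]) for s, q in Q.items()}
    print(policy)
```

In the same spirit, the learning loop could be replaced by a deep RL agent or benchmarked against a CI-based policy estimated from logged cases, which is the kind of comparison the paper actually performs.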
Related papers
- Exploiting Estimation Bias in Clipped Double Q-Learning for Continuous Control Reinforcement Learning Tasks [5.968716050740402]
This paper focuses on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks.
We design a Bias Exploiting (BE) mechanism to dynamically select the most advantageous estimation bias during training of the RL agent.
Most state-of-the-art deep RL algorithms can be equipped with the BE mechanism without degrading performance or increasing computational complexity.
arXiv Detail & Related papers (2024-02-14T10:44:03Z)
- Hyperparameters in Reinforcement Learning and How To Tune Them [25.782420501870295]
We show that hyperparameter choices in deep reinforcement learning can significantly affect the agent's final performance and sample efficiency.
We propose adopting established best practices from AutoML, such as the separation of tuning and testing seeds.
We support this by comparing state-of-the-art HPO tools on a range of RL algorithms and environments to their hand-tuned counterparts.
arXiv Detail & Related papers (2023-06-02T07:48:18Z)
- Recommending the optimal policy by learning to act from temporal data [2.554326189662943]
This paper proposes an AI-based approach that learns an optimal policy, by means of Reinforcement Learning (RL), from temporal execution data.
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
The ability of our approach to match, and often outperform, Deep RL approaches is a contribution towards the exploitation of white-box RL techniques in scenarios where only temporal execution data are available.
arXiv Detail & Related papers (2023-03-16T10:30:36Z)
- Deep Offline Reinforcement Learning for Real-world Treatment Optimization Applications [3.770564448216192]
We introduce a practical and theoretically grounded transition sampling approach to address action imbalance during offline RL training.
We perform extensive experiments on two real-world tasks for diabetes and sepsis treatment optimization.
Across a range of principled and clinically relevant metrics, we show that our proposed approach enables substantial improvements in expected health outcomes.
arXiv Detail & Related papers (2023-02-15T09:30:57Z)
- Distributional Reinforcement Learning for Scheduling of (Bio)chemical Production Processes [0.0]
Reinforcement Learning (RL) has recently received significant attention from the process systems engineering and control communities.
We present an RL methodology to address precedence and disjunctive constraints as commonly imposed on production scheduling problems.
arXiv Detail & Related papers (2022-03-01T17:25:40Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL [82.93243616342275]
We introduce Offline Model-based RL with Adaptive Behavioral Priors (MABE).
MABE is based on the finding that dynamics models, which support within-domain generalization, and behavioral priors, which support cross-domain generalization, are complementary.
In experiments that require cross-domain generalization, we find that MABE outperforms prior methods.
arXiv Detail & Related papers (2021-06-16T20:48:49Z)
- DEALIO: Data-Efficient Adversarial Learning for Imitation from Observation [57.358212277226315]
In imitation learning from observation (IfO), a learning agent seeks to imitate a demonstrating agent using only observations of the demonstrated behavior, without access to the control signals generated by the demonstrator.
Recent methods based on adversarial imitation learning have led to state-of-the-art performance on IfO problems, but they typically suffer from high sample complexity due to a reliance on data-inefficient, model-free reinforcement learning algorithms.
This issue makes them impractical to deploy in real-world settings, where gathering samples can incur high costs in terms of time, energy, and risk.
We propose a more data-efficient IfO algorithm.
arXiv Detail & Related papers (2021-03-31T23:46:32Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
- Provably Good Batch Reinforcement Learning Without Great Exploration [51.51462608429621]
Batch reinforcement learning (RL) is important for applying RL algorithms to many high-stakes tasks.
Recent algorithms have shown promise but can still be overly optimistic in their expected outcomes.
We show that a small modification to Bellman optimality and evaluation back-up to take a more conservative update can have much stronger guarantees.
arXiv Detail & Related papers (2020-07-16T09:25:54Z)
- Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)