Recommending the optimal policy by learning to act from temporal data
- URL: http://arxiv.org/abs/2303.09209v1
- Date: Thu, 16 Mar 2023 10:30:36 GMT
- Title: Recommending the optimal policy by learning to act from temporal data
- Authors: Stefano Branchi, Andrei Buliga, Chiara Di Francescomarino, Chiara
Ghidini, Francesca Meneghello, Massimiliano Ronzani
- Abstract summary: This paper proposes an AI-based approach that learns, by means of Reinforcement Learning (RL), an optimal policy (almost) only from the observation of past executions and recommends the best activities to carry out to optimise a KPI of interest.
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
The ability of our approach to match, and often outperform, Deep RL approaches contributes to the adoption of white-box RL techniques in scenarios where only temporal execution data are available.
- Score: 2.554326189662943
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Prescriptive Process Monitoring is a prominent problem in Process Mining,
which consists of identifying a set of actions to be recommended with the goal
of optimising a target measure of interest or Key Performance Indicator (KPI).
One challenge that makes this problem difficult is the need to provide
Prescriptive Process Monitoring techniques based only on temporally annotated
(process) execution data, stored in so-called execution logs, due to the lack
of well-crafted and human-validated explicit models. In this paper we propose
an AI-based approach that learns, by means of Reinforcement Learning (RL), an
optimal policy (almost) only from the observation of past executions, and
recommends the best activities to carry out to optimise a KPI of interest.
This is achieved first by learning a Markov Decision Process for the specific
KPIs from data, and then by using RL training to learn the optimal policy. The
approach is validated on real and synthetic datasets and compared with
off-policy Deep RL approaches. The ability of our approach to match, and often
outperform, Deep RL approaches contributes to the adoption of white-box RL
techniques in scenarios where only temporal execution data are available.
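
A minimal Python sketch of the two-step recipe just described (estimate an MDP from the execution log, then learn a policy with tabular, white-box RL) is given below. The log format, the state abstraction (last executed activity), the per-step KPI reward, and the use of Q-learning are illustrative assumptions, not the authors' exact implementation.

```python
import random
from collections import defaultdict

# Minimal sketch: (1) estimate an MDP from temporally annotated execution
# traces, (2) learn a policy with tabular (white-box) Q-learning.
# Log format, state abstraction and reward encoding are illustrative.

def estimate_mdp(log):
    """Estimate empirical transitions and mean rewards from an event log.

    `log` is a list of traces; each trace is a list of (state, activity, reward)
    steps, where the reward encodes the KPI contribution of that step.
    """
    transitions = defaultdict(lambda: defaultdict(int))  # (s, a) -> {s': count}
    rewards = defaultdict(list)                          # (s, a) -> [r, r, ...]
    for trace in log:
        padded = trace + [("END", None, 0.0)]
        for (s, a, r), (s_next, _, _) in zip(trace, padded[1:]):
            transitions[(s, a)][s_next] += 1
            rewards[(s, a)].append(r)
    mean_reward = {sa: sum(rs) / len(rs) for sa, rs in rewards.items()}
    return transitions, mean_reward


def q_learning(transitions, mean_reward, episodes=5000,
               alpha=0.1, gamma=0.95, eps=0.1, max_steps=50):
    """Tabular Q-learning on trajectories replayed from the estimated MDP."""
    Q = defaultdict(float)
    actions_of = defaultdict(list)
    for (s, a) in transitions:
        actions_of[s].append(a)
    states = list(actions_of)
    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(max_steps):
            if s not in actions_of:              # absorbing "END" state
                break
            acts = actions_of[s]
            if random.random() < eps:            # epsilon-greedy exploration
                a = random.choice(acts)
            else:
                a = max(acts, key=lambda x: Q[(s, x)])
            nxt = transitions[(s, a)]
            s_next = random.choices(list(nxt), weights=list(nxt.values()))[0]
            best_next = max((Q[(s_next, x)] for x in actions_of.get(s_next, [])),
                            default=0.0)
            Q[(s, a)] += alpha * (mean_reward[(s, a)] + gamma * best_next - Q[(s, a)])
            s = s_next
    # Greedy policy: the recommended next activity for each observed state.
    return {s: max(acts, key=lambda x: Q[(s, x)]) for s, acts in actions_of.items()}
```

Under these assumptions, recommending the next activity for a running case reduces to a dictionary lookup such as `policy.get(last_activity)`.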
Related papers
- Efficient Learning of POMDPs with Known Observation Model in Average-Reward Setting [56.92178753201331]
We propose the Observation-Aware Spectral (OAS) estimation technique, which enables the POMDP parameters to be learned from samples collected using a belief-based policy.
We show the consistency of the OAS procedure, and we prove a regret guarantee of order $\mathcal{O}(\sqrt{T \log(T)})$ for the proposed OAS-UCRL algorithm.
arXiv Detail & Related papers (2024-10-02T08:46:34Z)
- OffRIPP: Offline RL-based Informative Path Planning [12.705099730591671]
IPP is a crucial task in robotics, where agents must design paths to gather valuable information about a target environment.
We propose an offline RL-based IPP framework that optimizes information gain without requiring real-time interaction during training.
We validate the framework through extensive simulations and real-world experiments.
arXiv Detail & Related papers (2024-09-25T11:30:59Z)
- Towards Efficient Exact Optimization of Language Model Alignment [93.39181634597877]
Direct preference optimization (DPO) was proposed to directly optimize the policy from preference data.
We show that DPO, derived from the optimal solution of the problem, leads to a compromised mean-seeking approximation of the optimal solution in practice.
We propose efficient exact optimization (EXO) of the alignment objective.
arXiv Detail & Related papers (2024-02-01T18:51:54Z)
- Iteratively Refined Behavior Regularization for Offline Reinforcement Learning [57.10922880400715]
In this paper, we propose a new algorithm that substantially enhances behavior-regularization based on conservative policy iteration.
By iteratively refining the reference policy used for behavior regularization, the conservative policy update guarantees gradual improvement.
Experimental results on the D4RL benchmark indicate that our method outperforms previous state-of-the-art baselines in most tasks.
arXiv Detail & Related papers (2023-06-09T07:46:24Z)
- Timing Process Interventions with Causal Inference and Reinforcement Learning [2.919859121836811]
This paper presents experiments on timed process interventions with synthetic data that make genuine online RL and a comparison to causal inference (CI) possible.
Our experiments reveal that RL's policies outperform those from CI and are more robust at the same time.
Unlike CI, the unaltered online RL approach can be applied to other, more generic Prescriptive Process Monitoring (PresPM) problems such as next-best-activity recommendations.
arXiv Detail & Related papers (2023-06-07T10:02:16Z)
- A Strong Baseline for Batch Imitation Learning [25.392006064406967]
We provide an easy-to-implement, novel algorithm for imitation learning under a strict data paradigm.
This paradigm allows our algorithm to be used for environments in which safety or cost are of critical concern.
arXiv Detail & Related papers (2023-02-06T14:03:33Z)
- Jump-Start Reinforcement Learning [68.82380421479675]
We present a meta algorithm that can use offline data, demonstrations, or a pre-existing policy to initialize an RL policy.
In particular, we propose Jump-Start Reinforcement Learning (JSRL), an algorithm that employs two policies to solve tasks (a minimal sketch of this two-policy rollout appears after this list).
We show via experiments that JSRL is able to significantly outperform existing imitation and reinforcement learning algorithms.
arXiv Detail & Related papers (2022-04-05T17:25:22Z)
- Learning to act: a Reinforcement Learning approach to recommend the best next activities [4.511664266033014]
This paper investigates an approach that learns, by means of Reinforcement Learning, an optimal policy from the observation of past executions.
The potential of the approach has been demonstrated on two scenarios taken from real-life data.
arXiv Detail & Related papers (2022-03-29T09:43:39Z)
- OptiDICE: Offline Policy Optimization via Stationary Distribution Correction Estimation [59.469401906712555]
We present an offline reinforcement learning algorithm that prevents overestimation in a more principled way.
Our algorithm, OptiDICE, directly estimates the stationary distribution corrections of the optimal policy.
We show that OptiDICE performs competitively with the state-of-the-art methods.
arXiv Detail & Related papers (2021-06-21T00:43:30Z)
- Policy Information Capacity: Information-Theoretic Measure for Task Complexity in Deep Reinforcement Learning [83.66080019570461]
We propose two environment-agnostic, algorithm-agnostic quantitative metrics for task difficulty.
We show that these metrics have higher correlations with normalized task solvability scores than a variety of alternatives.
These metrics can also be used for fast and compute-efficient optimizations of key design parameters.
arXiv Detail & Related papers (2021-03-23T17:49:50Z)
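
As a companion to the Jump-Start Reinforcement Learning entry above, the following sketch illustrates the two-policy idea in broad strokes: a guide policy (e.g. obtained from offline data) controls the first part of each episode and the learner's exploration policy takes over afterwards, with the hand-over point shrinking as performance improves. The environment and policy interfaces and the curriculum rule are assumptions for illustration, not the algorithm as published.

```python
# Hedged sketch of a JSRL-style two-policy rollout: a guide policy acts for the
# first `h` steps of an episode, then the exploration policy takes over; a
# simple curriculum shrinks `h` once the learner performs well enough.
# The env/policy interfaces and the curriculum rule are illustrative assumptions.

def jump_start_rollout(env, guide_policy, explore_policy, h, max_steps=200):
    """Collect one episode in which guide_policy controls the first h steps."""
    obs = env.reset()
    transitions, episode_return = [], 0.0
    for t in range(max_steps):
        policy = guide_policy if t < h else explore_policy
        action = policy(obs)
        next_obs, reward, done = env.step(action)
        transitions.append((obs, action, reward, next_obs, done))
        episode_return += reward
        obs = next_obs
        if done:
            break
    return transitions, episode_return


def shrink_guide_horizon(h, episode_return, target_return, step=5):
    """Curriculum: hand more of the episode to the learner once it performs well."""
    return max(0, h - step) if episode_return >= target_return else h
```

In the published algorithm the guide horizon is managed by a curriculum over training iterations; the simple threshold rule above only stands in for that mechanism.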