Learning to act: a Reinforcement Learning approach to recommend the best
next activities
- URL: http://arxiv.org/abs/2203.15398v1
- Date: Tue, 29 Mar 2022 09:43:39 GMT
- Title: Learning to act: a Reinforcement Learning approach to recommend the best
next activities
- Authors: Stefano Branchi, Chiara Di Francescomarino, Chiara Ghidini, David
Massimo, Francesco Ricci and Massimiliano Ronzani
- Abstract summary: This paper investigates an approach that learns, by means of Reinforcement Learning, an optimal policy from the observation of past executions.
The potential of the approach has been demonstrated on two scenarios taken from real-life data.
- Score: 4.511664266033014
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The rise of process data availability has led, in the last decade, to the
development of several data-driven learning approaches. However, most of these
approaches limit themselves to using the learned model to predict the future of
ongoing process executions. The goal of this paper is to move a step forward and
leverage data for the purpose of learning to act, supporting users with
recommendations for the best strategy to follow in order to optimize a measure
of performance. In this paper, we take the (optimization) perspective of one
process actor and recommend the best activities to execute next, in response
to what happens in a complex external environment where there is no control over
exogenous factors. To this aim, we investigate an approach that learns, by
means of Reinforcement Learning, an optimal policy from the observation of past
executions and recommends the best activities to carry out in order to optimize a Key
Performance Indicator of interest. The potential of the approach has been
demonstrated on two scenarios taken from real-life data.
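The idea of learning a policy from past executions can be illustrated with a minimal sketch. The abstract does not say how states, actions, or rewards are defined, so everything below is an assumption made for illustration: the toy log, the activity names, the reward values, the count-based model estimate, and the use of tabular value iteration are all hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy log (invented for illustration). Each trace is a list of
# (state, action, reward, next_state) tuples observed in a past execution.
LOG = [
    [("start", "call_customer", 0.0, "answered"),
     ("answered", "send_offer", 10.0, "end")],
    [("start", "call_customer", 0.0, "no_answer"),
     ("no_answer", "send_letter", 2.0, "end")],
    [("start", "send_letter", 2.0, "end")],
    [("start", "call_customer", 0.0, "answered"),
     ("answered", "send_letter", 2.0, "end")],
]


def estimate_mdp(log):
    """Estimate transition probabilities and mean rewards by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for trace in log:
        for s, a, r, s2 in trace:
            counts[(s, a)][s2] += 1
            reward_sum[(s, a, s2)] += r
    model = {}
    for (s, a), nexts in counts.items():
        total = sum(nexts.values())
        # One (next_state, probability, mean_reward) triple per observed outcome.
        model[(s, a)] = [
            (s2, n / total, reward_sum[(s, a, s2)] / n)
            for s2, n in nexts.items()
        ]
    return model


def value_iteration(model, gamma=0.9, sweeps=100):
    """Compute action values Q(s, a) on the estimated model."""
    q = {sa: 0.0 for sa in model}
    for _ in range(sweeps):
        for (s, a), outcomes in model.items():
            q[(s, a)] = sum(
                # States with no logged action (e.g. "end") have value 0.
                p * (r + gamma * max(
                    (q[k] for k in q if k[0] == s2), default=0.0))
                for s2, p, r in outcomes
            )
    return q


def recommend(q, state):
    """Return the logged activity with the highest estimated value, if any."""
    candidates = {a: v for (s, a), v in q.items() if s == state}
    return max(candidates, key=candidates.get) if candidates else None


q_values = value_iteration(estimate_mdp(LOG))
print(recommend(q_values, "start"))
```

In this sketch the environment's response (whether the customer answers) plays the role of the exogenous factor: the actor cannot control it, so the recommendation weighs each activity by the outcomes observed after it in the log.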
Related papers
- Optimal Execution with Reinforcement Learning [0.4972323953932129]
This study investigates the development of an optimal execution strategy through reinforcement learning.
We present a custom MDP formulation followed by the results of our methodology and benchmark the performance against standard execution strategies.
arXiv Detail & Related papers (2024-11-10T08:21:03Z)
- From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process.
We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z)
- Denoising Pre-Training and Customized Prompt Learning for Efficient Multi-Behavior Sequential Recommendation [69.60321475454843]
We propose DPCPL, the first pre-training and prompt-tuning paradigm tailored for Multi-Behavior Sequential Recommendation.
In the pre-training stage, we propose a novel Efficient Behavior Miner (EBM) to filter out the noise at multiple time scales.
Subsequently, we propose to tune the pre-trained model in a highly efficient manner with the proposed Customized Prompt Learning (CPL) module.
arXiv Detail & Related papers (2024-08-21T06:48:38Z)
- Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement [50.481380478458945]
The Iterative step-level Process Refinement (IPR) framework provides detailed step-by-step guidance to enhance agent training.
Our experiments on three complex agent tasks demonstrate that our framework outperforms a variety of strong baselines.
arXiv Detail & Related papers (2024-06-17T03:29:13Z)
- Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents [49.85633804913796]
We present an exploration-based trajectory optimization approach, referred to as ETO.
This learning method is designed to enhance the performance of open LLM agents.
Our experiments on three complex tasks demonstrate that ETO consistently surpasses baseline performance by a large margin.
arXiv Detail & Related papers (2024-03-04T21:50:29Z)
- DRDT: Dynamic Reflection with Divergent Thinking for LLM-based Sequential Recommendation [53.62727171363384]
We introduce a novel reasoning principle: Dynamic Reflection with Divergent Thinking.
Our methodology is dynamic reflection, a process that emulates human learning through probing, critiquing, and reflecting.
We evaluate our approach on three datasets using six pre-trained LLMs.
arXiv Detail & Related papers (2023-12-18T16:41:22Z)
- Recommending the optimal policy by learning to act from temporal data [2.554326189662943]
This paper proposes an AI-based approach that learns, by means of Reinforcement Learning (RL)
The approach is validated on real and synthetic datasets and compared with off-policy Deep RL approaches.
The ability of our approach to compete with, and often outperform, Deep RL approaches provides a contribution towards the exploitation of white box RL techniques in scenarios where only temporal execution data are available.
arXiv Detail & Related papers (2023-03-16T10:30:36Z)
- Efficient Real-world Testing of Causal Decision Making via Bayesian Experimental Design for Contextual Optimisation [12.37745209793872]
We introduce a model-agnostic framework for gathering data to evaluate and improve contextual decision making.
Our method is used for the data-efficient evaluation of the regret of past treatment assignments.
arXiv Detail & Related papers (2022-07-12T01:20:11Z)
- Goal-Oriented Next Best Activity Recommendation using Reinforcement Learning [4.128679340077271]
We propose a goal-oriented next best activity recommendation framework.
A deep learning model predicts the next best activity and an estimated value of a goal given the activity.
A reinforcement learning method explores the sequence of activities based on the estimates likely to meet one or more goals.
arXiv Detail & Related papers (2022-05-06T13:48:14Z)
- SURF: Semi-supervised Reward Learning with Data Augmentation for Feedback-efficient Preference-based Reinforcement Learning [168.89470249446023]
We present SURF, a semi-supervised reward learning framework that utilizes a large amount of unlabeled samples with data augmentation.
In order to leverage unlabeled samples for reward learning, we infer pseudo-labels of the unlabeled samples based on the confidence of the preference predictor.
Our experiments demonstrate that our approach significantly improves the feedback-efficiency of the preference-based method on a variety of locomotion and robotic manipulation tasks.
arXiv Detail & Related papers (2022-03-18T16:50:38Z)
- Recommendation Fairness: From Static to Dynamic [12.080824433982993]
We discuss how fairness could be baked into reinforcement learning techniques for recommendation.
We argue that in order to make further progress in recommendation fairness, we may want to consider multi-agent (game-theoretic) optimization, multi-objective (Pareto) optimization.
arXiv Detail & Related papers (2021-09-05T21:38:05Z)