An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring
- URL: http://arxiv.org/abs/2501.10543v2
- Date: Tue, 27 May 2025 23:13:18 GMT
- Title: An Innovative Data-Driven and Adaptive Reinforcement Learning Approach for Context-Aware Prescriptive Process Monitoring
- Authors: Mostafa Abbasi, Maziyar Khadivi, Maryam Ahang, Patricia Lasserre, Yves Lucet, Homayoun Najjaran
- Abstract summary: We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS). FORLAPS aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism. We show that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span.
- Score: 3.4437362489150254
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: The application of artificial intelligence and machine learning in business process management has advanced significantly; however, the full potential of these technologies remains largely unexplored, primarily due to challenges related to data quality and availability. We present a novel framework called Fine-Tuned Offline Reinforcement Learning Augmented Process Sequence Optimization (FORLAPS), which aims to identify optimal execution paths in business processes by leveraging reinforcement learning enhanced with a state-dependent reward shaping mechanism, thereby enabling context-sensitive prescriptions. Additionally, to compare FORLAPS with existing models (Permutation Feature Importance and a multi-task Long Short-Term Memory model), we conducted experiments to evaluate its effectiveness in terms of resource savings and process time reduction. The experimental results on real-life event logs validate that FORLAPS achieves 31% savings in resource time spent and a 23% reduction in process time span. To further enhance learning, we introduce an innovative process-aware data augmentation technique that selectively increases the average estimated Q-values in sampled batches, enabling automatic fine-tuning of the reinforcement learning model. Robustness was assessed through both prefix-level and trace-level evaluations, using the Damerau-Levenshtein distance as the primary metric. Finally, the model's adaptability across industries was further validated through diverse case studies, including healthcare treatment pathways, financial services workflows, permit applications from regulatory bodies, and operations management. In each domain, the proposed model demonstrated exceptional performance, outperforming existing state-of-the-art approaches in prescriptive decision-making and demonstrating its capability to prescribe optimal next steps and predict the best next activities within a process trace.
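The abstract references two mechanisms that lend themselves to a concrete illustration: a state-dependent reward-shaping term added to the environment reward, and prefix- and trace-level robustness evaluation using the Damerau-Levenshtein distance over activity sequences. The following Python sketch is a minimal, hypothetical rendering of both ideas; the function names, the shaping heuristic, and the example traces are illustrative assumptions and are not taken from the FORLAPS implementation.

```python
# Hypothetical sketch of (1) a state-dependent reward-shaping term and
# (2) trace-level comparison with the Damerau-Levenshtein distance
# (restricted / optimal-string-alignment variant) over activity labels.
# Names and the shaping heuristic are illustrative assumptions.

from typing import Sequence, Callable, Hashable


def shaped_reward(base_reward: float,
                  state_features: dict,
                  shaping_fn: Callable[[dict], float],
                  weight: float = 0.1) -> float:
    """Augment the base environment reward with a state-dependent shaping term."""
    return base_reward + weight * shaping_fn(state_features)


def damerau_levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Edit distance with insertions, deletions, substitutions, and
    adjacent transpositions (OSA variant), computed over activity labels."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[n][m]


# Example: compare a prescribed activity sequence against an observed trace.
prescribed = ["register", "review", "approve", "notify"]
observed   = ["register", "approve", "review", "notify"]
print(damerau_levenshtein(prescribed, observed))  # 1 (one adjacent swap)
```

In this reading, a prefix-level evaluation would apply the same distance to truncated traces, while the shaping weight controls how strongly contextual state features steer the learned policy toward context-sensitive prescriptions.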
Related papers
- LLM-guided Chemical Process Optimization with a Multi-Agent Approach [5.417632175667162]
Chemical process optimization is crucial to maximize production efficiency and economic performance. Traditional methods, including gradient-based methods, evolutionary algorithms, and parameter grid searches, become impractical when operating constraints are ill-defined or unavailable. We present a multi-agent framework of large language model (LLM) agents that autonomously infer operating constraints from minimal process descriptions.
arXiv Detail & Related papers (2025-06-26T01:03:44Z) - Optimising Language Models for Downstream Tasks: A Post-Training Perspective [0.0]
Language models (LMs) have demonstrated remarkable capabilities in NLP, but adapting them efficiently and robustly to specific tasks remains challenging. This thesis proposes a series of methods to better adapt LMs to downstream applications.
arXiv Detail & Related papers (2025-06-26T00:49:35Z) - ToolACE-R: Tool Learning with Adaptive Self-Refinement [84.69651852838794]
Tool learning allows Large Language Models to leverage external tools for solving complex user tasks.
We propose ToolACE-R, a novel method that introduces adaptive self-refinement for tool invocations.
Our results demonstrate the effectiveness of the proposed method, which is compatible with base models of various sizes.
arXiv Detail & Related papers (2025-04-02T06:38:56Z) - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [55.13854171147104]
Large Language Models (LLMs) have revolutionized various domains, including natural language processing, data analysis, and software development. We present Dynamic Action Re-Sampling (DARS), a novel inference-time compute scaling approach for coding agents. We evaluate our approach on the SWE-Bench Lite benchmark, demonstrating that this scaling strategy achieves a pass@k score of 55% with Claude 3.5 Sonnet V2.
arXiv Detail & Related papers (2025-03-18T14:02:59Z) - Improving Retrospective Language Agents via Joint Policy Gradient Optimization [57.35348425288859]
RetroAct is a framework that jointly optimizes both task-planning and self-reflective evolution capabilities in language agents. We develop a two-stage joint optimization process that integrates imitation learning and reinforcement learning. We conduct extensive experiments across various testing environments, demonstrating that RetroAct achieves substantial improvements in task performance and decision-making processes.
arXiv Detail & Related papers (2025-03-03T12:54:54Z) - Dynamic Noise Preference Optimization for LLM Self-Improvement via Synthetic Data [51.62162460809116]
We introduce Dynamic Noise Preference Optimization (DNPO) to ensure consistent improvements across iterations.
In experiments with Zephyr-7B, DNPO consistently outperforms existing methods, showing an average performance boost of 2.6%.
DNPO shows a significant improvement in model-generated data quality, with a 29.4% win-loss rate gap compared to the baseline in GPT-4 evaluations.
arXiv Detail & Related papers (2025-02-08T01:20:09Z) - Learning Dynamic Representations via An Optimally-Weighted Maximum Mean Discrepancy Optimization Framework for Continual Learning [16.10753846850319]
Continual learning allows models to persistently acquire and retain information, yet catastrophic forgetting can severely impair model performance. We introduce a novel framework termed Optimally-Weighted Maximum Mean Discrepancy (OWMMD), which imposes penalties on representation alterations.
arXiv Detail & Related papers (2025-01-21T13:33:45Z) - The Surprising Effectiveness of Test-Time Training for Abstract Reasoning [64.36534512742736]
We investigate the effectiveness of test-time training (TTT) as a mechanism for improving models' reasoning capabilities.
TTT significantly improves performance on ARC tasks, achieving up to 6x improvement in accuracy compared to base fine-tuned models.
Our findings suggest that explicit symbolic search is not the only path to improved abstract reasoning in neural language models.
arXiv Detail & Related papers (2024-11-11T18:59:45Z) - From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning [62.54484062185869]
We introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. We propose implicit-reward and inverse reinforcement learning techniques to facilitate agent reflection and policy adjustment.
arXiv Detail & Related papers (2024-11-06T10:35:11Z) - Memory-Enhanced Neural Solvers for Efficient Adaptation in Combinatorial Optimization [6.713974813995327]
We present MEMENTO, an approach that leverages memory to improve the adaptation of neural solvers at inference time.
We successfully train all RL auto-regressive solvers on large instances, and show that MEMENTO can scale and is data-efficient.
Overall, MEMENTO pushes the state-of-the-art on 11 of the 12 evaluated tasks.
arXiv Detail & Related papers (2024-06-24T08:18:19Z) - Efficient Continual Pre-training by Mitigating the Stability Gap [68.49269649759005]
We study the behavior of Large Language Models (LLMs) during continual pre-training.
We propose three effective strategies to enhance LLM performance within a fixed compute budget.
Our strategies improve the average medical task performance of the OpenLlama-3B model from 36.2% to 40.7% with only 40% of the original training budget.
arXiv Detail & Related papers (2024-06-21T02:28:37Z) - Parameter-Efficient Active Learning for Foundational models [7.799711162530711]
Foundational vision transformer models have shown impressive few-shot performance on many vision tasks.
This research presents a novel investigation into the application of parameter-efficient fine-tuning methods within an active learning (AL) framework.
arXiv Detail & Related papers (2024-06-13T16:30:32Z) - Refining 3D Point Cloud Normal Estimation via Sample Selection [13.207964615561261]
We introduce a fundamental framework for normal estimation, enhancing existing models through the incorporation of global information and various constraint mechanisms.
We also utilize existing orientation methods to correct estimated non-oriented normals, achieving state-of-the-art performance in both oriented and non-oriented tasks.
arXiv Detail & Related papers (2024-05-20T02:06:10Z) - Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning [55.96599486604344]
We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process.
We use Monte Carlo Tree Search (MCTS) to iteratively collect preference data, utilizing its look-ahead ability to break down instance-level rewards into more granular step-level signals.
The proposed algorithm employs Direct Preference Optimization (DPO) to update the LLM policy using this newly generated step-level preference data.
arXiv Detail & Related papers (2024-05-01T11:10:24Z) - Learning Off-policy with Model-based Intrinsic Motivation For Active Online Exploration [15.463313629574111]
This paper investigates how to achieve sample-efficient exploration in continuous control tasks.
We introduce an RL algorithm that incorporates a predictive model and off-policy learning elements.
We derive an intrinsic reward without incurring parameter overhead.
arXiv Detail & Related papers (2024-03-31T11:39:11Z) - Augmenting Unsupervised Reinforcement Learning with Self-Reference [63.68018737038331]
Humans possess the ability to draw on past experiences explicitly when learning new tasks.
We propose the Self-Reference (SR) approach, an add-on module explicitly designed to leverage historical information.
Our approach achieves state-of-the-art results in terms of Interquartile Mean (IQM) performance and Optimality Gap reduction on the Unsupervised Reinforcement Learning Benchmark.
arXiv Detail & Related papers (2023-11-16T09:07:34Z) - L3 Ensembles: Lifelong Learning Approach for Ensemble of Foundational
Language Models [15.726224465017596]
We propose an approach that focuses on extracting meaningful representations from unseen data and constructing a structured knowledge base.
We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE.
The proposed L3 ensemble method increases the model accuracy by 4% to 36% compared to the fine-tuned FLM.
arXiv Detail & Related papers (2023-11-11T06:59:50Z) - Learning Objective-Specific Active Learning Strategies with Attentive Neural Processes [72.75421975804132]
Learning Active Learning (LAL) suggests learning the active learning strategy itself, allowing it to adapt to the given setting.
We propose a novel LAL method for classification that exploits symmetry and independence properties of the active learning problem.
Our approach is based on learning from a myopic oracle, which gives our model the ability to adapt to non-standard objectives.
arXiv Detail & Related papers (2023-09-11T14:16:37Z) - Posterior Sampling for Deep Reinforcement Learning [0.0]
This paper introduces Posterior Sampling for Deep Reinforcement Learning (PSDRL), the first truly scalable approximation of Posterior Sampling for Reinforcement Learning.
Experiments on the Atari benchmark show that PSDRL significantly outperforms previous state-of-the-art attempts at scaling up posterior sampling.
arXiv Detail & Related papers (2023-04-30T13:23:50Z) - Data Augmentation through Expert-guided Symmetry Detection to Improve Performance in Offline Reinforcement Learning [0.0]
Offline estimation of the dynamical model of a Markov Decision Process (MDP) is a non-trivial task.
Recent works showed that an expert-guided pipeline relying on Density Estimation methods effectively detects this structure in deterministic environments.
We show that the former results lead to a performance improvement when solving the learned MDP and then applying the optimized policy in the real environment.
arXiv Detail & Related papers (2021-12-18T14:32:32Z) - On Effective Scheduling of Model-based Reinforcement Learning [53.027698625496015]
In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance.
We then propose a framework named AutoMBPO to automatically schedule the real data ratio.
arXiv Detail & Related papers (2021-11-16T15:24:59Z) - Few-shot Quality-Diversity Optimization [50.337225556491774]
Quality-Diversity (QD) optimization has been shown to be an effective tool in dealing with deceptive minima and sparse rewards in Reinforcement Learning.
We show that, given examples from a task distribution, information about the paths taken by optimization in parameter space can be leveraged to build a prior population, which when used to initialize QD methods in unseen environments, allows for few-shot adaptation.
Experiments carried out in both sparse and dense reward settings using robotic manipulation and navigation benchmarks show that it considerably reduces the number of generations that are required for QD optimization in these environments.
arXiv Detail & Related papers (2021-09-14T17:12:20Z)
This list is automatically generated from the titles and abstracts of the papers on this site.