Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks
- URL: http://arxiv.org/abs/2407.02310v1
- Date: Tue, 2 Jul 2024 14:44:49 GMT
- Title: Evaluating the Ability of LLMs to Solve Semantics-Aware Process Mining Tasks
- Authors: Adrian Rebmann, Fabian David Schmidt, Goran Glavaš, Han van der Aa
- Abstract summary: Large language models (LLMs) could be used to tackle process mining tasks that benefit from an understanding of process behavior.
In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks.
- Score: 3.9273545629281252
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The process mining community has recently recognized the potential of large language models (LLMs) for tackling various process mining tasks. Initial studies report that LLMs can support process analysis and even, to some extent, reason about how processes work. This latter property suggests that LLMs could also be used to tackle process mining tasks that benefit from an understanding of process behavior. Examples of such tasks include (semantic) anomaly detection and next activity prediction, which both involve considering the meaning of activities and their inter-relations. In this paper, we investigate the capabilities of LLMs to tackle such semantics-aware process mining tasks. Furthermore, whereas most works on the intersection of LLMs and process mining only focus on testing these models out of the box, we provide a more principled investigation of the utility of LLMs for process mining, including their ability to obtain process mining knowledge post-hoc by means of in-context learning and supervised fine-tuning. Concretely, we define three process mining tasks that benefit from an understanding of process semantics and provide extensive benchmarking datasets for each of them. Our evaluation experiments reveal that (1) LLMs fail to solve challenging process mining tasks out of the box and when provided with only a handful of in-context examples, but (2) they yield strong performance when fine-tuned for these tasks, consistently surpassing smaller, encoder-based language models.
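To make the in-context-learning setting concrete, the sketch below shows how a few-shot prompt for one of the semantics-aware tasks (next activity prediction) might be assembled from event-log traces. This is a minimal illustration under assumed conventions, not the paper's actual prompt format or benchmark data; the traces, labels, and the `build_prompt` helper are hypothetical placeholders.

```python
# Minimal sketch (not the paper's setup): assembling a few-shot prompt for
# next-activity prediction from process traces. Traces and wording are
# illustrative placeholders, not benchmark data.

FEW_SHOT_EXAMPLES = [
    (["create purchase order", "approve purchase order", "send order to supplier"],
     "receive goods"),
    (["receive invoice", "check invoice", "approve invoice"],
     "pay invoice"),
]

def build_prompt(trace, examples=FEW_SHOT_EXAMPLES):
    """Assemble an in-context-learning prompt asking for the next activity."""
    lines = ["Given a partial process trace, predict the most likely next activity.", ""]
    for prefix, next_activity in examples:
        lines.append("Trace: " + " -> ".join(prefix))
        lines.append(f"Next activity: {next_activity}")
        lines.append("")
    lines.append("Trace: " + " -> ".join(trace))
    lines.append("Next activity:")
    return "\n".join(lines)

if __name__ == "__main__":
    # The completed prompt would be sent to an LLM (API or local model);
    # that call is omitted here to keep the sketch self-contained.
    print(build_prompt(["register claim", "check claim", "approve claim"]))
```

In the fine-tuning setting evaluated in the paper, such (trace, next activity) pairs would instead serve as supervised training examples rather than being placed in the prompt.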
Related papers
- EVOLvE: Evaluating and Optimizing LLMs For Exploration [76.66831821738927]
Large language models (LLMs) remain under-studied in scenarios requiring optimal decision-making under uncertainty.
We measure LLMs' (in)ability to make optimal decisions in bandits, a state-less reinforcement learning setting relevant to many applications.
Motivated by the existence of optimal exploration algorithms, we propose efficient ways to integrate this algorithmic knowledge into LLMs.
arXiv Detail & Related papers (2024-10-08T17:54:03Z) - Interpreting and Improving Large Language Models in Arithmetic Calculation [72.19753146621429]
Large language models (LLMs) have demonstrated remarkable potential across numerous applications.
In this work, we delve into uncovering a specific mechanism by which LLMs execute calculations.
We investigate the potential benefits of selectively fine-tuning these essential heads/MLPs to boost the LLMs' computational performance.
arXiv Detail & Related papers (2024-09-03T07:01:46Z) - Re-Thinking Process Mining in the AI-Based Agents Era [39.58317527488534]
Large Language Models (LLMs) have emerged as powerful conversational interfaces, and their application in process mining (PM) tasks has shown promising results.
This paper proposes utilizing the AI-Based Agents (AgWf) paradigm to enhance the effectiveness of PM on LLMs.
We examine various implementations of AgWf and the types of AI-based tasks involved.
arXiv Detail & Related papers (2024-08-14T10:14:18Z) - PM-LLM-Benchmark: Evaluating Large Language Models on Process Mining Tasks [45.129578769739]
Large Language Models (LLMs) have the potential to semi-automate some process mining (PM) analyses.
We propose PM-LLM-Benchmark, the first comprehensive benchmark for PM focusing on domain knowledge.
We observe that most of the considered LLMs can perform some process mining tasks at a satisfactory level, but tiny models that would run on edge devices are still inadequate.
arXiv Detail & Related papers (2024-07-18T07:57:31Z) - Towards a Benchmark for Causal Business Process Reasoning with LLMs [2.273531916003657]
Large Language Models (LLMs) are increasingly used for boosting organizational efficiency and automating tasks.
Recent efforts have further extended to employ LLMs in activities such as reasoning, planning, and decision-making.
In this work, we plant the seeds for the development of a benchmark to assess the ability of LLMs to reason about causal and process perspectives of business operations.
arXiv Detail & Related papers (2024-06-08T16:10:53Z) - C-ICL: Contrastive In-context Learning for Information Extraction [54.39470114243744]
c-ICL is a novel few-shot technique that leverages both correct and incorrect sample constructions to create in-context learning demonstrations.
Our experiments on various datasets indicate that c-ICL outperforms previous few-shot in-context learning methods.
arXiv Detail & Related papers (2024-02-17T11:28:08Z) - TaskBench: Benchmarking Large Language Models for Task Automation [82.2932794189585]
We introduce TaskBench, a framework to evaluate the capability of large language models (LLMs) in task automation.
Specifically, task decomposition, tool selection, and parameter prediction are assessed.
Our approach combines automated construction with rigorous human verification, ensuring high consistency with human evaluation.
arXiv Detail & Related papers (2023-11-30T18:02:44Z) - When does In-context Learning Fall Short and Why? A Study on Specification-Heavy Tasks [54.71034943526973]
In-context learning (ICL) has become the default method for using large language models (LLMs).
We find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications.
We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
arXiv Detail & Related papers (2023-11-15T14:26:30Z) - Assessing Logical Puzzle Solving in Large Language Models: Insights from a Minesweeper Case Study [10.95835611110119]
We introduce a novel task -- Minesweeper -- designed in a format unfamiliar to Large Language Models (LLMs).
This task challenges LLMs to identify the locations of mines based on numerical clues provided by adjacent opened cells.
Our experiments, including trials with the advanced GPT-4 model, indicate that while LLMs possess the foundational abilities required for this task, they struggle to integrate these into a coherent, multi-step logical reasoning process needed to solve Minesweeper.
arXiv Detail & Related papers (2023-11-13T15:11:26Z) - TRACE: A Comprehensive Benchmark for Continual Learning in Large Language Models [52.734140807634624]
Aligned large language models (LLMs) demonstrate exceptional capabilities in task-solving, following instructions, and ensuring safety.
Existing continual learning benchmarks lack sufficient challenge for leading aligned LLMs.
We introduce TRACE, a novel benchmark designed to evaluate continual learning in LLMs.
arXiv Detail & Related papers (2023-10-10T16:38:49Z)