Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing
- URL: http://arxiv.org/abs/2409.05658v1
- Date: Mon, 9 Sep 2024 14:28:24 GMT
- Title: Efficient Online Computation of Business Process State From Trace Prefixes via N-Gram Indexing
- Authors: David Chapela-Campa, Marlon Dumas
- Abstract summary: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state in the model.
An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model.
This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams.
- Score: 0.5141137421503899
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: This paper addresses the following problem: Given a process model and an event log containing trace prefixes of ongoing cases of a process, map each case to its corresponding state (i.e., marking) in the model. This state computation operation is a building block of other process mining operations, such as log animation and short-term simulation. An approach to this state computation problem is to perform a token-based replay of each trace prefix against the model. However, when a trace prefix does not strictly follow the behavior of the process model, token replay may produce a state that is not reachable from the initial state of the process. An alternative approach is to first compute an alignment between the trace prefix of each ongoing case and the model, and then replay the aligned trace prefix. However, (prefix-)alignment is computationally expensive. This paper proposes a method that, given a trace prefix of an ongoing case, computes its state in constant time using an index that represents states as n-grams. An empirical evaluation shows that the proposed approach has an accuracy comparable to that of the prefix-alignment approach, while achieving a throughput of hundreds of thousands of traces per second.
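A minimal sketch of the core idea, assuming the n-gram index has already been built offline from the process model (here it is supplied directly); the class name `NGramIndex` and its construction are illustrative, not the paper's API:

```python
class NGramIndex:
    """Illustrative index mapping the last n activities of a trace
    prefix to a process state (marking). Presumably built offline from
    the model; here it is supplied directly."""

    def __init__(self, n, ngram_to_state, initial_state):
        self.n = n
        self.ngram_to_state = ngram_to_state  # tuple of activities -> marking
        self.initial_state = initial_state

    def state_of(self, prefix):
        """Map an ongoing case to a state from its trace prefix.

        Only the last n events are inspected, so the lookup cost does
        not depend on the prefix length (constant time per case)."""
        if not prefix:
            return self.initial_state
        key = tuple(prefix[-self.n:])
        return self.ngram_to_state.get(key)  # None if the n-gram is unseen

# Toy sequential model A -> B -> C with markings over places p0..p3.
index = NGramIndex(
    n=2,
    ngram_to_state={("A",): {"p1"}, ("A", "B"): {"p2"}, ("B", "C"): {"p3"}},
    initial_state={"p0"},
)
print(index.state_of(["A", "B", "C"]))  # {'p3'}
```

Because each lookup touches only the last n events, throughput is bounded by hash-map access rather than by replay or alignment, which is consistent with the reported hundreds of thousands of traces per second.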
Related papers
- Bisimulation Learning [55.859538562698496]
We compute finite bisimulations of state transition systems with large, possibly infinite state space.
Our technique yields faster verification results than alternative state-of-the-art tools in practice.
arXiv Detail & Related papers (2024-05-24T17:11:27Z) - How Good is the Model in Model-in-the-loop Event Coreference Resolution
Annotation? [3.712417884848568]
We propose a model-in-the-loop annotation approach for event coreference resolution, where a machine learning model suggests only likely coreferring event pairs.
We evaluate the effectiveness of this approach by first simulating the annotation process and then, using a novel annotator-centric recall-effort trade-off metric, comparing the results of various underlying models and datasets.
arXiv Detail & Related papers (2023-06-06T18:06:24Z) - OpenPI-C: A Better Benchmark and Stronger Baseline for Open-Vocabulary
State Tracking [55.62705574507595]
OpenPI is the only dataset annotated for open-vocabulary state tracking.
We categorize three types of problems, at the procedure level, the step level, and the state-change level.
For the evaluation metric, we propose a cluster-based metric to fix the original metric's preference for repetition.
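One plausible reading of such a cluster-based metric, sketched under the assumption that gold answers come grouped into clusters of equivalent surface forms; the function below is hypothetical, not the benchmark's published scorer:

```python
def clustered_recall(predictions, gold_clusters):
    """Score predictions against clusters of equivalent gold answers.

    Each cluster is credited at most once, so repeating the same (or an
    equivalent) prediction does not inflate the score."""
    credited = set()
    for pred in predictions:
        for i, cluster in enumerate(gold_clusters):
            if i not in credited and pred in cluster:
                credited.add(i)
                break
    return len(credited) / len(gold_clusters) if gold_clusters else 0.0

# Repeated predictions of one cluster earn credit only once.
gold = [{"door is open", "the door opens"}, {"light is on"}]
print(clustered_recall(["door is open", "the door opens"], gold))  # 0.5
```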
arXiv Detail & Related papers (2023-06-01T16:48:20Z) - Value-Consistent Representation Learning for Data-Efficient
Reinforcement Learning [105.70602423944148]
We propose a novel method, called value-consistent representation learning (VCR), to learn representations that are directly related to decision-making.
Instead of aligning an imagined state with the real state returned by the environment, VCR applies a $Q$-value head to both states and obtains two distributions of action values.
Our method achieves new state-of-the-art performance among search-free RL algorithms.
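A hedged PyTorch sketch of that idea: a shared Q-value head is applied to both an imagined and a real latent state, and the two action-value distributions are aligned; all module names, shapes, and the choice of KL divergence are illustrative:

```python
import torch
import torch.nn.functional as F

def value_consistency_loss(q_head, imagined_state, real_state):
    """Sketch of the VCR idea: rather than matching states directly,
    pass both the imagined and the real state through the same Q-value
    head and align the resulting action-value distributions."""
    q_imagined = q_head(imagined_state)           # (batch, n_actions)
    q_real = q_head(real_state).detach()          # target side, no gradient
    p = F.log_softmax(q_imagined, dim=-1)
    q = F.softmax(q_real, dim=-1)
    return F.kl_div(p, q, reduction="batchmean")  # distributional mismatch

# Usage with a toy linear Q-head over 8-dim latent states and 4 actions.
q_head = torch.nn.Linear(8, 4)
loss = value_consistency_loss(q_head, torch.randn(32, 8), torch.randn(32, 8))
loss.backward()
```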
arXiv Detail & Related papers (2022-06-25T03:02:25Z) - Conformance Checking Over Stochastically Known Logs [7.882975068446842]
Data logs may become uncertain due to, e.g., sensor reading inaccuracies or incorrect interpretation of readings by processing programs.
In this work we focus on conformance checking, which compares a process model with an event log.
We mathematically define a trace model, a synchronous product, and a cost function that reflects the uncertainty of events in a log.
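As an illustration only (not the paper's exact definition), a cost function along these lines could scale the cost of a log move by the probability attached to the event, so that deviations on confident events weigh more:

```python
def weighted_move_cost(move, probability):
    """Hedged illustration of an uncertainty-aware cost: deviations are
    charged in proportion to how certain the log is about the event.

    move is 'sync' (log and model agree), 'log' (the event appears only
    in the log) or 'model' (the activity appears only in the model)."""
    if move == "sync":
        return 0.0
    if move == "log":
        return probability  # skipping a confident event costs more
    return 1.0              # model moves keep the standard unit cost

print(weighted_move_cost("log", 0.9))  # 0.9: confident event, costly to skip
print(weighted_move_cost("log", 0.2))  # 0.2: doubtful event, cheap to skip
```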
arXiv Detail & Related papers (2022-03-14T21:33:06Z) - CoCoMoT: Conformance Checking of Multi-Perspective Processes via SMT
(Extended Version) [62.96267257163426]
We introduce the CoCoMoT (Computing Conformance Modulo Theories) framework.
First, we show how SAT-based encodings studied in the pure control-flow setting can be lifted to our data-aware case.
Second, we introduce a novel preprocessing technique based on a notion of property-preserving clustering.
arXiv Detail & Related papers (2021-03-18T20:22:50Z) - Predictive Process Model Monitoring using Recurrent Neural Networks [2.4029798593292706]
This paper introduces Processes-As-Movies (PAM), a technique that provides a middle ground between process model monitoring and predictive monitoring.
It does so by capturing declarative process constraints between activities in various windows of a process execution trace.
Various recurrent neural network topologies tailored to high-dimensional input are used to model the evolution of the process model, with windows as time steps.
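A hedged sketch of this setup, assuming each window is encoded as a flattened binary activity-pair constraint tensor and an LSTM consumes the windows as time steps; all dimensions and names are illustrative:

```python
import torch
import torch.nn as nn

# Each window of a trace is encoded as a binary activity-pair tensor of
# declarative constraints; the RNN predicts the next window's tensor.
n_activities, n_constraints = 10, 4
feat = n_activities * n_activities * n_constraints  # flattened window

class PAMLikeModel(nn.Module):
    def __init__(self, hidden=128):
        super().__init__()
        self.rnn = nn.LSTM(feat, hidden, batch_first=True)
        self.head = nn.Linear(hidden, feat)  # next-window constraint logits

    def forward(self, windows):               # (batch, n_windows, feat)
        out, _ = self.rnn(windows)
        return self.head(out[:, -1])          # predict the following window

model = PAMLikeModel()
windows = torch.randint(0, 2, (8, 5, feat)).float()  # 8 traces, 5 windows
logits = model(windows)                               # (8, feat)
loss = nn.functional.binary_cross_entropy_with_logits(
    logits, torch.randint(0, 2, (8, feat)).float())
```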
arXiv Detail & Related papers (2020-11-05T13:57:33Z) - Process Discovery for Structured Program Synthesis [70.29027202357385]
A core task in process mining is process discovery, which aims to learn an accurate process model from event log data.
In this paper, we propose to use (block-) structured programs directly as target process models.
We develop a novel bottom-up agglomerative approach to the discovery of such structured program process models.
arXiv Detail & Related papers (2020-08-13T10:33:10Z) - An Entropic Relevance Measure for Stochastic Conformance Checking in
Process Mining [9.302180124254338]
We present an entropic relevance measure for conformance checking, computed as the average number of bits required to compress each of the log's traces.
We show that entropic relevance is computable in time linear in the size of the log, and provide evaluation outcomes that demonstrate the feasibility of using the new approach in industrial settings.
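A simplified sketch of the averaging structure, assuming a stochastic model that assigns each trace a probability and a fixed-cost fallback code for traces the model cannot explain (the paper's background coding is more careful):

```python
import math
from collections import Counter

def entropic_relevance(log, trace_probability, fallback_bits):
    """Simplified sketch: average bits needed to compress each trace.

    A trace with model probability p > 0 costs -log2(p) bits; traces
    the model cannot explain fall back to a fixed-cost code. One pass
    over the distinct traces keeps the computation linear in the log."""
    freq = Counter(map(tuple, log))
    total_bits, n = 0.0, sum(freq.values())
    for trace, count in freq.items():
        p = trace_probability(trace)
        bits = -math.log2(p) if p > 0 else fallback_bits
        total_bits += count * bits
    return total_bits / n

# Toy stochastic model: 80% <a,b,c>, 20% <a,c>.
model = {("a", "b", "c"): 0.8, ("a", "c"): 0.2}
log = [["a", "b", "c"], ["a", "b", "c"], ["a", "c"], ["x"]]
print(entropic_relevance(log, lambda t: model.get(t, 0.0), fallback_bits=10))
```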
arXiv Detail & Related papers (2020-07-18T02:25:33Z) - Efficient Conformance Checking using Approximate Alignment Computation
with Tandem Repeats [0.03222802562733786]
Conformance checking aims to find and describe the differences between a process model capturing the expected process behavior and a corresponding event log recording the observed behavior.
Alignments are an established technique to compute the distance between a trace in the event log and the closest execution trace of a corresponding process model.
We propose a novel approximate technique that uses pre- and post-processing steps to shorten a trace by collapsing its tandem repeats and then recompute the alignment cost for the original trace.
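A naive sketch of the pre-processing idea: collapse consecutive repetitions of a short subsequence (a tandem repeat, typically produced by a loop) and remember the removed copies so the cost can later be adjusted; the paper's detection algorithm is more sophisticated than this scan:

```python
def collapse_tandem_repeats(trace, max_len=5):
    """Naive sketch: collapse consecutive repetitions of a short
    subsequence down to a single occurrence, recording (position,
    unit length, copies) so the alignment cost can be recomputed."""
    i, out, repeats = 0, [], []
    while i < len(trace):
        for k in range(1, max_len + 1):
            unit = trace[i:i + k]
            count = 1
            while trace[i + count * k : i + (count + 1) * k] == unit:
                count += 1
            if count > 1:
                out.extend(unit)
                repeats.append((len(out) - k, k, count))
                i += count * k
                break
        else:                      # no repeat starts here
            out.append(trace[i])
            i += 1
    return out, repeats

print(collapse_tandem_repeats(list("abababcdd")))
# (['a', 'b', 'c', 'd'], [(0, 2, 3), (3, 1, 2)])
```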
arXiv Detail & Related papers (2020-04-02T03:50:32Z) - Self-Supervised Log Parsing [59.04636530383049]
Large-scale software systems generate massive volumes of semi-structured log records.
Existing approaches rely on log-specific heuristics or manual rule extraction.
We propose NuLog that utilizes a self-supervised learning model and formulates the parsing task as masked language modeling.
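A minimal sketch of that task framing, with a stand-in confidence function in place of the trained model; the intuition is that tokens a model reconstructs confidently behave like constant template parts, while low-confidence tokens behave like parameters:

```python
MASK = "<MASK>"

def masked_samples(log_line):
    """NuLog-style task framing: hide one token at a time and train a
    model to reconstruct it from the surrounding tokens."""
    tokens = log_line.split()
    for i, target in enumerate(tokens):
        yield tokens[:i] + [MASK] + tokens[i + 1:], target

def extract_template(log_line, confidence, threshold=0.5):
    """Tokens restored with high confidence act as constant template
    parts; low-confidence tokens are treated as parameters (<*>).
    `confidence` stands in for a trained model's prediction score."""
    return " ".join(tok if confidence(tok) >= threshold else "<*>"
                    for tok in log_line.split())

line = "Connection from 10.0.0.5 closed"
for masked, target in masked_samples(line):
    print(masked, "->", target)        # training pairs for the masked LM

# Toy confidence: template words are easy to predict, parameters are not.
conf = lambda tok: 0.1 if any(ch.isdigit() for ch in tok) else 0.9
print(extract_template(line, conf))    # Connection from <*> closed
```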
arXiv Detail & Related papers (2020-03-17T19:25:25Z)