Generalization in Automated Process Discovery: A Framework based on
Event Log Patterns
- URL: http://arxiv.org/abs/2203.14079v1
- Date: Sat, 26 Mar 2022 13:49:11 GMT
- Title: Generalization in Automated Process Discovery: A Framework based on
Event Log Patterns
- Authors: Daniel Reißner, Abel Armas-Cervantes, Marcello La Rosa
- Abstract summary: Existing generalization measures exhibit several shortcomings that severely hinder their applicability in practice.
We propose a framework that generalizes a set of patterns discovered from an event log with representative traces.
We show that our measure can be efficiently computed for datasets two orders of magnitude larger than the largest dataset the baseline generalization measures can handle.
- Score: 0.03222802562733786
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The importance of quality measures in process mining has increased. One of
the key quality aspects, generalization, is concerned with measuring the degree
of overfitting of a process model w.r.t. an event log, since the recorded
behavior is just an example of the true behavior of the underlying business
process. Existing generalization measures exhibit several shortcomings that
severely hinder their applicability in practice. For example, they assume the
event log fully fits the discovered process model, and cannot deal with large
real-life event logs and complex process models. More significantly, current
measures neglect generalizations for clear patterns that demand a certain
construct in the model. For example, a repeating sequence in an event log
should be generalized with a loop structure in the model. We address these
shortcomings by proposing a framework of measures that generalize a set of
patterns discovered from an event log with representative traces and check the
corresponding control-flow structures in the process model via their trace
alignment. We instantiate the framework with a generalization measure that uses
tandem repeats to identify repetitive patterns that are compared to the loop
structures and a concurrency oracle to identify concurrent patterns that are
compared to the parallel structures of the process model. In an extensive
qualitative and quantitative evaluation using 74 log-model pairs against
two baseline generalization measures, we show that the proposed generalization
measure consistently ranks process models that fulfil the observed patterns
with generalizing control-flow structures higher than those which do not, while
the baseline measures disregard those patterns. Further, we show that our
measure can be efficiently computed for datasets two orders of magnitude larger
than the largest dataset the baseline generalization measures can handle.
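The abstract mentions tandem repeats as the device for spotting repetitive patterns in traces. As a rough illustration only, the sketch below finds tandem repeats in a single trace with a naive scan. It is not the authors' implementation: the function name `tandem_repeats`, the `min_repeats` parameter, and the example trace are assumptions made for this sketch, and the comparison against loop structures in the model via alignments is not shown.

```python
from typing import List, Tuple

# Hypothetical helper, not taken from the paper: a naive scan that reports
# (start index, repeat type, number of repetitions) for each tandem repeat.
def tandem_repeats(trace: List[str],
                   min_repeats: int = 2) -> List[Tuple[int, Tuple[str, ...], int]]:
    found = []
    n = len(trace)
    i = 0
    while i < n:
        best = None
        # Try every candidate repeat-type length that could fit min_repeats times.
        for length in range(1, (n - i) // min_repeats + 1):
            unit = trace[i:i + length]
            count = 1
            # Count how many consecutive copies of the unit follow position i.
            while trace[i + count * length:i + (count + 1) * length] == unit:
                count += 1
            if count >= min_repeats:
                covered = count * length
                # Prefer the repeat covering the most events; on ties the
                # shorter (earlier tried) unit is kept.
                if best is None or covered > best[2] * len(best[1]):
                    best = (i, tuple(unit), count)
        if best is not None:
            found.append(best)
            i += best[2] * len(best[1])  # skip past the repeated region
        else:
            i += 1
    return found


if __name__ == "__main__":
    # Made-up trace: the sequence "b c" occurs three times in a row.
    print(tandem_repeats(["a", "b", "c", "b", "c", "b", "c", "d"]))
```

In this made-up trace the sequence `b c` repeats three times, which is the kind of pattern the abstract says should be generalized by a loop structure in the model. The scan is far slower than the suffix-based techniques normally used to find tandem repeats, so it is suitable only for illustration.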
Related papers
- Approximate learning of parsimonious Bayesian context trees [0.0]
The proposed framework is tested on synthetic and real-world data examples.
It outperforms existing sequence models when fitted to real protein sequences and honeypot computer terminal sessions.
arXiv Detail & Related papers (2024-07-27T11:50:40Z)
- Mining Constraints from Reference Process Models for Detecting Best-Practice Violations in Event Log [1.389948527681755]
We propose a framework for mining declarative best-practice constraints from a reference model collection.
We demonstrate the capability of our framework to detect best-practice violations through an evaluation based on real-world process model collections and event logs.
arXiv Detail & Related papers (2024-07-02T15:05:37Z)
- Mining a Minimal Set of Behavioral Patterns using Incremental Evaluation [3.16536213610547]
Existing approaches to behavioral pattern mining suffer from two limitations.
First, they show limited scalability as incremental computation is incorporated only in the generation of pattern candidates.
Second, process analysis based on mined patterns shows limited effectiveness due to an overwhelmingly large number of patterns obtained in practical application scenarios.
arXiv Detail & Related papers (2024-02-05T11:41:37Z)
- RGM: A Robust Generalizable Matching Model [49.60975442871967]
We propose a deep model for sparse and dense matching, termed RGM (Robust Generalist Matching).
To narrow the gap between synthetic training samples and real-world scenarios, we build a new, large-scale dataset with sparse correspondence ground truth.
We are able to mix up various dense and sparse matching datasets, significantly improving the training diversity.
arXiv Detail & Related papers (2023-10-18T07:30:08Z)
- Variable Importance Matching for Causal Inference [73.25504313552516]
We describe a general framework called Model-to-Match that achieves these goals.
Model-to-Match uses variable importance measurements to construct a distance metric.
We operationalize the Model-to-Match framework with LASSO.
arXiv Detail & Related papers (2023-02-23T00:43:03Z)
- Generalization Properties of Retrieval-based Models [50.35325326050263]
Retrieval-based machine learning methods have enjoyed success on a wide range of problems.
Despite growing literature showcasing the promise of these models, the theoretical underpinning for such models remains underexplored.
We present a formal treatment of retrieval-based models to characterize their generalization ability.
arXiv Detail & Related papers (2022-10-06T00:33:01Z)
- Relational Action Bases: Formalization, Effective Safety Verification,
and Invariants (Extended Version) [67.99023219822564]
We introduce the general framework of relational action bases (RABs).
RABs generalize existing models by lifting both restrictions.
We demonstrate the effectiveness of this approach on a benchmark of data-aware business processes.
arXiv Detail & Related papers (2022-08-12T17:03:50Z)
- Complex Event Forecasting with Prediction Suffix Trees: Extended
Technical Report [70.7321040534471]
Complex Event Recognition (CER) systems have become popular in the past two decades due to their ability to "instantly" detect patterns on real-time streams of events.
There is a lack of methods for forecasting when a pattern might occur before such an occurrence is actually detected by a CER engine.
We present a formal framework that attempts to address the issue of Complex Event Forecasting.
arXiv Detail & Related papers (2021-09-01T09:52:31Z)
- Bootstrapping Generalization of Process Models Discovered From Event
Data [10.574698833115589]
Generalization seeks to quantify how well a discovered model describes future executions of the system.
We employ a bootstrap approach to estimate properties of a population based on a sample.
Experiments demonstrate the feasibility of the approach in industrial settings.
arXiv Detail & Related papers (2021-07-08T14:35:56Z)
- CoCoMoT: Conformance Checking of Multi-Perspective Processes via SMT
(Extended Version) [62.96267257163426]
We introduce the CoCoMoT (Computing Conformance Modulo Theories) framework.
First, we show how SAT-based encodings studied in the pure control-flow setting can be lifted to our data-aware case.
Second, we introduce a novel preprocessing technique based on a notion of property-preserving clustering.
arXiv Detail & Related papers (2021-03-18T20:22:50Z)
- An Entropic Relevance Measure for Stochastic Conformance Checking in
Process Mining [9.302180124254338]
We present an entropic relevance measure for conformance checking, computed as the average number of bits required to compress each of the log's traces.
We show that entropic relevance is computable in time linear in the size of the log, and provide evaluation outcomes that demonstrate the feasibility of using the new approach in industrial settings.
arXiv Detail & Related papers (2020-07-18T02:25:33Z)