Object Type Clustering using Markov Directly-Follow Multigraph in
Object-Centric Process Mining
- URL: http://arxiv.org/abs/2206.11017v1
- Date: Wed, 22 Jun 2022 12:36:46 GMT
- Title: Object Type Clustering using Markov Directly-Follow Multigraph in
Object-Centric Process Mining
- Authors: Amin Jalali
- Abstract summary: This paper introduces a new approach to cluster similar case notions based on Markov Directly-Follow Multigraph.
A threshold tuning algorithm is also defined to identify sets of different clusters that can be discovered based on different levels of similarity.
- Score: 2.3351527694849574
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Object-centric process mining is a new paradigm with more realistic
assumptions about underlying data by considering several case notions, e.g., an
order handling process can be analyzed based on order, item, package, and route
case notions. Including many case notions can result in a very complex model.
To cope with such complexity, this paper introduces a new approach to cluster
similar case notions based on Markov Directly-Follow Multigraph, which is an
extended version of the well-known Directly-Follow Graph supported by many
industrial and academic process mining tools. This graph is used to calculate a
similarity matrix for discovering clusters of similar case notions based on a
threshold. A threshold tuning algorithm is also defined to identify sets of
different clusters that can be discovered based on different levels of
similarity. Thus, cluster discovery does not rely merely on analysts'
assumptions. The approach is implemented and released as part of a Python
library, called processmining, and it is evaluated through a Purchase to Pay
(P2P) object-centric event log file. Some discovered clusters are evaluated by
discovering Directly-Follow Multigraphs after flattening the log based on the
clusters. The similarity between identified clusters is also evaluated by
calculating the similarity between the behavior of the process models
discovered for each case notion with the inductive miner, based on footprints
conformance checking.
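The threshold-based clustering and threshold tuning described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm and not the API of the processmining library: it assumes a symmetric similarity matrix between case notions is already available, treats two case notions as linked when their similarity is at or above the threshold, and takes connected components as clusters. The function names `clusters_at` and `tune_thresholds` and the matrix values are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def clusters_at(sim, names, threshold):
    """Cluster names by linking every pair with similarity >= threshold.

    Clusters are the connected components of the resulting graph
    (an assumption of this sketch, not necessarily the paper's rule).
    """
    parent = {n: n for n in names}

    def find(n):
        # Union-find lookup with path halving.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (i, a), (j, b) in combinations(enumerate(names), 2):
        if sim[i][j] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), set()).add(n)
    return frozenset(frozenset(g) for g in groups.values())


def tune_thresholds(sim, names):
    """Map each threshold that produces a new clustering to that clustering.

    Candidate thresholds are the distinct off-diagonal similarity values,
    tried from the highest to the lowest.
    """
    pairs = combinations(range(len(names)), 2)
    candidates = sorted({sim[i][j] for i, j in pairs}, reverse=True)
    seen, result = set(), {}
    for t in candidates:
        clustering = clusters_at(sim, names, t)
        if clustering not in seen:
            seen.add(clustering)
            result[t] = clustering
    return result


# Made-up similarity values for the case notions named in the abstract.
names = ["order", "item", "package", "route"]
sim = [
    [1.0, 0.9, 0.4, 0.2],
    [0.9, 1.0, 0.5, 0.3],
    [0.4, 0.5, 1.0, 0.8],
    [0.2, 0.3, 0.8, 1.0],
]

result = tune_thresholds(sim, names)
for t, clustering in result.items():
    print(t, sorted(sorted(g) for g in clustering))
```

Scanning only the distinct similarity values is sufficient in this sketch because the clustering can change only when the threshold crosses one of them; an analyst can then pick among the returned clusterings instead of guessing a threshold up front.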
Related papers
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation [1.7812428873698403]
We propose an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics.
These benchmark data sets can then be used for model training and a variety of evaluation tasks.
arXiv Detail & Related papers (2024-04-08T15:53:29Z)
- Self Supervised Correlation-based Permutations for Multi-View Clustering [7.972599673048582]
We propose an end-to-end deep learning-based MVC framework for general data.
Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective.
We demonstrate the effectiveness of our model using ten MVC benchmark datasets.
arXiv Detail & Related papers (2024-02-26T08:08:30Z)
- Instance-Optimal Cluster Recovery in the Labeled Stochastic Block Model [79.46465138631592]
We devise an efficient algorithm that recovers clusters using the observed labels.
We present Instance-Adaptive Clustering (IAC), the first algorithm whose performance matches these lower bounds both in expectation and with high probability.
arXiv Detail & Related papers (2023-06-18T08:46:06Z)
- Efficient Failure Pattern Identification of Predictive Algorithms [15.02620042972929]
We propose a human-machine collaborative framework that consists of a team of human annotators and a sequential recommendation algorithm.
The results empirically demonstrate the competitive performance of our framework on multiple datasets at various signal-to-noise ratios.
arXiv Detail & Related papers (2023-06-01T14:54:42Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- SSDBCODI: Semi-Supervised Density-Based Clustering with Outliers Detection Integrated [1.8444322599555096]
Clustering analysis is one of the critical tasks in machine learning.
Because the performance of clustering can be significantly eroded by outliers, algorithms try to incorporate outlier detection.
We have proposed SSDBCODI, a semi-supervised detection element.
arXiv Detail & Related papers (2022-08-10T21:06:38Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
For flow-based clusterings the edges between clusters tend to be oriented in one direction and have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
- Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies.
We present a number of applications where the use of multiple geometric models improves accuracy.
These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)