Object Type Clustering using Markov Directly-Follow Multigraph in
Object-Centric Process Mining
- URL: http://arxiv.org/abs/2206.11017v1
- Date: Wed, 22 Jun 2022 12:36:46 GMT
- Title: Object Type Clustering using Markov Directly-Follow Multigraph in
Object-Centric Process Mining
- Authors: Amin Jalali
- Abstract summary: This paper introduces a new approach to cluster similar case notions based on Markov Directly-Follow Multigraph.
A threshold tuning algorithm is also defined to identify sets of different clusters that can be discovered based on different levels of similarity.
- Score: 2.3351527694849574
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Object-centric process mining is a new paradigm with more realistic
assumptions about underlying data by considering several case notions, e.g., an
order handling process can be analyzed based on order, item, package, and route
case notions. Including many case notions can result in a very complex model.
To cope with such complexity, this paper introduces a new approach to cluster
similar case notions based on Markov Directly-Follow Multigraph, which is an
extended version of the well-known Directly-Follow Graph supported by many
industrial and academic process mining tools. This graph is used to calculate a
similarity matrix for discovering clusters of similar case notions based on a
threshold. A threshold tuning algorithm is also defined to identify sets of
different clusters that can be discovered based on different levels of
similarity. Thus, cluster discovery does not rely merely on analysts'
assumptions. The approach is implemented and released as part of a Python
library called processmining, and it is evaluated on a Purchase-to-Pay
(P2P) object-centric event log. Some discovered clusters are evaluated by
discovering a Directly-Follow Multigraph after flattening the log based on the
clusters. The similarity between identified clusters is also evaluated by
calculating the similarity between the behavior of the process models
discovered for each case notion using inductive miner based on footprints
conformance checking.
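The threshold-based clustering and threshold tuning described in the abstract can be illustrated with a minimal sketch. It does not use the processmining library and is not the paper's algorithm: the similarity values are hypothetical, clusters are assumed to be connected components of the "similarity >= threshold" relation, and tuning is assumed to mean scanning the observed similarity values and keeping each threshold that yields a new clustering. The names `clusters_at_threshold`, `tune_thresholds`, `TYPES`, and `SIM` are invented for illustration.

```python
from itertools import combinations


def clusters_at_threshold(types, sim, threshold):
    """Group object types linked by a pairwise similarity >= threshold.

    Assumption: a cluster is a connected component of the thresholded
    similarity relation (implemented here with a small union-find).
    """
    parent = {t: t for t in types}

    def find(t):
        # Follow parent links to the representative, compressing the path.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for a, b in combinations(types, 2):
        if sim[frozenset((a, b))] >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for t in types:
        groups.setdefault(find(t), set()).add(t)
    return frozenset(frozenset(g) for g in groups.values())


def tune_thresholds(types, sim):
    """Map each threshold that produces a new clustering to that clustering.

    Assumption: only similarity values that occur in the matrix are tried,
    from highest to lowest.
    """
    result = {}
    seen = set()
    for threshold in sorted(set(sim.values()), reverse=True):
        clustering = clusters_at_threshold(types, sim, threshold)
        if clustering not in seen:
            seen.add(clustering)
            result[threshold] = clustering
    return result


# Hypothetical similarity values for the case notions named in the abstract.
TYPES = ["order", "item", "package", "route"]
SIM = {
    frozenset(("order", "item")): 0.9,
    frozenset(("order", "package")): 0.4,
    frozenset(("order", "route")): 0.2,
    frozenset(("item", "package")): 0.5,
    frozenset(("item", "route")): 0.2,
    frozenset(("package", "route")): 0.7,
}

if __name__ == "__main__":
    for threshold, clustering in tune_thresholds(TYPES, SIM).items():
        print(threshold, sorted(sorted(c) for c in clustering))
```

With these made-up values, three distinct clusterings appear (at thresholds 0.9, 0.7, and 0.5), so an analyst could choose among them instead of guessing a single threshold up front.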
Related papers
- Measuring similarity between embedding spaces using induced neighborhood graphs [10.056989400384772]
We propose a metric to evaluate the similarity between paired item representations.
Our results show that accuracy in both analogy and zero-shot classification tasks correlates with the embedding similarity.
arXiv Detail & Related papers (2024-11-13T15:22:33Z)
- Cluster-Aware Similarity Diffusion for Instance Retrieval [64.40171728912702]
Diffusion-based re-ranking is a common method used for retrieving instances by performing similarity propagation in a nearest neighbor graph.
We propose a novel Cluster-Aware Similarity (CAS) diffusion for instance retrieval.
arXiv Detail & Related papers (2024-06-04T14:19:50Z)
- How to Evaluate Entity Resolution Systems: An Entity-Centric Framework with Application to Inventor Name Disambiguation [1.7812428873698403]
We propose an entity-centric data labeling methodology that integrates with a unified framework for monitoring summary statistics.
These benchmark data sets can then be used for model training and a variety of evaluation tasks.
arXiv Detail & Related papers (2024-04-08T15:53:29Z)
- Self Supervised Correlation-based Permutations for Multi-View Clustering [7.972599673048582]
We propose an end-to-end deep learning-based MVC framework for general data.
Our approach involves learning meaningful fused data representations with a novel permutation-based canonical correlation objective.
We demonstrate the effectiveness of our model using ten MVC benchmark datasets.
arXiv Detail & Related papers (2024-02-26T08:08:30Z)
- Rethinking Clustering-Based Pseudo-Labeling for Unsupervised Meta-Learning [146.11600461034746]
CACTUs, a method for unsupervised meta-learning, is a clustering-based approach with pseudo-labeling.
This approach is model-agnostic and can be combined with supervised algorithms to learn from unlabeled data.
We prove that the core reason for this is the lack of a clustering-friendly property in the embedding space.
arXiv Detail & Related papers (2022-09-27T19:04:36Z)
- SSDBCODI: Semi-Supervised Density-Based Clustering with Outliers Detection Integrated [1.8444322599555096]
Clustering analysis is one of the critical tasks in machine learning.
Because the performance of clustering can be significantly eroded by outliers, algorithms try to incorporate the process of outlier detection.
We have proposed SSDBCODI, a semi-supervised detection element.
arXiv Detail & Related papers (2022-08-10T21:06:38Z)
- Skew-Symmetric Adjacency Matrices for Clustering Directed Graphs [5.301300942803395]
Cut-based directed graph (digraph) clustering often focuses on finding dense within-cluster or sparse between-cluster connections.
For flow-based clusterings, the edges between clusters tend to be oriented in one direction; such clusterings have been found in migration data, food webs, and trade data.
arXiv Detail & Related papers (2022-03-02T20:07:04Z)
- Finding Geometric Models by Clustering in the Consensus Space [61.65661010039768]
We propose a new algorithm for finding an unknown number of geometric models, e.g., homographies.
We present a number of applications where the use of multiple geometric models improves accuracy.
These include pose estimation from multiple generalized homographies and trajectory estimation of fast-moving objects.
arXiv Detail & Related papers (2021-03-25T14:35:07Z)
- Kernel learning approaches for summarising and combining posterior similarity matrices [68.8204255655161]
We build upon the notion of the posterior similarity matrix (PSM) in order to suggest new approaches for summarising the output of MCMC algorithms for Bayesian clustering models.
A key contribution of our work is the observation that PSMs are positive semi-definite, and hence can be used to define probabilistically-motivated kernel matrices.
arXiv Detail & Related papers (2020-09-27T14:16:14Z)
- Structured Graph Learning for Clustering and Semi-supervised Classification [74.35376212789132]
We propose a graph learning framework to preserve both the local and global structure of data.
Our method uses the self-expressiveness of samples to capture the global structure and an adaptive neighbor approach to respect the local structure.
Our model is equivalent to a combination of kernel k-means and k-means methods under certain conditions.
arXiv Detail & Related papers (2020-08-31T08:41:20Z)
- Conjoined Dirichlet Process [63.89763375457853]
We develop a novel, non-parametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns.
We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that our method improves bicluster extraction in many settings compared to existing approaches.
arXiv Detail & Related papers (2020-02-08T19:41:23Z)
This list is automatically generated from the titles and abstracts of the papers on this site.