Related papers: Tailoring Machine Learning for Process Mining

Tailoring Machine Learning for Process Mining

URL: http://arxiv.org/abs/2306.10341v1
Date: Sat, 17 Jun 2023 12:59:51 GMT
Title: Tailoring Machine Learning for Process Mining
Authors: Paolo Ceravolo and Sylvio Barbon Junior and Ernesto Damiani and Wil van der Aalst
Abstract summary: We argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of such issues aims to lay the foundation for a methodology that correctly aligns machine learning with process mining requirements.
Score: 5.237999056930947
License: http://creativecommons.org/licenses/by-sa/4.0/
Abstract: Machine learning models are routinely integrated into process mining pipelines to carry out tasks like data transformation, noise reduction, anomaly detection, classification, and prediction. Often, the design of such models is based on some ad-hoc assumptions about the corresponding data distributions, which are not necessarily in accordance with the non-parametric distributions typically observed with process data. Moreover, the learning procedure they follow ignores the constraints concurrency imposes on process data. Data encoding is a key element to smooth the mismatch between these assumptions, but its potential is poorly exploited. In this paper, we argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning. Our analysis of such issues aims to lay the foundation for a methodology that correctly aligns machine learning with process mining requirements and to stimulate further research in this direction.

Related papers

Efficient Machine Unlearning via Influence Approximation [75.31015485113993]
Influence-based unlearning has emerged as a prominent approach to estimate the impact of individual training samples on model parameters without retraining. This paper establishes a theoretical link between memorizing (incremental learning) and forgetting (unlearning). We introduce the Influence Approximation Unlearning algorithm for efficient machine unlearning from the incremental perspective.
arXiv Detail & Related papers (2025-07-31T05:34:27Z)
Mitigating Attrition: Data-Driven Approach Using Machine Learning and Data Engineering [0.0]
This paper presents a novel data-driven approach to mitigating employee attrition using machine learning and data engineering techniques. The proposed framework integrates data from various human resources systems and leverages advanced feature engineering to capture a comprehensive set of factors influencing attrition.
arXiv Detail & Related papers (2025-02-25T05:29:45Z)
Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases [6.936871609178494]
We present Unlearning Information Bottleneck (UIB), a novel information-theoretic framework designed to enhance the process of machine unlearning. By proposing a variational upper bound, we recalibrate the model parameters through a dynamic prior that integrates changes in data distribution with an affordable computational cost. Our experiments across various datasets, models, and unlearning methods demonstrate that our approach effectively removes systematic patterns and biases while maintaining the performance of models post-unlearning.
arXiv Detail & Related papers (2024-05-22T21:54:05Z)
Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective. This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z)
Robust Machine Learning by Transforming and Augmenting Imperfect Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning. We first discuss how to prevent ML from codifying prior human discrimination measured in the training data. We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z)
Deep Learning based pipeline for anomaly detection and quality enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space. This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z)
Capturing and incorporating expert knowledge into machine learning models for quality prediction in manufacturing [0.0]
This study introduces a general methodology for building quality prediction models with machine learning methods on small datasets. The proposed methodology produces prediction models that strictly comply with all the expert knowledge specified by the involved process specialists.
arXiv Detail & Related papers (2022-02-04T07:22:29Z)
MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data. MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism. We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z)
A Meta-learning Approach to Reservoir Computing: Time Series Prediction with Limited Data [0.0]
We present a data-driven approach to automatically extract an appropriate model structure from experimentally observed processes. We demonstrate our approach on a simple benchmark problem, where it beats the state-of-the-art meta-learning techniques.
arXiv Detail & Related papers (2021-10-07T18:23:14Z)
Using Data Assimilation to Train a Hybrid Forecast System that Combines Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data is noisy partial measurements. We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z)
Graph Embedding with Data Uncertainty [113.39838145450007]
Spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines. Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z)
How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance. We formulate a quality measure for the data set, which we refer to as $\rho$-gap. We show how the $\rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z)
Machine Learning to Tackle the Challenges of Transient and Soft Errors in Complex Circuits [0.16311150636417257]
Machine learning models are used to predict accurate per-instance Functional De-Rating data for the full list of circuit instances. The presented methodology is applied on a practical example and various machine learning models are evaluated and compared.
arXiv Detail & Related papers (2020-02-18T18:38:54Z)

This list is automatically generated from the titles and abstracts of the papers on this site.