Tailoring Machine Learning for Process Mining
- URL: http://arxiv.org/abs/2306.10341v1
- Date: Sat, 17 Jun 2023 12:59:51 GMT
- Title: Tailoring Machine Learning for Process Mining
- Authors: Paolo Ceravolo and Sylvio Barbon Junior and Ernesto Damiani and Wil
van der Aalst
- Abstract summary: We argue that a deeper insight into the issues raised by training machine learning models with process data is crucial to ground a sound integration of process mining and machine learning.
Our analysis of such issues is aimed at laying the foundation for a methodology aimed at correctly aligning machine learning with process mining requirements.
- Score: 5.237999056930947
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Machine learning models are routinely integrated into process mining
pipelines to carry out tasks like data transformation, noise reduction, anomaly
detection, classification, and prediction. Often, the design of such models is
based on some ad-hoc assumptions about the corresponding data distributions,
which are not necessarily in accordance with the non-parametric distributions
typically observed with process data. Moreover, the learning procedure they
follow ignores the constraints concurrency imposes to process data. Data
encoding is a key element to smooth the mismatch between these assumptions but
its potential is poorly exploited. In this paper, we argue that a deeper
insight into the issues raised by training machine learning models with process
data is crucial to ground a sound integration of process mining and machine
learning. Our analysis of such issues is aimed at laying the foundation for a
methodology aimed at correctly aligning machine learning with process mining
requirements and stimulating the research to elaborate in this direction.
Related papers
- Unlearning Information Bottleneck: Machine Unlearning of Systematic Patterns and Biases [6.936871609178494]
We present Unlearning Information Bottleneck (UIB), a novel information-theoretic framework designed to enhance the process of machine unlearning.
By proposing a variational upper bound, we recalibrate the model parameters through a dynamic prior that integrates changes in data distribution with an affordable computational cost.
Our experiments across various datasets, models, and unlearning methods demonstrate that our approach effectively removes systematic patterns and biases while maintaining the performance of models post-unlearning.
arXiv Detail & Related papers (2024-05-22T21:54:05Z) - Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution [62.71425232332837]
We show that training amortized models with noisy labels is inexpensive and surprisingly effective.
This approach significantly accelerates several feature attribution and data valuation methods, often yielding an order of magnitude speedup over existing approaches.
arXiv Detail & Related papers (2024-01-29T03:42:37Z) - Robust Machine Learning by Transforming and Augmenting Imperfect
Training Data [6.928276018602774]
This thesis explores several data sensitivities of modern machine learning.
We first discuss how to prevent ML from codifying prior human discrimination measured in the training data.
We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment.
arXiv Detail & Related papers (2023-12-19T20:49:28Z) - Deep Learning based pipeline for anomaly detection and quality
enhancement in industrial binder jetting processes [68.8204255655161]
Anomaly detection describes methods of finding abnormal states, instances or data points that differ from a normal value space.
This paper contributes to a data-centric way of approaching artificial intelligence in industrial production.
arXiv Detail & Related papers (2022-09-21T08:14:34Z) - Capturing and incorporating expert knowledge into machine learning
models for quality prediction in manufacturing [0.0]
This study introduces a general methodology for building quality prediction models with machine learning methods on small datasets.
The proposed methodology produces prediction models that strictly comply with all the expert knowledge specified by the involved process specialists.
arXiv Detail & Related papers (2022-02-04T07:22:29Z) - MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms [82.90843777097606]
We propose a causally-aware imputation algorithm (MIRACLE) for missing data.
MIRACLE iteratively refines the imputation of a baseline by simultaneously modeling the missingness generating mechanism.
We conduct extensive experiments on synthetic and a variety of publicly available datasets to show that MIRACLE is able to consistently improve imputation.
arXiv Detail & Related papers (2021-11-04T22:38:18Z) - A Meta-learning Approach to Reservoir Computing: Time Series Prediction
with Limited Data [0.0]
We present a data-driven approach to automatically extract an appropriate model structure from experimentally observed processes.
We demonstrate our approach on a simple benchmark problem, where it beats the state of the art meta-learning techniques.
arXiv Detail & Related papers (2021-10-07T18:23:14Z) - Using Data Assimilation to Train a Hybrid Forecast System that Combines
Machine-Learning and Knowledge-Based Components [52.77024349608834]
We consider the problem of data-assisted forecasting of chaotic dynamical systems when the available data is noisy partial measurements.
We show that by using partial measurements of the state of the dynamical system, we can train a machine learning model to improve predictions made by an imperfect knowledge-based model.
arXiv Detail & Related papers (2021-02-15T19:56:48Z) - Graph Embedding with Data Uncertainty [113.39838145450007]
spectral-based subspace learning is a common data preprocessing step in many machine learning pipelines.
Most subspace learning methods do not take into consideration possible measurement inaccuracies or artifacts that can lead to data with high uncertainty.
arXiv Detail & Related papers (2020-09-01T15:08:23Z) - How Training Data Impacts Performance in Learning-based Control [67.7875109298865]
This paper derives an analytical relationship between the density of the training data and the control performance.
We formulate a quality measure for the data set, which we refer to as $rho$-gap.
We show how the $rho$-gap can be applied to a feedback linearizing control law.
arXiv Detail & Related papers (2020-05-25T12:13:49Z) - Machine Learning to Tackle the Challenges of Transient and Soft Errors
in Complex Circuits [0.16311150636417257]
Machine learning models are used to predict accurate per-instance Functional De-Rating data for the full list of circuit instances.
The presented methodology is applied on a practical example and various machine learning models are evaluated and compared.
arXiv Detail & Related papers (2020-02-18T18:38:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.